semsimian

Semantic similarity calculations for ontologies, implemented in Rust

10 releases

new 0.2.17 Jul 23, 2024
0.2.17-rc1 Jul 15, 2024
0.2.16 May 7, 2024
0.2.15 Mar 14, 2024
0.2.11 Nov 29, 2023

118 downloads per month

BSD-3-Clause and GPL-3.0 licenses

28MB
6.5K SLoC

  • Rust: 5K SLoC, 0.1% comments
  • Jupyter Notebooks: 681 SLoC, 0.1% comments
  • Python: 530 SLoC, 0.0% comments

semsimian

Semsimian is a package that provides fast semantic similarity calculations for ontologies. It is a Rust library with a Python interface.

It includes implementations of Jaccard and Resnik similarity for terms in an ontology, as well as a method for calculating the similarity of two sets of terms (so-called termset similarity). Other methods will be added in the future.
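To make these measures concrete, here is a minimal pure-Python sketch of the three ideas. It does not use or reproduce semsimian's own code or API. The toy triples, the helper names (ancestors, jaccard, information_content, resnik, termset_similarity), the choice to exclude a term from its own ancestor set, the frequency-based information content, and the best-match-average scoring for term sets are all assumptions made for illustration.

```python
import math

# Toy ontology as (subject, predicate, object) triples; illustrative only.
TRIPLES = [
    ("banana", "is_a", "fruit"),
    ("cherry", "is_a", "fruit"),
    ("fruit", "is_a", "food"),
    ("carrot", "is_a", "vegetable"),
    ("vegetable", "is_a", "food"),
]


def ancestors(term, triples):
    """Return all proper ancestors of `term` (assumption: the term itself is excluded)."""
    parents = {}
    for subj, _pred, obj in triples:
        parents.setdefault(subj, set()).add(obj)
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, triples):
    """Jaccard similarity: shared ancestors divided by all ancestors of either term."""
    anc_a, anc_b = ancestors(a, triples), ancestors(b, triples)
    union = anc_a | anc_b
    return len(anc_a & anc_b) / len(union) if union else 0.0


def information_content(term, triples):
    """IC(t) = log(N / n_t), where n_t counts t plus its descendants among N terms."""
    terms = {t for s, _p, o in triples for t in (s, o)}
    covered = sum(1 for t in terms if t == term or term in ancestors(t, triples))
    return math.log(len(terms) / covered)


def resnik(a, b, triples):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a, triples) & ancestors(b, triples)
    return max((information_content(t, triples) for t in common), default=0.0)


def termset_similarity(set_a, set_b, triples):
    """Best-match average of Jaccard scores, taken in both directions (assumed scheme)."""
    def directed(xs, ys):
        return sum(max(jaccard(x, y, triples) for y in ys) for x in xs) / len(xs)
    return (directed(set_a, set_b) + directed(set_b, set_a)) / 2


if __name__ == "__main__":
    print(jaccard("banana", "cherry", TRIPLES))
    print(resnik("banana", "cherry", TRIPLES))
    print(termset_similarity({"banana"}, {"cherry", "carrot"}, TRIPLES))
```

In this sketch banana and cherry share every ancestor, so their Jaccard score is 1.0, while banana and carrot share only food and score 1/3. Resnik rewards specificity instead: the score for banana and cherry comes from fruit, which is more informative than the root term food.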

Semsimian is currently integrated into OAK and the Monarch app to provide fast semantic similarity calculations.

Rust Installation

  • cargo add semsimian

Python Installation

  • Set up your virtual environment of choice.
  • cd semsimian (the root directory of this project)
  • pip install maturin
  • maturin develop
  • python
Python 3.9.16 (main, Jan 11 2023, 10:02:19) 
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from semsimian import Semsimian
>>> s = Semsimian([('banana', 'is_a', 'fruit'), ('cherry', 'is_a', 'fruit')])
>>> s.jaccard_similarity('banana', 'cherry')

This should yield a value of 1.0.

Releases

As of version 0.2.11, the semsimian source is released on GitHub, with a corresponding set of Python wheels released to PyPI and a corresponding release on crates.io.

Dependencies

~37–67MB
~1M SLoC