
semsimian

Semantic similarity calculations for ontologies, implemented in Rust

7 releases

0.2.15 Mar 14, 2024
0.2.14 Mar 13, 2024
0.2.12 Feb 6, 2024
0.2.11 Nov 29, 2023


BSD-3-Clause and GPL-3.0 licenses

28MB
6K SLoC

Rust 5K SLoC // 0.1% comments
Jupyter Notebooks 681 SLoC // 0.1% comments
Python 504 SLoC // 0.0% comments


Semsimian is a package that provides fast semantic similarity calculations for ontologies. It is a Rust library with a Python interface.

It implements Jaccard and Resnik similarity between terms in an ontology, as well as a method for calculating the similarity of two sets of terms (so-called termset similarity). Other methods will be added in the future.
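To make the Resnik and termset measures concrete, here is a minimal pure-Python sketch on a toy ontology. It does not use or reproduce Semsimian's code: the `PARENTS` data and all function names are hypothetical, information content is estimated from the ontology structure alone, and termset similarity is taken to be a best-match average of pairwise Resnik scores, which is only one common definition and an assumption here.

```python
import math

# Toy ontology (hypothetical data): term -> set of direct is_a parents.
PARENTS = {
    "banana": {"fruit"},
    "cherry": {"fruit"},
    "carrot": {"vegetable"},
    "fruit": {"food"},
    "vegetable": {"food"},
    "food": set(),
}


def ancestors(term):
    """Return the term itself plus every term reachable via is_a edges."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = log2(total terms / terms that have `term` among their ancestors)."""
    covered = sum(1 for t in PARENTS if term in ancestors(t))
    return math.log2(len(PARENTS) / covered)


def resnik(a, b):
    """IC of the most informative ancestor shared by a and b."""
    common = ancestors(a) & ancestors(b)
    return max((information_content(t) for t in common), default=0.0)


def termset_similarity(set_a, set_b):
    """Best-match average of pairwise Resnik scores, in both directions."""
    def best_avg(xs, ys):
        return sum(max(resnik(x, y) for y in ys) for x in xs) / len(xs)

    return (best_avg(set_a, set_b) + best_avg(set_b, set_a)) / 2


print(resnik("banana", "cherry"))
print(termset_similarity(["banana"], ["cherry", "carrot"]))
```

In this toy graph, banana and cherry share the ancestor fruit, which covers half of the terms, while banana and carrot share only the root food, which carries no information.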

Semsimian is currently integrated into OAK and the Monarch app to provide fast semantic similarity calculations.

Rust Installation

  • cargo add semsimian

Python Installation

  • Set up your virtual environment of choice.
  • cd semsimian (the root directory of this project)
  • pip install maturin
  • maturin develop
  • python
Python 3.9.16 (main, Jan 11 2023, 10:02:19) 
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from semsimian import Semsimian
>>> s = Semsimian([('banana', 'is_a', 'fruit'), ('cherry', 'is_a', 'fruit')])
>>> s.jaccard_similarity('banana', 'cherry')

This should yield a value of 1.0.
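One way to see where a value of 1.0 can come from is to compute Jaccard similarity by hand over the ancestor sets of the two terms. The sketch below is an illustration only, not Semsimian's implementation: the helper names are hypothetical, and it assumes a term is not counted among its own ancestors.

```python
def ancestor_sets(triples):
    """Map each subject to the terms reachable by following its edges upward.

    Assumption: a term is not included in its own ancestor set.
    """
    parents = {}
    for subj, _pred, obj in triples:
        parents.setdefault(subj, set()).add(obj)

    def walk(term):
        seen = set()
        stack = [term]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    return {term: walk(term) for term in parents}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


triples = [("banana", "is_a", "fruit"), ("cherry", "is_a", "fruit")]
anc = ancestor_sets(triples)
score = jaccard(anc["banana"], anc["cherry"])
print(score)
```

Under that assumption both ancestor sets are {fruit}, so the intersection and the union are identical and the ratio is 1.0.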

Releases

As of version 0.2.11, the semsimian source is released on GitHub, with a corresponding set of Python wheels released to PyPI and a corresponding release on crates.io.

Dependencies

~36–70MB
~1M SLoC