semsimian

Semantic similarity calculations for ontologies, implemented in Rust

12 releases

0.2.19 Aug 26, 2024
0.2.18 Aug 2, 2024
0.2.17 Jul 23, 2024
0.2.16 May 7, 2024
0.2.11 Nov 29, 2023

BSD-3-Clause and GPL-3.0 licenses

Semsimian is a package that provides fast semantic similarity calculations for ontologies. It is a Rust library with a Python interface.

It implements Jaccard and Resnik similarity between terms in an ontology, as well as a method for calculating the similarity of two sets of terms (so-called termset similarity). Other methods will be added in the future.
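To make the Resnik and termset measures concrete, here is a minimal pure-Python sketch. It is not semsimian's implementation. The helper names (`ancestors`, `information_content`, `resnik`, `termset_similarity`) and the toy hierarchy (including `food` and `bread`) are hypothetical. It assumes information content is estimated from term counts in the ontology itself rather than from an annotation corpus, and it assumes termset similarity is a symmetric best-match average, which is one common formulation and not necessarily the one semsimian uses.

```python
import math


def ancestors(term, parents):
    """Return the term plus all of its ancestors (reflexive closure)."""
    seen = {term}
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(parents):
    """Estimate IC(t) = -log2(p(t)), where p(t) is the fraction of terms
    that are t or a descendant of t (assumption: no annotation corpus)."""
    terms = set(parents) | {p for ps in parents.values() for p in ps}
    closure = {t: ancestors(t, parents) for t in terms}
    counts = {t: 0 for t in terms}
    for t in terms:
        for a in closure[t]:
            counts[a] += 1
    n = len(terms)
    ic = {t: -math.log2(counts[t] / n) for t in terms}
    return ic, closure


def resnik(a, b, ic, closure):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = closure[a] & closure[b]
    return max((ic[c] for c in common), default=0.0)


def termset_similarity(set_a, set_b, ic, closure):
    """Symmetric best-match average over two sets of terms (assumed form)."""
    if not set_a or not set_b:
        return 0.0

    def best_avg(xs, ys):
        return sum(max(resnik(x, y, ic, closure) for y in ys) for x in xs) / len(xs)

    return (best_avg(set_a, set_b) + best_avg(set_b, set_a)) / 2


# Hypothetical toy hierarchy: child -> set of direct parents.
parents = {
    "banana": {"fruit"},
    "cherry": {"fruit"},
    "fruit": {"food"},
    "bread": {"food"},
}
ic, closure = information_content(parents)
print(resnik("banana", "cherry", ic, closure))
print(termset_similarity({"banana"}, {"cherry", "bread"}, ic, closure))
```

In this toy hierarchy the root `food` subsumes every term, so its information content is zero and any pair that shares only the root scores zero under Resnik.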

Semsimian is currently integrated into OAK and the Monarch app to provide fast semantic similarity calculations.

Rust Installation

  • cargo add semsimian

Python Installation

  • Set up your virtual environment of choice.
  • cd semsimian (home directory of this project)
  • pip install maturin
  • maturin develop
  • python
Python 3.9.16 (main, Jan 11 2023, 10:02:19) 
[Clang 14.0.6 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from semsimian import Semsimian
>>> s = Semsimian([('banana', 'is_a', 'fruit'), ('cherry', 'is_a', 'fruit')])
>>> s.jaccard_similarity('banana', 'cherry')

This should yield a value of 1.0.
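A rough way to see where 1.0 comes from is to compute Jaccard similarity over ancestor sets by hand. The sketch below is an illustration, not semsimian's code. The function names and the `reflexive` flag are hypothetical, and it assumes that a term's ancestor set excludes the term itself, an assumption chosen because it reproduces the value quoted above.

```python
def proper_ancestors(term, triples):
    """Collect all ancestors of a term from (subject, predicate, object)
    triples, excluding the term itself."""
    parents = {}
    for subject, _predicate, obj in triples:
        parents.setdefault(subject, set()).add(obj)
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard(a, b, triples, reflexive=False):
    """Jaccard similarity: |A & B| / |A | B| over the two ancestor sets."""
    anc_a = proper_ancestors(a, triples)
    anc_b = proper_ancestors(b, triples)
    if reflexive:
        # Optionally count each term as its own ancestor.
        anc_a = anc_a | {a}
        anc_b = anc_b | {b}
    union = anc_a | anc_b
    return len(anc_a & anc_b) / len(union) if union else 0.0


triples = [("banana", "is_a", "fruit"), ("cherry", "is_a", "fruit")]
print(jaccard("banana", "cherry", triples))                  # 1.0
print(jaccard("banana", "cherry", triples, reflexive=True))  # 1/3
```

With proper ancestors, both terms have the single ancestor `fruit`, so the intersection and the union are the same set and the ratio is 1.0. If each term were counted as its own ancestor, the union would grow to three terms and the score would drop to 1/3.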

Releases

As of version 0.2.11, the semsimian source is released on GitHub, with a corresponding set of Python wheels released to PyPI and a corresponding release on crates.io.
