lightmotif-tfmpvalue

Rust reimplementation of TFMPvalue for the lightmotif crate

10 releases (6 breaking)

0.9.1 Sep 2, 2024
0.9.0 Sep 2, 2024
0.8.0 Jun 28, 2024
0.7.3 Jun 17, 2024
0.3.0 Jun 25, 2023

GPL-3.0-or-later

🎼🧬 lightmotif-tfmpvalue

A Rust port of the TFMPvalue algorithm for the lightmotif crate.

🗺️ Overview

TFMPvalue is an algorithm proposed by Touzet & Varré[1] for computing a p-value from a score obtained with a position weight matrix. It uses discretization to compute an approximation of the score distribution for the position weight matrix, iterating with growing levels of accuracy until convergence is reached. This approach outperforms dynamic-programming based methods such as LazyDistrib by Beckstette et al.[2].
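
To make the discretization idea concrete, here is a minimal, self-contained sketch. It rounds a matrix to a chosen granularity and accumulates the distribution of rounded scores position by position. It is a simplified illustration only: it does not track the rounding-error bounds that the published algorithm uses to decide when it has converged. The names `discretize`, `score_distribution` and `approx_pvalue` are hypothetical and are not part of this crate's API.

```rust
use std::collections::HashMap;

/// Round each cell down to a whole number of `granularity` steps.
fn discretize(matrix: &[[f64; 4]], granularity: f64) -> Vec<[i64; 4]> {
    matrix
        .iter()
        .map(|&row| row.map(|x| (x / granularity).floor() as i64))
        .collect()
}

/// Distribution of total (rounded) scores under a background model,
/// built by adding one matrix position at a time.
fn score_distribution(matrix: &[[i64; 4]], background: &[f64; 4]) -> HashMap<i64, f64> {
    // Before any position is read, the score is 0 with probability 1.
    let mut dist = HashMap::from([(0i64, 1.0f64)]);
    for row in matrix {
        let mut next = HashMap::new();
        for (&score, &prob) in &dist {
            for (&cell, &freq) in row.iter().zip(background) {
                *next.entry(score + cell).or_insert(0.0) += prob * freq;
            }
        }
        dist = next;
    }
    dist
}

/// Approximate P(score >= `score`) using the matrix rounded at `granularity`.
/// Assumption: no error bound is computed, so the result is only as good
/// as the chosen granularity.
fn approx_pvalue(
    matrix: &[[f64; 4]],
    background: &[f64; 4],
    score: f64,
    granularity: f64,
) -> f64 {
    let rounded = discretize(matrix, granularity);
    let threshold = (score / granularity).floor() as i64;
    score_distribution(&rounded, background)
        .into_iter()
        .filter(|&(s, _)| s >= threshold)
        .map(|(_, p)| p)
        .sum()
}
```

A finer granularity gives a closer approximation at the cost of a larger score table, which is why an iterative scheme starts coarse and refines only as needed.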

lightmotif-tfmpvalue provides an implementation of the TFMPvalue algorithm to use with position weight matrices from the lightmotif crate.

💡 Example

Use lightmotif to create a position-specific scoring matrix, then use the TFMPvalue algorithm to compute the exact P-value for a given score, or the score threshold for a given P-value:

extern crate lightmotif;
extern crate lightmotif_tfmpvalue;

use lightmotif::pwm::CountMatrix;
use lightmotif::abc::Dna;
use lightmotif::seq::EncodedSequence;
use lightmotif_tfmpvalue::TfmPvalue;

// Use a ScoringMatrix from `lightmotif`
let pssm = CountMatrix::<Dna>::from_sequences(&[
        EncodedSequence::encode("GTTGACCTTATCAAC").unwrap(),
        EncodedSequence::encode("GTTGATCCAGTCAAC").unwrap(),
    ])
    .unwrap()
    .to_freq(0.25)
    .to_scoring(None);

// Initialize the TFMPvalue algorithm for the given PSSM
// (the `pssm` reference must outlive `tfmp`).
let mut tfmp = TfmPvalue::new(&pssm);

// Compute the exact p-value for a given score
let pvalue = tfmp.pvalue(19.3);
assert_eq!(pvalue, 1.4901161193847656e-08);

// Compute the exact score for a given p-value
let score = tfmp.score(pvalue);
assert_eq!(score, 19.3);

Note that in the example above the computation is unbounded, so for some matrices the algorithm may require a large amount of memory to converge. Use the TfmPvalue::approximate_pvalue and TfmPvalue::approximate_score methods to obtain an iterator over the algorithm's iterations, which lets you stop at any point based on an external criterion such as total memory usage.
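
The stopping pattern described above can be sketched generically. The snippet below is self-contained and does not call this crate: `Step` is a hypothetical stand-in for whatever item the iterators yield (its fields are assumptions, so check the crate documentation for the real type), and a step budget plays the role of the external criterion.

```rust
/// Hypothetical stand-in for one refinement step; the item type actually
/// yielded by the crate's iterators may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Step {
    pvalue: f64,
    converged: bool,
}

/// Consume refinement steps until one reports convergence or the budget
/// of `max_steps` is exhausted, and return the last step seen.
/// Returns `None` if no step was consumed.
fn run_bounded<I>(steps: I, max_steps: usize) -> Option<Step>
where
    I: Iterator<Item = Step>,
{
    let mut last = None;
    for step in steps.take(max_steps) {
        last = Some(step);
        if step.converged {
            break;
        }
    }
    last
}
```

The same loop shape works with any other criterion, for example a memory measurement checked inside the loop instead of `take(max_steps)`. The caller can inspect `converged` on the returned step to tell an exact result from a best-effort one.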

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker to report it or ask a question. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the bug in a simple, easily reproducible situation.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the open-source GNU General Public License v3.0. The original TFMPvalue implementation was written by the BONSAI team of CRISTaL, Université de Lille and is available under the terms of the GNU General Public License v2.0.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original TFMPvalue authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  [1] Touzet, Hélène and Jean-Stéphane Varré. ‘Efficient and accurate P-value computation for Position Weight Matrices’. Algorithms for Molecular Biology 2, 1–12 (2007). doi:10.1186/1748-7188-2-15.
  [2] Beckstette, Michael, Robert Homann, and Robert Giegerich. ‘Fast index based algorithms and software for matching position specific scoring matrices’. BMC Bioinformatics 7, 389 (2006). doi:10.1186/1471-2105-7-389.
