15 releases (7 breaking)
| Version | Released |
|---|---|
| 0.8.3 | Mar 18, 2024 |
| 0.7.0 | Dec 1, 2023 |
| 0.6.1 | Oct 30, 2023 |
| 0.3.1 | Jul 14, 2023 |
#80 in Science
33 downloads per month
Used in align-cli
7.5MB, 17K SLoC
Match those fragments!
Handle mass spectrometry data in Rust. This crate is set up to handle very complex peptides with
a lot of ambiguity. It pivots around the ComplexPeptide and LinearPeptide types,
which encode the ProForma specification. Additionally, this crate enables reading mgf files,
annotating spectra (BU/MD/TD), finding isobaric sequences, aligning peptides,
accessing the IMGT germline database, and reading identified peptide files.
Library features
- Read ProForma sequences ('level 2-ProForma + mass spectrum compliant + glycans compliant', with the intention to fully support the whole spec)
- Generate theoretical fragments from any supported ProForma peptide, with control over the fragmentation model
- Generate fragments from satellite ions (w, d, and v)
- Generate glycan fragments
- Generate theoretical fragments for modifications of unknown position
- Generate theoretical fragments for chimeric spectra
- Read mgf files
- Match spectra to the generated fragments
- Extensive use of uom for compile time unit checking
- Align peptides based on mass (algorithm will be tweaked extensively over time) (see Stitch for more information, but the algorithm has been improved)
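The idea behind compile time unit checking can be illustrated without any dependencies. The sketch below is not uom's API and not how rustyms defines its units; the `Mass`, `Charge`, and `MassOverCharge` newtypes are hypothetical names chosen only to show how distinct types let the compiler reject unit mix-ups.

```rust
use std::ops::{Add, Div};

/// Mass in daltons. Hypothetical newtype for illustration, not a uom or rustyms type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Mass(f64);

/// Charge in elementary charges. Hypothetical newtype for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Charge(f64);

/// Mass over charge (m/z). Hypothetical newtype for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MassOverCharge(f64);

// Adding two masses gives a mass.
impl Add for Mass {
    type Output = Mass;
    fn add(self, rhs: Mass) -> Mass {
        Mass(self.0 + rhs.0)
    }
}

// Dividing a mass by a charge gives m/z; no other division is defined.
impl Div<Charge> for Mass {
    type Output = MassOverCharge;
    fn div(self, rhs: Charge) -> MassOverCharge {
        MassOverCharge(self.0 / rhs.0)
    }
}

fn main() {
    let peptide = Mass(1000.0);
    let protons = Mass(2.0 * 1.007276);
    let mz = (peptide + protons) / Charge(2.0);
    println!("{mz:?}");

    // The next line would not compile, because `Add<Charge>` is not
    // implemented for `Mass`:
    // let wrong = peptide + Charge(2.0);
}
```

A full unit library such as uom generalises this pattern over many quantities and units, so the sketch should be read only as the underlying principle.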
Example usage
fn main() -> Result<(), rustyms::error::CustomError> {
    let raw_file_path = "data/annotated_example.mgf";
    // Open some data and see if the given peptide is a valid match
    use rustyms::{*, system::{Charge, e}};
    let peptide = ComplexPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVSERTHGGNFD")?;
    let spectrum = rawfile::mgf::open(raw_file_path)?;
    let model = Model::ethcd();
    let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2.0), &model);
    let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
    let fdr = annotated.fdr(&fragments, &model);
    // This is the incorrect sequence for this spectrum so the FDR will indicate this
    dbg!(&fdr, fdr.sigma(), fdr.fdr(), fdr.score());
    assert!(fdr.sigma() < 2.0);
    Ok(())
}
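To give a feel for what theoretical fragment generation involves, here is a minimal, dependency-free sketch that computes singly charged b and y ion m/z values for an unmodified peptide. It is not the rustyms implementation, which also handles modifications, multiple charges, other ion series, and fragmentation models; `residue_mass` and `b_y_ions` are hypothetical helper names, and the residue masses are standard monoisotopic values rounded to six decimals.

```rust
/// Monoisotopic mass (Da) of an unmodified amino acid residue, or `None` if unknown.
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.021464,
        'A' => 71.037114,
        'S' => 87.032028,
        'P' => 97.052764,
        'V' => 99.068414,
        'T' => 101.047679,
        'C' => 103.009185,
        'L' | 'I' => 113.084064,
        'N' => 114.042927,
        'D' => 115.026943,
        'Q' => 128.058578,
        'K' => 128.094963,
        'E' => 129.042593,
        'M' => 131.040485,
        'H' => 137.058912,
        'F' => 147.068414,
        'R' => 156.101111,
        'Y' => 163.063329,
        'W' => 186.079313,
        _ => return None,
    })
}

const PROTON: f64 = 1.007276;
const WATER: f64 = 18.010565;

/// Singly charged b and y ion m/z values for an unmodified peptide.
/// Returns `(b_1..b_{n-1}, y_1..y_{n-1})`, or `None` on an unknown residue.
fn b_y_ions(sequence: &str) -> Option<(Vec<f64>, Vec<f64>)> {
    let masses = sequence
        .chars()
        .map(residue_mass)
        .collect::<Option<Vec<f64>>>()?;
    let cleavages = masses.len().saturating_sub(1);

    // b ions: N-terminal prefix sums plus one proton.
    let mut b = Vec::with_capacity(cleavages);
    let mut prefix = 0.0;
    for m in masses.iter().take(cleavages) {
        prefix += m;
        b.push(prefix + PROTON);
    }

    // y ions: C-terminal suffix sums plus water plus one proton.
    let mut y = Vec::with_capacity(cleavages);
    let mut suffix = 0.0;
    for m in masses.iter().rev().take(cleavages) {
        suffix += m;
        y.push(suffix + WATER + PROTON);
    }

    Some((b, y))
}

fn main() {
    if let Some((b, y)) = b_y_ions("VQEVS") {
        for (i, mz) in b.iter().enumerate() {
            println!("b{} = {:.4}", i + 1, mz);
        }
        for (i, mz) in y.iter().enumerate() {
            println!("y{} = {:.4}", i + 1, mz);
        }
    }
}
```

Matching a spectrum then amounts to looking for observed peaks within some tolerance of these theoretical values.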
fn main() -> Result<(), rustyms::error::CustomError> {
    // Check how this peptide compares to a similar peptide (using `align`)
    // (same sequence, repeated for easy reference)
    use rustyms::{*, align::*};
    let first_peptide = LinearPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVS")?;
    let second_peptide = LinearPeptide::pro_forma("E[Glu->pyro-Glu]VQVES")?;
    let alignment = align::<4>(&first_peptide, &second_peptide,
        matrix::BLOSUM62, Tolerance::new_ppm(10.0), AlignType::GLOBAL);
    dbg!(&alignment);
    let stats = alignment.stats();
    // assert_eq!(stats.identical, 3); // Only three positions are identical
    assert_eq!(stats.mass_similar, 6); // All positions are mass similar
    Ok(())
}
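One way to read the `mass_similar` result above: the two N-terminal residues differ in sequence, but after their modifications they have the same mass, and the swapped `EV`/`VE` pair has the same summed mass. The sketch below checks the first of these with a simple parts-per-million comparison. It assumes a relative tolerance definition that may differ in detail from the crate's `Tolerance` type, and `within_ppm` is a hypothetical helper, not a rustyms function.

```rust
// Monoisotopic masses in Da, rounded to six decimals.
const GLN: f64 = 128.058578; // glutamine residue (Q)
const GLU: f64 = 129.042593; // glutamic acid residue (E)
const AMMONIA: f64 = 17.026549; // lost in Gln->pyro-Glu
const WATER: f64 = 18.010565; // lost in Glu->pyro-Glu

/// True if `a` and `b` differ by at most `ppm` parts per million, relative to `b`.
/// Hypothetical helper; the crate's own tolerance handling may differ.
fn within_ppm(a: f64, b: f64, ppm: f64) -> bool {
    ((a - b) / b).abs() * 1e6 <= ppm
}

fn main() {
    let pyro_from_gln = GLN - AMMONIA;
    let pyro_from_glu = GLU - WATER;
    println!("Q[Gln->pyro-Glu] = {pyro_from_gln:.6} Da");
    println!("E[Glu->pyro-Glu] = {pyro_from_glu:.6} Da");

    // The modified residues agree within 10 ppm, the unmodified ones do not.
    println!("modified similar:   {}", within_ppm(pyro_from_gln, pyro_from_glu, 10.0));
    println!("unmodified similar: {}", within_ppm(GLN, GLU, 10.0));
}
```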
Compilation features
Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).
- identification - gives access to methods reading many different identified peptide formats.
- align - gives access to mass based alignment of peptides.
- imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
- rayon - enables parallel iterators using rayon, mostly for imgt but also in consecutive align.
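As a sketch of how slimming down might look in practice, the Cargo.toml fragment below turns off the default features and re-enables only `align`. The version number is taken from the latest release listed above; whether `align` builds without any of the other features enabled is an assumption that should be checked against the crate's own manifest.

```toml
[dependencies]
# Disable all default features, then opt back in to mass based alignment only.
rustyms = { version = "0.8.3", default-features = false, features = ["align"] }
```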
Dependencies
~8.5MB, ~167K SLoC