#spectra #fragmentation #bio #compile-time #proforma #massspectrometry

rustyms

A library to handle proteomic mass spectrometry data and match peptides to spectra

15 releases (7 breaking)

0.8.3 Mar 18, 2024
0.7.0 Dec 1, 2023
0.6.1 Oct 30, 2023
0.3.1 Jul 14, 2023

#80 in Science

33 downloads per month
Used in align-cli

MIT/Apache

7.5MB
17K SLoC

Match those fragments!

Handle mass spectrometry data in Rust. This crate is designed for very complex peptides with a lot of ambiguity. It is built around the ComplexPeptide and LinearPeptide types, which encode the ProForma specification. In addition, the crate can read mgf files, annotate spectra (BU/MD/TD, i.e. bottom-up, middle-down, and top-down), find isobaric sequences, align peptides, access the IMGT germline database, and read identified peptide files.
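
To give a feel for what a ProForma string encodes, here is a minimal sketch that splits a sequence such as the one used in the examples on this page into residues and their bracketed modifications. It uses only the standard library and is not how rustyms parses ProForma: the Residue struct and the parse_simple_proforma function are hypothetical names, and the sketch assumes a tiny subset of the notation (uppercase residue letters, each optionally followed by one or more [...] modifications, with no terminal modifications, ambiguity, or glycans).

```rust
/// One residue with its bracketed modifications, e.g. `Q[Gln->pyro-Glu]`.
/// Hypothetical type for illustration, not part of rustyms.
#[derive(Debug, PartialEq)]
struct Residue {
    amino_acid: char,
    modifications: Vec<String>,
}

/// Parse a small ProForma subset: uppercase residue letters, each optionally
/// followed by `[...]` modifications. Anything else is rejected.
fn parse_simple_proforma(input: &str) -> Result<Vec<Residue>, String> {
    let mut residues: Vec<Residue> = Vec::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c.is_ascii_uppercase() {
            residues.push(Residue { amino_acid: c, modifications: Vec::new() });
        } else if c == '[' {
            // Collect everything up to the closing bracket.
            let mut name = String::new();
            let mut closed = false;
            for m in chars.by_ref() {
                if m == ']' {
                    closed = true;
                    break;
                }
                name.push(m);
            }
            if !closed {
                return Err(format!("unclosed modification in {input:?}"));
            }
            // Attach the modification to the residue directly before it.
            match residues.last_mut() {
                Some(residue) => residue.modifications.push(name),
                None => return Err(format!("modification [{name}] has no preceding residue")),
            }
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(residues)
}

fn main() {
    let peptide = parse_simple_proforma("Q[Gln->pyro-Glu]VQEVS").unwrap();
    for residue in &peptide {
        println!("{} {:?}", residue.amino_acid, residue.modifications);
    }
}
```

The real specification covers far more than this, which is why the crate exposes dedicated peptide types instead of plain strings.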

Library features

  • Read ProForma sequences ('level 2-ProForma + mass spectrum compliant + glycans compliant', with the intention to fully support the whole spec)
  • Generate theoretical fragments from any supported ProForma peptide, with control over the fragmentation model
    • Generate fragments from satellite ions (w, d, and v)
    • Generate glycan fragments
    • Generate theoretical fragments for modifications of unknown position
    • Generate theoretical fragments for chimeric spectra
  • Read mgf files
  • Match spectra to the generated fragments
  • Extensive use of uom for compile-time unit checking
  • Align peptides based on mass (see Stitch for background; the algorithm has been improved since then and will continue to be tweaked over time)
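
As a rough illustration of what generating theoretical fragments means, the sketch below computes singly charged b and y ion m/z values for an unmodified peptide. It is a standard-library-only toy, not the rustyms implementation: the names residue_mass and b_y_ions are hypothetical, the masses are standard monoisotopic values rounded to five decimals, and modifications, other ion series, higher charge states, and neutral losses are all ignored.

```rust
/// Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
fn residue_mass(aa: char) -> Option<f64> {
    Some(match aa {
        'G' => 57.02146,
        'A' => 71.03711,
        'S' => 87.03203,
        'P' => 97.05276,
        'V' => 99.06841,
        'T' => 101.04768,
        'C' => 103.00919,
        'L' | 'I' => 113.08406,
        'N' => 114.04293,
        'D' => 115.02694,
        'Q' => 128.05858,
        'K' => 128.09496,
        'E' => 129.04259,
        'M' => 131.04049,
        'H' => 137.05891,
        'F' => 147.06841,
        'R' => 156.10111,
        'Y' => 163.06333,
        'W' => 186.07931,
        _ => return None,
    })
}

const WATER: f64 = 18.01056; // H2O
const PROTON: f64 = 1.00728;

/// For each backbone cleavage, return the m/z of the singly charged b ion
/// (N-terminal part) and of the complementary y ion (C-terminal part).
fn b_y_ions(sequence: &str) -> Option<Vec<(f64, f64)>> {
    let masses: Vec<f64> = sequence.chars().map(residue_mass).collect::<Option<_>>()?;
    let total: f64 = masses.iter().sum();
    let mut prefix = 0.0;
    let mut ions = Vec::new();
    // A peptide of n residues has n - 1 cleavage sites.
    for m in &masses[..masses.len().saturating_sub(1)] {
        prefix += m;
        let b = prefix + PROTON;
        let y = (total - prefix) + WATER + PROTON;
        ions.push((b, y));
    }
    Some(ions)
}

fn main() {
    let ions = b_y_ions("VQEVS").expect("only standard residues");
    for (i, (b, y)) in ions.iter().enumerate() {
        println!("b{} = {:.4}, y{} = {:.4}", i + 1, b, ions.len() - i, y);
    }
}
```

A fragmentation model, as mentioned in the list above, decides which ion series and charge states are produced; this sketch hard-codes one simple choice.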

Example usage

fn main() -> Result<(), rustyms::error::CustomError> {
    let raw_file_path = "data/annotated_example.mgf";
    // Open some data and see if the given peptide is a valid match
    use rustyms::{*, system::{Charge, e}};
    let peptide = ComplexPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVSERTHGGNFD")?;
    let spectrum = rawfile::mgf::open(raw_file_path)?;
    let model = Model::ethcd();
    let fragments = peptide.generate_theoretical_fragments(Charge::new::<e>(2.0), &model);
    let annotated = spectrum[0].annotate(peptide, &fragments, &model, MassMode::Monoisotopic);
    let fdr = annotated.fdr(&fragments, &model);
    // This is the incorrect sequence for this spectrum so the FDR will indicate this
    dbg!(&fdr, fdr.sigma(), fdr.fdr(), fdr.score());
    assert!(fdr.sigma() < 2.0);
    Ok(())
}

fn main() -> Result<(), rustyms::error::CustomError> {
    // Check how this peptide compares to a similar peptide (using `align`)
    // (same sequence, repeated for easy reference)
    use rustyms::{*, align::*};
    let first_peptide = LinearPeptide::pro_forma("Q[Gln->pyro-Glu]VQEVS")?;
    let second_peptide = LinearPeptide::pro_forma("E[Glu->pyro-Glu]VQVES")?;
    let alignment = align::<4>(&first_peptide, &second_peptide,
                     matrix::BLOSUM62, Tolerance::new_ppm(10.0), AlignType::GLOBAL);
    dbg!(&alignment);
    let stats = alignment.stats();
    // assert_eq!(stats.identical, 3); // Only three positions are identical
    assert_eq!(stats.mass_similar, 6); // All positions are mass similar
    Ok(())
}
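
The alignment example reports all six positions as mass similar even though the two sequences differ. The sketch below shows, with the standard library only, why that is plausible: both pyro-Glu forms end up at the same mass, and the swapped E/V pair has the same summed mass in either order. The within_ppm helper and the exact way the relative difference is computed are assumptions made for illustration, not the crate's Tolerance implementation; the masses are standard monoisotopic values rounded to five decimals.

```rust
// Monoisotopic masses in Da, rounded to 5 decimals.
const WATER: f64 = 18.01056; // H2O, lost in Glu->pyro-Glu
const AMMONIA: f64 = 17.02655; // NH3, lost in Gln->pyro-Glu
const GLN: f64 = 128.05858; // Q
const GLU: f64 = 129.04259; // E
const VAL: f64 = 99.06841; // V
const LYS: f64 = 128.09496; // K

/// True if `a` and `b` differ by at most `ppm` parts per million,
/// measured relative to the larger of the two masses (an assumed convention).
fn within_ppm(a: f64, b: f64, ppm: f64) -> bool {
    (a - b).abs() / a.max(b) * 1e6 <= ppm
}

fn main() {
    // Q[Gln->pyro-Glu] and E[Glu->pyro-Glu] arrive at the same residue mass.
    let pyro_from_gln = GLN - AMMONIA;
    let pyro_from_glu = GLU - WATER;
    println!("pyro-Glu match: {}", within_ppm(pyro_from_gln, pyro_from_glu, 10.0));

    // "EV" and "VE" have the same summed mass, so the pair is mass similar.
    println!("EV vs VE match: {}", within_ppm(GLU + VAL, VAL + GLU, 10.0));

    // Q and K are nominally isobaric but differ by far more than 10 ppm.
    println!("Q vs K match: {}", within_ppm(GLN, LYS, 10.0));
}
```

The Q versus K case is included as a contrast: a 10 ppm tolerance is tight enough to separate residues that only share a nominal mass.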

Compilation features

Rustyms ties together multiple smaller modules into one cohesive structure. It has multiple features which allow you to slim it down if needed (all are enabled by default).

  • identification - gives access to methods for reading many different identified peptide formats.
  • align - gives access to mass-based alignment of peptides.
  • imgt - enables access to the IMGT database of antibody germline sequences, with annotations.
  • rayon - enables parallel iterators using rayon, mostly used for imgt but also for consecutive alignments.
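
As an illustration of slimming the crate down, a Cargo.toml entry along the following lines would keep only the alignment functionality. The chosen feature set is an assumption made for the example; the feature names and version number are the ones listed on this page, and default-features is the standard Cargo switch for turning off a crate's default feature set.

```toml
[dependencies]
# Assumed selection: mass-based alignment only, without identification, imgt, or rayon.
rustyms = { version = "0.8.3", default-features = false, features = ["align"] }
```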

Dependencies

~8.5MB
~167K SLoC