#bam #bfx

bio-io

My utilities for reading and writing bioinformatics file formats

5 unstable releases

0.3.2 Oct 9, 2023
0.3.1 Oct 8, 2023
0.3.0 Aug 13, 2023
0.2.0 Jul 29, 2023
0.1.1 Jul 21, 2023

Used in fibertools-rs

MIT license




fibertools-rs

fibertools-rs is a CLI tool for creating and interacting with Fiber-seq bam files.

Install

fibertools-rs is available through Bioconda and can be installed with the following command:

mamba install -c conda-forge -c bioconda fibertools-rs

However, the bioconda version currently does not support GPU acceleration. If you would like to use GPU acceleration, you will need to install using the directions in the INSTALL.md file.

Usage

ft --help

Help page for fibertools

Subcommands for fibertools-rs

ft predict-m6a

Predict m6A positions using HiFi kinetics data and encode the results in the MM and ML bam tags. Help page for predict-m6a.
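The MM and ML tags are defined in the SAM optional-fields (SAMtags) specification: MM lists, per base and modification code, how many occurrences of that base to skip before each modified one, and ML gives one 0-255 probability per call in the same order. As a rough illustration of how such calls can be read back, here is a minimal sketch that decodes them into read positions. It assumes the read's SEQ is stored in its original sequencing orientation (e.g. an unaligned or forward-strand read) and that each MM group carries a single modification code; `ModCall` and `decode_mm_ml` are hypothetical names, not part of fibertools-rs.

```rust
/// One modification call decoded from the MM/ML tags (hypothetical type).
#[derive(Debug, PartialEq)]
struct ModCall {
    pos: usize,   // 0-based index into the read sequence
    code: String, // modification code, e.g. "a" for 6mA
    prob: u8,     // raw ML value, 0-255
}

/// Decode MM/ML for a read whose SEQ is in its original orientation.
/// Assumes one modification code per MM group (e.g. "A+a" or "T-a").
fn decode_mm_ml(seq: &[u8], mm: &str, ml: &[u8]) -> Result<Vec<ModCall>, String> {
    let mut calls = Vec::new();
    let mut ml_iter = ml.iter();
    // Groups are ';'-terminated; skip the empty string after the final ';'.
    for group in mm.split(';').filter(|g| !g.is_empty()) {
        let mut fields = group.split(',');
        let header = fields.next().ok_or("empty MM group")?;
        let mut chars = header.chars();
        let base = chars.next().ok_or("missing base")?.to_ascii_uppercase() as u8;
        let _strand = chars.next().ok_or("missing strand")?; // '+' or '-'
        // The rest is the modification code, optionally followed by '.' or '?'.
        let code: String = chars.filter(|c| *c != '.' && *c != '?').collect();
        // Every read position holding the fundamental base ('N' matches any base).
        let candidates: Vec<usize> = seq
            .iter()
            .enumerate()
            .filter(|(_, b)| base == b'N' || b.to_ascii_uppercase() == base)
            .map(|(i, _)| i)
            .collect();
        // Each skip count is relative to the occurrence after the previous call.
        let mut next = 0usize;
        for skip in fields {
            let skip: usize = skip.trim().parse().map_err(|e| format!("bad skip: {e}"))?;
            let idx = next + skip;
            let pos = *candidates.get(idx).ok_or("skip runs past end of read")?;
            let prob = *ml_iter.next().ok_or("ML shorter than MM")?;
            calls.push(ModCall { pos, code: code.clone(), prob });
            next = idx + 1;
        }
    }
    Ok(calls)
}

fn main() {
    // The A's in "AGAAC" are at indices 0, 2 and 3.
    // Skip 0 A's -> index 0; then skip 1 more A -> index 3.
    let calls = decode_mm_ml(b"AGAAC", "A+a,0,1;", &[200, 30]).expect("valid tags");
    for c in &calls {
        println!("{}\t{}\t{}", c.pos, c.code, c.prob);
    }
}
```

In practice a BAM library would supply SEQ and the tag values; the sketch only shows the skip-count arithmetic.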

ft add-nucleosomes

Add nucleosome calls to a bam file that already contains m6A predictions. Note that this step also runs in the background during predict-m6a, so there is no need to run it separately unless you want to try new parameters for nucleosome calling. Help page for add-nucleosomes.
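Nucleosome-bound DNA is protected from the m6A methyltransferase, so nucleosomes show up as long stretches of a fiber without m6A calls. The sketch below illustrates only that core idea and is not the algorithm fibertools-rs uses: it reports every gap between consecutive m6A positions that is at least `min_len` bp long. The function name and the threshold passed in the example are hypothetical.

```rust
/// Simplified illustration (not fibertools-rs's method): report each m6A-free
/// stretch between consecutive calls of at least `min_len` bp as a candidate
/// nucleosome, returned as (start, length) in read coordinates.
/// Gaps at the read ends are ignored because their true extent is unknown.
fn m6a_free_regions(m6a: &[usize], min_len: usize) -> Vec<(usize, usize)> {
    debug_assert!(m6a.windows(2).all(|w| w[0] < w[1]), "positions must be sorted");
    let mut out = Vec::new();
    for w in m6a.windows(2) {
        let start = w[0] + 1; // first unmethylated base after a call
        let end = w[1]; // exclusive: the next methylated base
        if end > start && end - start >= min_len {
            out.push((start, end - start));
        }
    }
    out
}

fn main() {
    // Hypothetical m6A positions on one fiber and a hypothetical 80 bp cutoff.
    let calls = [10, 20, 120, 130, 300];
    for (start, len) in m6a_free_regions(&calls, 80) {
        println!("candidate nucleosome at {start}, {len} bp");
    }
}
```

A real caller also has to deal with missed m6A calls inside linker DNA and with adjacent nucleosomes merging, which is why the cutoffs are exposed as parameters.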

ft extract

Extracts Fiber-seq data from a bam file into plain text. Help page for extract.
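To illustrate what "plain text" means for per-read positional data, here is a small sketch that flattens one read's calls into a tab-separated line, with each list of positions joined by commas and "." standing in for an empty list. The `FiberRecord` struct and its column layout are hypothetical and do not reproduce the actual output columns of `ft extract`; see its help page for those.

```rust
/// Hypothetical per-read record; the real `ft extract` columns differ.
struct FiberRecord<'a> {
    read_name: &'a str,
    m6a: &'a [usize],         // m6A positions on the read
    nuc_starts: &'a [usize],  // nucleosome start positions
    nuc_lengths: &'a [usize], // nucleosome lengths (bp)
}

/// Join positions with commas, or "." when the list is empty.
fn join_or_dot(xs: &[usize]) -> String {
    if xs.is_empty() {
        return ".".to_string();
    }
    xs.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
}

/// Format one record as a single tab-separated line.
fn to_tsv_line(r: &FiberRecord<'_>) -> String {
    format!(
        "{}\t{}\t{}\t{}",
        r.read_name,
        join_or_dot(r.m6a),
        join_or_dot(r.nuc_starts),
        join_or_dot(r.nuc_lengths)
    )
}

fn main() {
    let r = FiberRecord { read_name: "read1", m6a: &[5, 9], nuc_starts: &[20], nuc_lengths: &[147] };
    println!("{}", to_tsv_line(&r));
}
```

One line per read keeps the output easy to stream and to load into a data frame, at the cost of having to split the comma-separated fields afterwards.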

ft center

Center Fiber-seq reads (bam) around reference position(s). Help page for center.
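Centering means re-expressing reference coordinates as offsets from a chosen position, so that reads from different loci can be stacked and compared. The minimal sketch below assumes one common convention: offsets are oriented by the strand of the feature at the center, so positive values are always downstream on that strand, and only offsets within a hypothetical `window` are kept. `Strand`, `centered_offset` and `center_positions` are illustrative names, not the fibertools-rs API.

```rust
/// Strand of the feature being centered on (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Strand {
    Forward,
    Reverse,
}

/// Offset of reference position `pos` from `center`; positive means
/// downstream of the center on the given strand.
fn centered_offset(pos: i64, center: i64, strand: Strand) -> i64 {
    match strand {
        Strand::Forward => pos - center,
        Strand::Reverse => center - pos,
    }
}

/// Convert reference positions to centered offsets, keep those within
/// `window` bp of the center, and return them in ascending order.
fn center_positions(ref_positions: &[i64], center: i64, strand: Strand, window: i64) -> Vec<i64> {
    let mut out: Vec<i64> = ref_positions
        .iter()
        .map(|&p| centered_offset(p, center, strand))
        .filter(|o| o.abs() <= window)
        .collect();
    out.sort_unstable(); // reverse-strand offsets come out in descending order
    out
}

fn main() {
    let m6a_ref_positions = [900, 1000, 1100, 1500];
    println!("{:?}", center_positions(&m6a_ref_positions, 1000, Strand::Forward, 200));
    println!("{:?}", center_positions(&m6a_ref_positions, 1000, Strand::Reverse, 200));
}
```

Flipping the sign for reverse-strand features is what lets, for example, upstream and downstream of a promoter line up across genes on both strands.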

Python API (pyft)

The Python API is still in development and is not yet stable; you can follow its current progress in the py-ft folder. More information is available on Read the Docs.

Cite

Jha, A., Bohaczuk, S. C., Mao, Y., Ranchalis, J., Mallory, B. J., Min, A. T., Hamm, M. O., Swanson, E., Finkbeiner, C., Li, T., Whittington, D., Stergachis, A. B., & Vollger, M. R. (2023). Fibertools: fast and accurate DNA-m6A calling using single-molecule long-read sequencing. bioRxiv. https://doi.org/10.1101/2023.04.20.537673

Read the fibertools library docs

The docs for the latest release are at https://docs.rs/fibertools-rs/latest/fibertools_rs/. Alternatively, download the source and run:

cargo doc --open

and the docs will open in your browser.

TODO items

  • Use new iterator for ft extract and group writes to try and improve the speed
  • Long-format extract command
  • Improve progress bar for predict-m6a.
    • Get size of bam, say how far we are through the bam in terms of MB/GB?
  • Add a python API (see py-ft for progress)
    • extract api
    • center api
    • improve docs
    • add default data viz
    • add conversion to pandas data frame or maybe anndata
  • GPU support
    • see if I can simplify or statically link PyTorch to get it onto bioconda
    • Detect GPU memory to set batch size dynamically.
  • Add unaligned, secondary, and supplementary reads to the test bam.
  • Add option result to bamlift
  • improve speed of liftover closest in bamlift. It takes about 50% of the time.
  • Add more test cases, learn about test modules in folders
  • Set filters for ML depending on the model used

Contributing

If you would like to contribute to fibertools-rs, please see the CONTRIBUTING.md file for more information.
