8 releases

new 0.3.5 May 22, 2023
0.3.3 May 17, 2023
0.2.4 Feb 23, 2023

#71 in Biology

Download history 19/week @ 2023-02-16 47/week @ 2023-02-23 3/week @ 2023-03-23 3/week @ 2023-04-06 1/week @ 2023-04-13 3/week @ 2023-04-20 11/week @ 2023-04-27 49/week @ 2023-05-04 57/week @ 2023-05-11 64/week @ 2023-05-18

181 downloads per month

MIT license

210KB
6K SLoC

BlobTk (v0.2.4)

About

BlobTk contains a set of core functions used by BlobToolKit tools. Implemented in Rust, these functions are intended to be accessible from the command line, as python modules and will include web assembly code for use in javascript.

Installing

Command line tool

The command line tool is available as a linux/macos binary from the latest release page.

linux:

curl -Ls "https://github.com/blobtoolkit/core/releases/download/0.2.4/blobtk-linux" > blobtk &&
chmod 755 blobtk

macos:

curl -Ls "https://github.com/blobtoolkit/core/releases/download/0.2.4/blobtk-macos" > blobtk &&
chmod 755 blobtk

Python module

The python module can be installed using pip:

pip install blobtk

Usage

Command line tool

blobtk depth

blobtk depth --help
Calculate sequencing coverage depth

Usage: blobtk depth [OPTIONS]

Options:
  -i, --list <TXT>           Path to input file containing a list of sequence IDs
  -b, --bam <BAM>            Path to BAM file
  -c, --cram <CRAM>          Path to CRAM file
  -a, --fasta <FASTA>        Path to assembly FASTA input file (required for CRAM)
  -s, --bin-size <BIN_SIZE>  Bin size for coverage calculations (use 0 for full contig length) [default: 1000]
  -O, --bed <BED>            Output bed file name
  -h, --help                 Print help information
blobtk depth -b test/test.bam -O test/test.bed

blobtk depth -b test/test.bam -s 1000 -O test/test.1000.bed

blobtk filter

./blobtk filter --help
Filter files based on list of sequence names

Usage: blobtk filter [OPTIONS] <--bam <BAM>|--cram <CRAM>>

Options:
  -i, --list <TXT>       Path to input file containing a list of sequence IDs
  -b, --bam <BAM>        Path to BAM file
  -c, --cram <CRAM>      Path to CRAM file
  -a, --fasta <FASTA>    Path to assembly FASTA input file (required for CRAM)
  -f, --fastq <FASTQ>    Path to FASTQ file to filter (forward or single reads)
  -r, --fastq2 <FASTQ>   Path to paired FASTQ file to filter (reverse reads)
  -S, --suffix <SUFFIX>  Suffix to use for output filtered files [default: filtered]
  -A, --fasta-out        Flag to output a filtered FASTA file
  -F, --fastq-out        Flag to output filtered FASTQ files
  -O, --read-list <TXT>  Path to output list of read IDs
  -h, --help             Print help information
blobtk filter -i test/test.list -b test/test.bam -f test/reads_1.fq.gz -r test/reads_2.fq.gz -F

Python module

depth

from blobtk import depth

# generate bed file of coverage depths
depth.bam_to_bed(bam="test/test.bam", bed="test/pytest.bed")
depth.bam_to_bed(bam="test/test.bam", bin_size=1000, bed="test/pytest.1000.bed")

binned_covs = depth.bam_to_depth(bam="test/test.bam")
for cov in binned_covs:
    print({cov.seq_name: cov.bins[0]})


filter

from blobtk import filter

# filter fastq files based on a list of sequence names
read_count = filter.fastx(list_file="test/test.list", bam="test/test.bam", fastq1="test/reads_1.fq.gz", fastq2="test/reads_2.fq.gz", fastq_out=True)

print(read_count)

lib.rs:

blobtk is a set of core command line utilities and python bindings for processing common file formats used by BlobToolKit.

Dependencies

~24–56MB
~1M SLoC