3 unstable releases

Uses new Rust 2024

new 0.2.0	Apr 26, 2025
0.1.4	Mar 29, 2025
0.1.3	Feb 16, 2025

MIT/Apache

62KB
778 lines

Crib

Crib reads multiple bigWig files concurrently, whether local or remote (via HTTP or S3).

Genomic track files come in many varieties. One of the most common binary file formats for quantitative measurements across the genome is bigWig. Crib makes it possible to access a specific genomic region of multiple bigWig files concurrently and in bulk.

CLI

crib view reads a single genomic region from multiple files and writes the results to stdout in bedGraph format.

crib view <chrom>:<start>-<end> <files>

Examples

crib view 3:40000-60000 s3://bucket/path/file.bw
crib view 13:4000-245000 local/path/*.bw second.bw s3://bucket/path/another.bw
crib view X:4000-245000 s3://bucket/path/ > output.bg
crib view 5:16000-18000 https://example.org/bw/file.bw > output.bg
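The region argument in the examples above follows the <chrom>:<start>-<end> pattern. As a rough illustration of how such a string could be split and validated, here is a minimal sketch. The Region struct and parse_region function are hypothetical names chosen for this example and are not taken from crib's source; the check that start is smaller than end is likewise an assumption.

```rust
/// A parsed genomic region such as `3:40000-60000`.
/// Hypothetical type for illustration; not crib's actual API.
#[derive(Debug, PartialEq)]
struct Region {
    chrom: String,
    start: u32,
    end: u32,
}

/// Parse a `<chrom>:<start>-<end>` string into a `Region`.
fn parse_region(s: &str) -> Result<Region, String> {
    // Split on the last ':' so contig names that contain ':' still work.
    let (chrom, range) = s
        .rsplit_once(':')
        .ok_or_else(|| format!("missing ':' in region '{s}'"))?;
    let (start_str, end_str) = range
        .split_once('-')
        .ok_or_else(|| format!("missing '-' in range '{range}'"))?;

    let start: u32 = start_str
        .parse()
        .map_err(|e| format!("bad start '{start_str}': {e}"))?;
    let end: u32 = end_str
        .parse()
        .map_err(|e| format!("bad end '{end_str}': {e}"))?;

    if chrom.is_empty() {
        return Err(format!("empty chromosome name in region '{s}'"));
    }
    // Assumption: an empty or reversed range is treated as an error.
    if start >= end {
        return Err(format!("start {start} must be smaller than end {end}"));
    }

    Ok(Region { chrom: chrom.to_string(), start, end })
}

fn main() {
    match parse_region("3:40000-60000") {
        Ok(region) => println!("{region:?}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```

Returning a Result rather than panicking lets a command-line front end report a malformed region before any file is opened.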

Library

The associated library's API is not stable and is likely to change substantially.

Limitations

1. Currently, the library used for accessing bigWig files (bigtools) breaks up large queries into smaller queries (blocks) of a size suitable for caching (10 KB). It then issues those queries using attohttpc, which is synchronous. This means that:

Async cooperative reading of multiple files is not possible with bigtools. Therefore, crib spawns an OS thread for every file accessed concurrently. This shouldn't be a problem on a conventional computer, but it could reduce throughput in resource-constrained environments.
Querying large genomic regions results in a large number of GET requests, e.g. to your S3 instance. This might have cost or throttling implications.

2. Querying is limited to a genomic range from a single chromosome or contig, e.g. 3:15000-50000.
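The thread-per-file approach described in limitation 1 can be sketched with the standard library alone. This is only an illustration of the pattern, not crib's implementation: read_region_blocking is a hypothetical stand-in for a synchronous bigWig read and returns placeholder data, and the Intervals alias and read_all function are names invented for this example.

```rust
use std::thread;

/// (start, end, value) triples for one file. Hypothetical type for this sketch.
type Intervals = Vec<(u32, u32, f32)>;

/// Hypothetical stand-in for a blocking bigWig read; not part of crib or bigtools.
fn read_region_blocking(path: &str, chrom: &str, start: u32, end: u32) -> Result<Intervals, String> {
    // A real reader would open `path` and query `chrom:start-end` here.
    let _ = chrom;
    // Placeholder value so the sketch runs without any files on disk.
    Ok(vec![(start, end, path.len() as f32)])
}

/// Read the same region from every file, using one OS thread per file.
fn read_all(paths: &[&str], chrom: &str, start: u32, end: u32) -> Vec<Result<Intervals, String>> {
    thread::scope(|scope| {
        // Spawn all threads first; each one blocks on its own I/O.
        let handles: Vec<_> = paths
            .iter()
            .map(|&path| scope.spawn(move || read_region_blocking(path, chrom, start, end)))
            .collect();

        // Join in input order so the results line up with `paths`.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("reader thread panicked".to_string()))
            })
            .collect()
    })
}

fn main() {
    let paths = ["a.bw", "second.bw"];
    for (path, result) in paths.iter().zip(read_all(&paths, "3", 40_000, 60_000)) {
        match result {
            Ok(intervals) => println!("{path}: {} interval(s)", intervals.len()),
            Err(err) => eprintln!("{path}: {err}"),
        }
    }
}
```

Because every file gets its own thread, the thread count grows linearly with the number of files, which is the resource cost noted above.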

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
