2 stable releases

| Version | Released |
|---|---|
| 1.0.1 | Apr 14, 2023 |
| 1.0.0 | Apr 12, 2023 |
crussmap: CrossMap in Rust
crussmap is a fast tool for converting genome coordinates between different reference assemblies.
Supported file formats: [BED,...].
This project reimplements CrossMap in Rust to improve speed and performance.
INSTALL
Install Rust and Cargo from: https://www.rust-lang.org/tools/install
$ cargo install crussmap
USAGE
View
View a chain file as block pairs, in TSV or CSV format:
## view chain file in tsv format
> crussmap view --input data/test.chain --output out_file
## view chain file in csv format
> crussmap view --input data/test.chain --output out_file --csv
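To illustrate what a block pair is, here is a minimal std-only sketch that expands the alignment data lines of one chain (`size dt dq`, with a final `size`-only line, as in the UCSC chain format) into target/query interval pairs. The names `BlockPair` and `block_pairs`, the sample data, and the printed column layout are assumptions for illustration, not crussmap's actual code or output format; strand handling is ignored.

```rust
/// One ungapped aligned block: a target interval paired with a query interval.
#[derive(Debug, PartialEq)]
struct BlockPair {
    t_start: u64,
    t_end: u64,
    q_start: u64,
    q_end: u64,
}

/// Expand the alignment data lines of one chain into block pairs.
/// `t_start` and `q_start` are the start offsets from the chain header.
/// Returns None if a line is empty or a field is not a non-negative integer.
fn block_pairs(t_start: u64, q_start: u64, data_lines: &[&str]) -> Option<Vec<BlockPair>> {
    let (mut t, mut q) = (t_start, q_start);
    let mut pairs = Vec::new();
    for line in data_lines {
        let nums: Vec<u64> = line
            .split_whitespace()
            .map(|f| f.parse::<u64>().ok())
            .collect::<Option<Vec<u64>>>()?;
        let size = *nums.first()?;
        pairs.push(BlockPair { t_start: t, t_end: t + size, q_start: q, q_end: q + size });
        // A three-field line also carries the gaps (dt, dq) to the next block.
        if let [_, dt, dq] = nums.as_slice() {
            t += size + *dt;
            q += size + *dq;
        }
    }
    Some(pairs)
}

fn main() {
    // Hypothetical data lines for a chain starting at target 0, query 100.
    let pairs = block_pairs(0, 100, &["10 5 0", "20 0 3", "7"]).expect("valid chain data");
    for p in &pairs {
        // Tab-separated output, one block pair per row.
        println!("{}\t{}\t{}\t{}", p.t_start, p.t_end, p.q_start, p.q_end);
    }
}
```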
BED
Convert a BED file from one assembly to another:
## convert and write to stdout
> crussmap bed --bed data/test.bed --input data/test.chain
## convert and write mapped and unmapped records to files
> crussmap bed --bed data/test.bed --input data/test.chain --output output_bed --unmap unmap_bed
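The core idea of the conversion can be sketched as follows: an interval that overlaps an aligned block is clipped to the block and shifted by the block's offset, and an interval with no overlap is reported as unmapped. This is a simplified sketch assuming a same-strand block; `Block` and `lift` are hypothetical names, and the real tool's handling of strands and of intervals spanning several blocks may differ.

```rust
/// A same-strand block: target [t_start, t_end) maps to the query starting at q_start.
struct Block {
    t_start: u64,
    t_end: u64,
    q_start: u64,
}

/// Map the half-open interval [start, end) through one block.
/// Returns None when the interval does not overlap the block (unmapped).
fn lift(block: &Block, start: u64, end: u64) -> Option<(u64, u64)> {
    // Clip the interval to the part covered by the block.
    let s = start.max(block.t_start);
    let e = end.min(block.t_end);
    if s >= e {
        return None;
    }
    // Shift target coordinates by the block's offset into the query.
    let offset = |pos: u64| block.q_start + (pos - block.t_start);
    Some((offset(s), offset(e)))
}

fn main() {
    let block = Block { t_start: 15, t_end: 35, q_start: 110 };
    for (start, end) in [(20, 30), (30, 40), (40, 50)] {
        match lift(&block, start, end) {
            Some((s, e)) => println!("{start}-{end} -> {s}-{e}"),
            None => println!("{start}-{end} -> unmapped"),
        }
    }
}
```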
TODO
Other popular bioinformatics formats should be supported, but I don't have enough time to implement them. If you are interested in this project, contributions are welcome :)
benchmark
environment: 1.4 GHz 4-core Intel Core i5; 16 GB 2133 MHz DDR3; macOS 13.2 (22D49)
## reasonable file sizes for .bed and .chain
> wc -l long.bed
10013 long.bed
> wc -l v2v3.chain
253064 v2v3.chain
> time release/crussmap bed -b long.bed -i v2v3.chain -o test.out -u test.unmap
________________________________________________________
Executed in  253.78 millis    fish           external
   usr time  197.93 millis    0.16 millis  197.77 millis
   sys time   51.45 millis    1.02 millis   50.43 millis
CORE IMPROVEMENT
chain file parser
Chain files are parsed with nom, a fast and easy-to-use parser combinator library for Rust.
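For readers unfamiliar with the chain format, the sketch below parses a chain header line into a typed struct. It deliberately uses only the standard library (`split_whitespace`) as a stand-in, not nom, so it is not how crussmap does it; `ChainHeader` and `parse_header` are hypothetical names, and the field order follows the UCSC chain format.

```rust
/// Fields of a chain header line:
/// chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
#[derive(Debug, PartialEq)]
struct ChainHeader {
    score: u64,
    t_name: String,
    t_size: u64,
    t_strand: char,
    t_start: u64,
    t_end: u64,
    q_name: String,
    q_size: u64,
    q_strand: char,
    q_start: u64,
    q_end: u64,
    id: String,
}

/// Parse one header line; returns None if the line is malformed.
fn parse_header(line: &str) -> Option<ChainHeader> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 13 || f[0] != "chain" {
        return None;
    }
    // Strand must be exactly "+" or "-".
    let strand = |s: &str| match s {
        "+" => Some('+'),
        "-" => Some('-'),
        _ => None,
    };
    Some(ChainHeader {
        score: f[1].parse().ok()?,
        t_name: f[2].to_string(),
        t_size: f[3].parse().ok()?,
        t_strand: strand(f[4])?,
        t_start: f[5].parse().ok()?,
        t_end: f[6].parse().ok()?,
        q_name: f[7].to_string(),
        q_size: f[8].parse().ok()?,
        q_strand: strand(f[9])?,
        q_start: f[10].parse().ok()?,
        q_end: f[11].parse().ok()?,
        id: f[12].to_string(),
    })
}

fn main() {
    let line = "chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1";
    println!("{:?}", parse_header(line));
}
```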
bed file serializer
The csv and serde crates are used to deserialize BED records.
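As a rough picture of what deserializing a BED record involves, here is a std-only sketch that reads the three required BED columns (chrom, start, end) from a tab-separated line and keeps any remaining columns as strings. It is a stand-in for the csv/serde approach, not crussmap's code; `BedRecord` and `parse_bed_line` are hypothetical names.

```rust
/// The three required BED columns plus any optional trailing columns.
#[derive(Debug, PartialEq)]
struct BedRecord {
    chrom: String,
    start: u64,
    end: u64,
    rest: Vec<String>,
}

/// Parse one tab-separated BED line; returns None if it is malformed.
fn parse_bed_line(line: &str) -> Option<BedRecord> {
    // Drop the trailing newline (and carriage return, if any).
    let line = line.trim_end_matches(|c| c == '\n' || c == '\r');
    let mut fields = line.split('\t');
    let chrom = fields.next()?.to_string();
    let start: u64 = fields.next()?.parse().ok()?;
    let end: u64 = fields.next()?.parse().ok()?;
    if chrom.is_empty() || start > end {
        return None;
    }
    Some(BedRecord {
        chrom,
        start,
        end,
        rest: fields.map(String::from).collect(),
    })
}

fn main() {
    // Hypothetical BED line with two optional columns (name and score).
    let rec = parse_bed_line("chr1\t100\t200\tfeature1\t0\n");
    println!("{:?}", rec);
}
```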
interval tree
rust-lapper, a fast interval tree library, is used to build and query the interval tree.
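The kind of query the interval tree answers is "which chain blocks overlap this BED interval?". The sketch below shows that query with a deliberately naive sorted vector from the standard library. It does not use rust-lapper's API or its algorithm, and `Interval` and `NaiveIndex` are hypothetical names chosen for illustration.

```rust
/// A half-open interval [start, stop) carrying an index into some block list.
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    start: u64,
    stop: u64,
    val: usize,
}

/// Intervals kept sorted by start coordinate.
struct NaiveIndex {
    intervals: Vec<Interval>,
}

impl NaiveIndex {
    fn new(mut intervals: Vec<Interval>) -> Self {
        intervals.sort_by_key(|iv| iv.start);
        Self { intervals }
    }

    /// Return every stored interval overlapping [start, stop).
    fn find(&self, start: u64, stop: u64) -> Vec<&Interval> {
        // Binary search for the first interval starting at or after `stop`;
        // nothing from that point on can overlap the query.
        let upper = self.intervals.partition_point(|iv| iv.start < stop);
        // Linear scan of the candidates: fine for a sketch, slow for large inputs.
        self.intervals[..upper]
            .iter()
            .filter(|iv| iv.stop > start)
            .collect()
    }
}

fn main() {
    let index = NaiveIndex::new(vec![
        Interval { start: 35, stop: 42, val: 2 },
        Interval { start: 0, stop: 10, val: 0 },
        Interval { start: 15, stop: 35, val: 1 },
    ]);
    for hit in index.find(8, 16) {
        println!("{}-{} (block {})", hit.start, hit.stop, hit.val);
    }
}
```

A real interval tree library avoids the linear scan over candidates, which matters when a chain file contains hundreds of thousands of blocks.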
ROADMAP
- support gz file input
- convert maf/paf/sam/delta to chain and crussmap
LICENSE
Licensed under the MIT license.