#file-format #genome #bed #convert-file #coordinates #reference #bioinformatics

bin+lib crussmap

crussmap is a fast tool for converting genome coordinates between different reference assemblies. Supported file formats: [BED,…]. This project reimplements CrossMap in Rust to improve speed and performance.

2 stable releases

1.0.1 Apr 14, 2023
1.0.0 Apr 12, 2023

#1267 in Parser implementations

24 downloads per month

MIT license

30KB
693 lines


crussmap: CrossMap in Rust

crussmap is a fast tool for converting genome coordinates between different reference assemblies.

Supported file formats: [BED,...].

This project reimplements CrossMap in Rust to improve speed and performance.

INSTALL

Install Rust and Cargo by following the instructions at https://www.rust-lang.org/tools/install, then run:

$ cargo install crussmap

USAGE

View

View a chain file as a list of block pairs in TSV or CSV format:

## view chain file in tsv format
> crussmap view --input data/test.chain --output out_file

## view chain file in csv format
> crussmap view --input data/test.chain --output out_file --csv
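
To illustrate what a block pair is, here is a minimal sketch (not crussmap's actual code) that expands the data lines of one chain into paired target/query intervals. It relies on the chain format's `size dt dq` data lines, assumes a positive-strand query, and the names `BlockPair`, `expand_blocks` and `to_row` as well as the column order are hypothetical.

```rust
/// One ungapped aligned block: a target interval paired with a query interval.
/// Coordinates are 0-based and half-open, as in the chain format.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockPair {
    pub t_start: u64,
    pub t_end: u64,
    pub q_start: u64,
    pub q_end: u64,
}

impl BlockPair {
    /// Render the pair as one delimited row ('\t' for TSV, ',' for CSV).
    /// The column order is an assumption made for this sketch.
    pub fn to_row(&self, delimiter: char) -> String {
        [self.t_start, self.t_end, self.q_start, self.q_end]
            .iter()
            .map(|v| v.to_string())
            .collect::<Vec<_>>()
            .join(&delimiter.to_string())
    }
}

/// Expand chain data lines (`size dt dq`, with a final line holding only `size`)
/// into block pairs, starting from the header's tStart and qStart.
/// Returns None if any line is malformed.
pub fn expand_blocks(t_start: u64, q_start: u64, data_lines: &[&str]) -> Option<Vec<BlockPair>> {
    let (mut t, mut q) = (t_start, q_start);
    let mut blocks = Vec::new();
    for line in data_lines {
        let fields: Vec<u64> = line
            .split_whitespace()
            .map(|f| f.parse().ok())
            .collect::<Option<Vec<u64>>>()?;
        let (size, dt, dq) = match fields.as_slice() {
            [size, dt, dq] => (*size, *dt, *dq),
            [size] => (*size, 0, 0), // last line of a chain: no trailing gap
            _ => return None,
        };
        blocks.push(BlockPair { t_start: t, t_end: t + size, q_start: q, q_end: q + size });
        // Skip the aligned block plus the gap that follows it on each side.
        t += size + dt;
        q += size + dq;
    }
    Some(blocks)
}
```

For example, `expand_blocks(100, 200, &["10 5 0", "20 0 3", "7"])` yields three block pairs, the first covering target 100..110 and query 200..210.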

BED

Convert BED file from one assembly to another:

## convert and write the result to stdout
> crussmap bed --bed data/test.bed --input data/test.chain

## convert and write mapped and unmapped records to files
> crussmap bed --bed data/test.bed --input data/test.chain --output output_bed --unmap unmap_bed

TODO

Several popular bioinformatics formats are not supported yet because I have not had enough time to implement them. If you are interested in this project, contributions are welcome :)

BENCHMARK

environment: 1.4 GHz 4-core Intel Core i5; 16 GB 2133 MHz DDR3; macOS 13.2 (22D49)

## reasonable file sizes for .bed and .chain
> wc -l long.bed
10013 long.bed
> wc -l v2v3.chain
253064 v2v3.chain
> time release/crussmap bed -b long.bed -i v2v3.chain -o test.out -u test.unmap

________________________________________________________
Executed in  253.78 millis    fish           external
   usr time  197.93 millis    0.16 millis  197.77 millis
   sys time   51.45 millis    1.02 millis   50.43 millis

CORE IMPROVEMENT

chain file parser

Chain files are parsed with nom, a fast and easy-to-use parser combinator library for Rust.
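
As a rough illustration of what the parser has to extract, the sketch below reads one chain header line using only the standard library. It deliberately does not use nom and is not crussmap's implementation; the `ChainHeader` struct and `parse_chain_header` function are hypothetical names. The field order follows the chain format: `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`.

```rust
/// Fields of a chain header line, in the order they appear in the file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChainHeader {
    pub score: i64,
    pub t_name: String,
    pub t_size: u64,
    pub t_strand: char,
    pub t_start: u64,
    pub t_end: u64,
    pub q_name: String,
    pub q_size: u64,
    pub q_strand: char,
    pub q_start: u64,
    pub q_end: u64,
    pub id: u64,
}

/// Accept only "+" or "-" as a strand.
fn parse_strand(s: &str) -> Option<char> {
    match s {
        "+" => Some('+'),
        "-" => Some('-'),
        _ => None,
    }
}

/// Parse a single header line; returns None if the line is not a
/// well-formed 13-field chain header.
pub fn parse_chain_header(line: &str) -> Option<ChainHeader> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() != 13 || f[0] != "chain" {
        return None;
    }
    Some(ChainHeader {
        score: f[1].parse().ok()?,
        t_name: f[2].to_string(),
        t_size: f[3].parse().ok()?,
        t_strand: parse_strand(f[4])?,
        t_start: f[5].parse().ok()?,
        t_end: f[6].parse().ok()?,
        q_name: f[7].to_string(),
        q_size: f[8].parse().ok()?,
        q_strand: parse_strand(f[9])?,
        q_start: f[10].parse().ok()?,
        q_end: f[11].parse().ok()?,
        id: f[12].parse().ok()?,
    })
}
```

A parser combinator library such as nom expresses the same grammar as small composable parsers instead of index-based field access, which tends to give better error locations and avoids the intermediate `Vec`.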

bed file serializer

BED records are deserialized with csv and serde.
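
The sketch below shows the shape of the data that deserialization produces, using only the standard library rather than csv and serde. It assumes tab-delimited input and treats only the first three BED columns (chrom, start, end) as required; `BedRecord` and `parse_bed_line` are hypothetical names, not crussmap's API.

```rust
/// A BED record: three required columns plus any optional columns,
/// which are kept verbatim so they can be written back unchanged.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BedRecord {
    pub chrom: String,
    pub start: u64, // 0-based, inclusive
    pub end: u64,   // exclusive
    pub rest: Vec<String>,
}

/// Parse one tab-delimited BED line.
/// Returns None for blank lines, comment/track/browser lines,
/// and lines whose required columns are missing or invalid.
pub fn parse_bed_line(line: &str) -> Option<BedRecord> {
    let line = line.trim_end();
    if line.is_empty()
        || line.starts_with('#')
        || line.starts_with("track")
        || line.starts_with("browser")
    {
        return None;
    }
    let mut fields = line.split('\t');
    let chrom = fields.next()?.to_string();
    let start: u64 = fields.next()?.parse().ok()?;
    let end: u64 = fields.next()?.parse().ok()?;
    if start > end {
        return None;
    }
    Some(BedRecord {
        chrom,
        start,
        end,
        rest: fields.map(String::from).collect(),
    })
}
```

With serde, the same struct would typically derive `Deserialize` and be read through a tab-delimited csv reader, which removes the manual field handling shown here.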

interval tree

rust-lapper, a fast interval tree library, is used to build and query the interval tree.
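
The lookup that the interval tree serves can be sketched with a sorted slice and a binary search. This is a simplified stand-in for rust-lapper, not crussmap's code: it assumes the blocks belong to one chromosome, are sorted by `t_start`, do not overlap, and map to the positive strand. `Block`, `lift_position` and `lift_interval` are hypothetical names.

```rust
/// An aligned block: target interval [t_start, t_end) maps to the query
/// interval of the same length that begins at q_start.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Block {
    pub t_start: u64,
    pub t_end: u64,
    pub q_start: u64,
}

/// Map a single 0-based position from target to query coordinates.
/// Returns None if the position falls in a gap or outside every block.
pub fn lift_position(blocks: &[Block], pos: u64) -> Option<u64> {
    // Index of the first block that starts after `pos`;
    // the candidate block is the one just before it.
    let idx = blocks.partition_point(|b| b.t_start <= pos);
    let block = blocks.get(idx.checked_sub(1)?)?;
    if pos < block.t_end {
        Some(block.q_start + (pos - block.t_start))
    } else {
        None
    }
}

/// Map a half-open interval by lifting its first and last bases.
/// Simplification: an interval that spans a gap is not split into pieces;
/// only its two endpoints are mapped.
pub fn lift_interval(blocks: &[Block], start: u64, end: u64) -> Option<(u64, u64)> {
    if start >= end {
        return None;
    }
    let new_start = lift_position(blocks, start)?;
    let new_end = lift_position(blocks, end - 1)? + 1;
    Some((new_start, new_end))
}
```

An interval tree generalizes this lookup to overlapping intervals and returns every block that intersects a query range, which is what is needed to split a BED interval across several blocks.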

ROADMAP

  • support gz file input
  • convert maf/paf/sam/delta to chain and crussmap

LICENSE

Licensed under the MIT license.

Dependencies

~5MB
~84K SLoC