CRAST, Context RNA Alignment Search Tool

5 stable releases

Uses old Rust 2015

1.0.4 Jun 20, 2017
1.0.3 Jun 5, 2017
1.0.2 May 19, 2017
1.0.1 Apr 29, 2017
1.0.0 Apr 15, 2017

MIT license

This binary implements CRAST, a BLAST-like RNA alignment search algorithm. To list all available options, run it with the "-h" option.
The table below compares the performance of CRAST with that of other BLAST-like tools. The target sequences are all 18,185 house mouse ncRNAs; the query sequences are 34 human lncRNAs known to be homologs of house mouse ncRNAs.

Tool     TPs/FPs/TNs/FNs   F-measure   DB [s]     Align. [s]
CRAST    65/107/0/0        0.548       160.0[m]   148.0 (34.60)
LAST     63/365/0/0        0.256       7.246      0.195
BLASTN   63/623/20/0       0.168       1.646      1.007

The value in parentheses in CRAST's "Align." column is the alignment time excluding pre-processing. (The "bzip2" decompression is time-consuming.)
A "TP" is a mapping of a human query to one of its corresponding house mouse targets; an "FP" is a mapping of a human query to any other target.
As a negative dataset, we shuffled every query sequence with UShuffle while preserving di-nucleotide composition. A "TN" is a mapping of a shuffled query to a target other than its corresponding ones; an "FN" is a mapping of a shuffled query to one of its corresponding targets.
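
The F-measure column can be checked from the counts. The sketch below assumes the column is the standard F1 score, 2*TP / (2*TP + FP + FN); the README does not state the formula, and the function name `f_measure` is only illustrative. Under that assumption, the computed values agree with the table to within 0.001.

```rust
/// F-measure (assumed here to be the F1 score) from raw counts:
/// 2*TP / (2*TP + FP + FN). Returns 0.0 when all counts are zero.
fn f_measure(tp: u32, fp: u32, fn_: u32) -> f64 {
    let denom = 2 * tp + fp + fn_;
    if denom == 0 {
        return 0.0;
    }
    2.0 * f64::from(tp) / f64::from(denom)
}

fn main() {
    // (tool, TPs, FPs, FNs) copied from the comparison table.
    let rows = [("CRAST", 65, 107, 0), ("LAST", 63, 365, 0), ("BLASTN", 63, 623, 0)];
    for &(tool, tp, fp, fn_) in rows.iter() {
        println!("{}: {:.4}", tool, f_measure(tp, fp, fn_));
    }
}
```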
[Figure: ROC curves of CRAST, LAST and BLASTN]
The curves are derived from the above comparison, using each tool's alignment expectation values as the thresholds.
The values in parentheses are the areas under the curves; the larger the area, the better the prediction performance.
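
For readers unfamiliar with the area under a ROC curve, here is a minimal sketch that computes it with the trapezoidal rule. It is not the code used for the figure. The function name `roc_auc` and the sample points are hypothetical, and the README does not say how the reported areas were computed.

```rust
/// Area under a ROC curve by the trapezoidal rule.
/// `points` are (false positive rate, true positive rate) pairs,
/// sorted by increasing false positive rate.
fn roc_auc(points: &[(f64, f64)]) -> f64 {
    points
        .windows(2)
        .map(|w| (w[1].0 - w[0].0) * (w[0].1 + w[1].1) / 2.0)
        .sum()
}

fn main() {
    // Hypothetical points for illustration only; not taken from the thesis.
    let curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)];
    println!("AUC = {}", roc_auc(&curve));
}
```

A curve along the diagonal from (0, 0) to (1, 1) gives an area of 0.5, which corresponds to random guessing; a perfect classifier gives 1.0.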

Dependencies

This project depends on the "bzip2" program, which is used for database file compression. On Ubuntu, you can install it as follows:

$ sudo apt-get install bzip2

Installation

This project is written in Rust, a systems programming language, so you first need to install the Rust compiler (rustc), the Rust package manager (Cargo) and the Rust standard library. Please visit the Rust website for details. You can install all of them with one line as follows:

$ curl https://sh.rustup.rs -sSf | sh

The above installation uses Rustup, so you can easily switch between compiler versions. Now you can install CRAST as follows:

$ cargo install --git https://github.com/heartsh/crast

Check that it has been installed properly as follows:

$ crast-db && crast

If you're interested in how fast it is, run the benchmark as follows:

$ git clone https://github.com/heartsh/crast && cd crast
$ tar xvf asts.tar.bz2
$ cargo test --release -- --nocapture

Documents

The CRAST documentation in English.
The CRAST documentation in Japanese.
The CRAST thesis on bioRxiv.

Trace of Experiments on Docker

If you're interested in the experiments demonstrated in the thesis, you can reproduce them in a container created from the Docker image "heartsh/crast", which is available on Docker Hub. (You can think of Docker Hub as GitHub for Docker images.)
Beforehand, you need to enable GUI applications on Docker. The procedure differs among Mac, Windows and Linux, so we explain it only for Mac; it can also be done on the other platforms. First install and start XQuartz and Socat as follows:

$ brew install --cask xquartz # Older Homebrew versions used "brew cask install xquartz"
$ brew install socat
$ open -a XQuartz
$ socat TCP-LISTEN:6000,reuseaddr,fork UNIX-CLIENT:\"$DISPLAY\" &

Next, find your PC's IP address as follows:

$ ifconfig | grep inet # Lines starting with "inet" show the IP addresses

Now you're ready to use GUI applications on Docker.
After installing Docker, first run CRAST in the container as follows:

$ docker pull heartsh/crast # Pull image from Docker Hub
$ docker run -it -e DISPLAY=$IP_ADDRESS:0 heartsh/crast zsh # Start a container from the image and enter its shell; the variable "IP_ADDRESS" must hold your PC's IP address
$ git pull && tar xvf asts.tar.bz2 && cargo test --release -- --nocapture && cp ~/.cargo/bin/crast* /usr/local/bin # Run CRAST on the test set

Then run the other BLAST-like tools and Foldalign by following the appendices in the thesis. Note that you need to prepend "##maf\n" to the LAST outputs: LAST does not emit this file format identifier, and the subsequent processing fails without it. At this point, you're ready to obtain the statistics reported in the thesis. Try as follows:

$ ./gt_cmprsn_wth_blast_lk_tls.py
$ ./gt_cmprsn_wth_fldlgn.py
$ ./gt_crast_stts.py && okular asts/imgs/*.eps
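
The "##maf\n" step for LAST outputs, which must be done before running the scripts above, can be sketched as follows. This is a minimal illustration and not part of CRAST; the function name `with_maf_header` and the placeholder input are hypothetical. In practice, the text would be read from and written back to the LAST output file.

```rust
/// Returns `output` with the "##maf" header line prepended,
/// unless the text already starts with that identifier.
fn with_maf_header(output: &str) -> String {
    if output.starts_with("##maf") {
        output.to_string()
    } else {
        format!("##maf\n{}", output)
    }
}

fn main() {
    // Hypothetical placeholder standing in for the contents of a LAST output file.
    let last_output = "<LAST output>\n";
    print!("{}", with_maf_header(last_output));
}
```

Checking for an existing header makes the step safe to run more than once on the same file.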

Author

Heartsh

License

Copyright (c) 2016 Heartsh
Licensed under the MIT license.
