5 releases (3 breaking)
Version | Released |
---|---|
0.85.0 | Oct 18, 2023 |
0.76.0 | Feb 23, 2023 |
0.74.2 | Jan 24, 2023 |
0.74.1 | Jan 23, 2023 |
0.70.0 | Oct 25, 2022 |
Nushell bio
A bioinformatics plugin for nushell. This plugin parses most common bioinformatics formats into structured data so you can use them with nushell more effectively.
Quick setup
Go and get nushell, it's great. Then come back! I'm assuming you have the Rust toolchain installed.
# clone this repo
git clone https://github.com/Euphrasiologist/nu_plugin_bio
# change into the repo directory
cd nu_plugin_bio
# build
# it's quite a long compile time...
cargo build --release
# register the plugin (run this from within nushell)
# we are already inside the repo directory, so the path is relative to it
register ./target/release/nu_plugin_bio
# see the file formats currently supported below
# now you can just use open, and the file extension will be auto-detected.
# there are some test files in the tests/ dir.
open ./tests/test.fasta
| get id
# if you want to add flags you have to explicitly use from <x>
# e.g. if you want descriptions in fasta files to be parsed.
open --raw ./tests/test.fasta
| from fasta -d
| first
The backend is a wrapper around noodles, an excellent, all-Rust bioinformatics I/O library.
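To give a feel for what "structured data" means here, the following is a minimal, standard-library-only sketch of turning FASTA text into records. It is not the plugin's implementation (the plugin delegates parsing to noodles); the `FastaRecord` struct, the `parse_fasta` function, and the `description` and `sequence` field names are hypothetical, and the `with_description` switch only loosely mirrors the `-d` flag shown above.

```rust
/// One FASTA record, roughly the shape of a row in a parsed table.
/// Field names are illustrative assumptions, not the plugin's column names
/// (only `id` appears in the README).
#[derive(Debug, PartialEq)]
pub struct FastaRecord {
    pub id: String,
    pub description: Option<String>,
    pub sequence: String,
}

/// Parse FASTA text into records.
/// When `with_description` is false, anything after the first whitespace
/// on a header line is dropped.
pub fn parse_fasta(text: &str, with_description: bool) -> Vec<FastaRecord> {
    let mut records: Vec<FastaRecord> = Vec::new();
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            // Header line: the id runs up to the first whitespace,
            // the remainder (if any) is the free-text description.
            let mut parts = header.splitn(2, char::is_whitespace);
            let id = parts.next().unwrap_or("").to_string();
            let description = parts
                .next()
                .map(str::trim)
                .filter(|d| with_description && !d.is_empty())
                .map(String::from);
            records.push(FastaRecord {
                id,
                description,
                sequence: String::new(),
            });
        } else if let Some(current) = records.last_mut() {
            // Sequence lines may be wrapped; append them to the current record.
            current.sequence.push_str(line.trim());
        }
    }
    records
}
```

Calling `parse_fasta(text, false)` keeps only ids and sequences, while `parse_fasta(text, true)` also fills in `description` where a header has one. A real parser would additionally validate the input and report errors, which this sketch skips.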
Aims
Aim to support the following:
- BAM 1.6
- BCF 2.2
- bcf.gz
- VCF 4.3
- vcf.gz
- BED (BED3 only right now)
- CRAM 3.0
- FASTA
- fa.gz
- FASTQ
- fq.gz
- GFF3
- GTF 2.2
- SAM 1.6
- GFA 1.0
- gfa.gz
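Since `open` picks a parser from the file extension, the list above can be read as a lookup table. Here is a rough sketch of that idea in Rust. The `Format` enum, the `detect_format` function, and the exact extension strings are assumptions made for illustration; in practice nushell itself maps an extension to the matching `from <x>` command, and the plugin may recognise a different set of spellings.

```rust
use std::path::Path;

/// Formats from the list above. The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Bam,
    Bcf,
    Vcf,
    Bed,
    Cram,
    Fasta,
    Fastq,
    Gff,
    Gtf,
    Sam,
    Gfa,
}

/// Guess the format from a file name.
/// Returns the format and whether the file looks gzip-compressed.
pub fn detect_format(path: &str) -> Option<(Format, bool)> {
    let name = Path::new(path).file_name()?.to_str()?.to_ascii_lowercase();
    // Peel off a trailing ".gz" first, then inspect the remaining extension.
    let (stem, gzipped) = match name.strip_suffix(".gz") {
        Some(stem) => (stem, true),
        None => (name.as_str(), false),
    };
    let ext = stem.rsplit_once('.')?.1;
    // Assumed extension spellings; adjust to whatever the plugin registers.
    let format = match ext {
        "bam" => Format::Bam,
        "bcf" => Format::Bcf,
        "vcf" => Format::Vcf,
        "bed" => Format::Bed,
        "cram" => Format::Cram,
        "fasta" | "fa" => Format::Fasta,
        "fastq" | "fq" => Format::Fastq,
        "gff" => Format::Gff,
        "gtf" => Format::Gtf,
        "sam" => Format::Sam,
        "gfa" => Format::Gfa,
        _ => return None,
    };
    Some((format, gzipped))
}
```

For simplicity the sketch accepts a `.gz` suffix on every format, whereas the list above only names gzipped variants for BCF, VCF, FASTA, FASTQ and GFA.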
Note that performance will not be optimal with the current state of nu_plugin, as we cannot access the engine state of nushell and therefore need to load entire data structures into memory. Testing still needs to be done on large files.
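To make the memory concern concrete, here is a small standard-library sketch contrasting the two approaches on a toy task, counting FASTA records. Both function names are hypothetical and neither is taken from the plugin; the eager version only stands in for "load everything, then process".

```rust
use std::io::{self, BufRead};

/// Eager: collect every line before doing any work.
/// Memory use grows with the size of the input.
pub fn count_records_eager<R: BufRead>(reader: R) -> io::Result<usize> {
    let lines: Vec<String> = reader.lines().collect::<io::Result<_>>()?;
    Ok(lines.iter().filter(|l| l.starts_with('>')).count())
}

/// Streaming: look at one line at a time and discard it afterwards.
/// Memory use stays roughly constant regardless of input size.
pub fn count_records_streaming<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        if line?.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}
```

Both functions return the same count; the difference only shows up in peak memory on large inputs, which is the case that still needs testing.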
More?
If there's a bioinformatics format you want added, let me know, or open a PR.