2 releases
0.1.1 | Jun 9, 2024 |
---|---|
0.1.0 | Jun 8, 2024 |
tuni
The goal of tuni is to unify transcripts across different samples.
Overview
Transcript assembly tools can generate arbitrary transcript IDs, so the same transcript may be labelled with a different ID in each sample.
For example, given two samples, sample_1.gtf and sample_2.gtf:
sample_1.gtf
chr1 test transcript 1 100 . + . transcript_id "A";
chr1 test exon 1 40 . + . transcript_id "A";
chr1 test exon 50 100 . + . transcript_id "A";
--snip--
sample_2.gtf
chr1 test transcript 1 100 . + . transcript_id "B";
chr1 test exon 1 40 . + . transcript_id "B";
chr1 test exon 50 100 . + . transcript_id "B";
--snip--
The transcript displayed above is identical between the two samples, but the provided transcript_id differs: "A" in sample_1.gtf vs "B" in sample_2.gtf.
tuni generates a .tuni.gtf/.tuni.gff for each input .gtf/.gff. These output files contain an additional attribute field, tuni_id, which holds a unified ID that is the same for identical transcripts across different samples.
sample_1.tuni.gtf
chr1 test transcript 1 100 . + . transcript_id "A"; tuni_id "tuni_0";
chr1 test exon 1 40 . + . transcript_id "A"; tuni_id "tuni_0";
chr1 test exon 50 100 . + . transcript_id "A"; tuni_id "tuni_0";
--snip--
sample_2.tuni.gtf
chr1 test transcript 1 100 . + . transcript_id "B"; tuni_id "tuni_0";
chr1 test exon 1 40 . + . transcript_id "B"; tuni_id "tuni_0";
chr1 test exon 50 100 . + . transcript_id "B"; tuni_id "tuni_0";
--snip--
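The idea behind the unified ID can be sketched in a few lines of Rust. This is a minimal illustration, not tuni's actual implementation: it assumes two transcripts are "identical" when they share chromosome, strand, and exon coordinates, and it assumes tab-delimited input. The names TranscriptKey, Unifier, transcript_keys, and sample are hypothetical helpers introduced only for this example.

```rust
use std::collections::{BTreeMap, HashMap};

/// Assumed notion of identity: chromosome, strand, and sorted exon intervals.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TranscriptKey {
    chrom: String,
    strand: String,
    exons: Vec<(u64, u64)>,
}

/// Naively extract the value of `transcript_id "..."` from the attribute column.
fn transcript_id(attributes: &str) -> Option<&str> {
    let rest = attributes.split("transcript_id \"").nth(1)?;
    rest.split('"').next()
}

/// Collect the exon records of each transcript into one key per transcript_id.
fn transcript_keys(gtf: &str) -> BTreeMap<String, TranscriptKey> {
    let mut keys: BTreeMap<String, TranscriptKey> = BTreeMap::new();
    for line in gtf.lines() {
        let cols: Vec<&str> = line.splitn(9, '\t').collect();
        // Only exon records contribute to the structure of a transcript.
        if cols.len() < 9 || cols[2] != "exon" {
            continue;
        }
        if let (Some(id), Ok(start), Ok(end)) = (
            transcript_id(cols[8]),
            cols[3].parse::<u64>(),
            cols[4].parse::<u64>(),
        ) {
            let key = keys.entry(id.to_string()).or_insert_with(|| TranscriptKey {
                chrom: cols[0].to_string(),
                strand: cols[6].to_string(),
                exons: Vec::new(),
            });
            key.exons.push((start, end));
        }
    }
    for key in keys.values_mut() {
        key.exons.sort();
    }
    keys
}

/// Hands out one ID per distinct key, shared across all samples.
#[derive(Default)]
struct Unifier {
    ids: HashMap<TranscriptKey, String>,
}

impl Unifier {
    fn unified_id(&mut self, key: &TranscriptKey) -> String {
        let next = self.ids.len();
        self.ids
            .entry(key.clone())
            .or_insert_with(|| format!("tuni_{}", next))
            .clone()
    }
}

/// Build the three-record example shown above for a given transcript_id.
fn sample(id: &str) -> String {
    let attr = format!("transcript_id \"{}\";", id);
    [("transcript", 1, 100), ("exon", 1, 40), ("exon", 50, 100)]
        .iter()
        .map(|(feature, start, end)| {
            format!("chr1\ttest\t{}\t{}\t{}\t.\t+\t.\t{}", feature, start, end, attr)
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let mut unifier = Unifier::default();
    for (name, gtf) in vec![("sample_1", sample("A")), ("sample_2", sample("B"))] {
        for (id, key) in transcript_keys(&gtf) {
            println!("{}: transcript_id {} -> {}", name, id, unifier.unified_id(&key));
        }
    }
}
```

In this sketch both "A" and "B" map to tuni_0 because their exon structures match; a transcript with any differing exon boundary would receive a new ID.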
Installation
Binary
Download the latest binary for Linux or macOS (ARM) from releases.
Cargo
Install Rust then run:
cargo install tuni
Usage
Usage: tuni [OPTIONS] --gtf-gff-path <*.txt> --output-dir </output/dir/>
Options:
-g, --gtf-gff-path <*.txt> A text file containing GTF/GFF paths
-o, --output-dir </output/dir/> Directory where outputted GTF/GFFs will be stored
-v, --verbose Print log messages
-h, --help Print help
-V, --version Print version
Note: currently, only version 2 .gff files are accepted by tuni.
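As a rough illustration of the usage above, the following Rust sketch prepares the text file of input paths and builds the corresponding tuni command. It uses only the --gtf-gff-path and --output-dir flags listed in the help text. The one-path-per-line layout of the text file is an assumption, and write_path_list and tuni_command are hypothetical helper names.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Write the input list, assuming one GTF/GFF path per line.
fn write_path_list(list_path: &Path, inputs: &[&str]) -> io::Result<()> {
    fs::write(list_path, inputs.join("\n") + "\n")
}

/// Build the tuni invocation from the flags shown in the usage text.
fn tuni_command(list_path: &Path, output_dir: &Path) -> Command {
    let mut cmd = Command::new("tuni");
    cmd.arg("--gtf-gff-path")
        .arg(list_path)
        .arg("--output-dir")
        .arg(output_dir);
    cmd
}

fn main() -> io::Result<()> {
    let list = std::env::temp_dir().join("tuni_inputs.txt");
    write_path_list(&list, &["sample_1.gtf", "sample_2.gtf"])?;

    let cmd = tuni_command(&list, Path::new("tuni_out"));
    // Print the command instead of running it; calling `.status()` on a
    // mutable command would execute it, which requires tuni on PATH
    // and input files that actually exist.
    println!("{:?}", cmd);
    Ok(())
}
```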