2 releases
0.1.6 | Oct 13, 2023 |
---|---|
0.1.5 | Jul 21, 2023 |
#8 in #gtf
60KB
294 lines
gtfjson
A simple CLI utility to convert a GTF file to NDJSON for fast parsing and perform other functionalities on those jsons.
Summary
The GTF file format is fantastic when working with bedtools
since it is essentially
a modified version of the BED
file format.
However, if you're interested in the annotations column, it can be a massive headache to parse - especially if you're operating on the full genome.
I wrote this tool to convert the GTF file format into streamable newline-delim JSON.
This makes it convenient to load with polars
in python incredibly fast and skip
all the annotation parsing.
Installation
You can install this with the rust package manager cargo
:
cargo install gtfjson
Usage
The executable of this tool is gj
.
Convert
To convert GTF file formats to NDJSON we can use the convert
subcommand
# classic i/o
gj convert -i <input.gtf> -o output.json
# write to stdout
gj convert -i <input.gtf>
Partition
We can also use gj
to partition a gtf-json in different ways.
It takes a variable in the attributes and creates a new file for each category of that record and populates those files with the records matching that category.
For example - we can write the GTF of every gene to a separate file:
# Partition on gene_name
gj partition -i <input.ndjson> -o partitions/ -v gene_name
# Partition of gene_id
gj partition -i <input.ndjson> -o partitions/ -v gene_id
# Partition of transcript_biotype
gj partition -i <input.ndjson> -o partitions/ -v transcript_biotype
Dependencies
~4–15MB
~130K SLoC