#vcf #batch #batching #distributed-computing #command-line-tool #variant-call-file

bin+lib vcf_batcher

Command-line tool to cut VCF (variant call files) into smaller batches, intended to be used for multiprocessing or distributed computing

3 unstable releases

0.2.1 May 15, 2023
0.2.0 May 9, 2023
0.1.0 May 4, 2023

#10 in #batching

MIT license

365KB
277 lines

📑️ VCF Batcher


This is a Rust crate that cuts VCF (Variant Call Format) files into smaller batches, intended for multiprocessing or distributed computing.

🧰️ Installation

Depending on your goals, you can use this tool as a CLI or as a library in 🦀️ Rust or 🐍️ Python.

Installing the CLI

To install the program as a CLI, you need to have cargo installed; see the official instructions to install cargo.

Once you have it, run the following command in your terminal to install the VCF batcher:

cargo install vcf_batcher

Installing the Rust crate

To use the tool as a Rust crate, add it to the dependencies in your Cargo.toml or run:

cargo add vcf_batcher

You can find the crate documentation on docs.rs.

Installing the Python bindings

We provide Python bindings for the VCF batcher, which can be installed via pip:

pip install vcf-batcher

🪄️ Usage

CLI

After installation, the CLI is available as the vcf_batcher_cli command. It takes the input VCF file and the output directory as arguments:

vcf_batcher_cli path/to/your_file.vcf path/to/output/directory

By default, this creates batches of 25'000 samples each. To override the default, pass a custom batch size with --batch-size or -b:

vcf_batcher_cli -b 1000 path/to/your_file.vcf path/to/output/directory
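When choosing a batch size, it can help to estimate how many batch files a run will produce. The following is a minimal sketch of that arithmetic, not part of the tool itself. It assumes that every batch is filled up to the batch size and that the final batch holds the remainder; the README does not state this explicitly. The function name `expected_batch_count` is hypothetical.

```python
def expected_batch_count(total_items: int, batch_size: int = 25_000) -> int:
    """Estimate the number of batches for a given item count.

    Assumption (not confirmed by the README): batches are filled to
    `batch_size`, and the last batch holds whatever remains.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Ceiling division using integers only, to avoid float rounding.
    return (total_items + batch_size - 1) // batch_size


# With the default batch size, 100,001 items would need one extra batch
# compared to exactly 100,000 items.
print(expected_batch_count(100_000))
print(expected_batch_count(100_001))
print(expected_batch_count(2_500, batch_size=1_000))
```

Under these assumptions, a smaller batch size gives more, smaller units of work, which may suit a cluster with many workers.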

Library

After installing either the Rust crate or the Python module, you can call the provided function directly.

🦀️ Rust

pub fn extract_variants_to_batches(
    file_path: &str,
    batch_size: usize,
    output_path: &Path,
    compression_level: Option<Compression>
)

🐍️ Python

vcf_batcher.py_extract_variants_to_batches(
    input_file,
    batches_folder,
    batch_size,
)
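As a usage illustration, the Python call above could be wrapped as follows. This is a sketch under stated assumptions rather than documented behavior: the wrapper name `batch_vcf` is hypothetical, the arguments are passed positionally in the order shown above, paths are converted to strings, and the output folder is created beforehand in case the library does not create it. The names and format of the generated batch files are not specified in this README, so the wrapper simply lists every file it finds in the output folder.

```python
from pathlib import Path


def batch_vcf(input_file, batches_folder, batch_size=25_000):
    """Split a VCF file into batches and return the files that were written.

    Hypothetical convenience wrapper, not part of vcf_batcher.
    """
    # Imported lazily so this module can be loaded without the bindings
    # installed (`pip install vcf-batcher`).
    import vcf_batcher

    out_dir = Path(batches_folder)
    # Assumption: creating the folder up front is harmless; the README does
    # not say whether the library creates it.
    out_dir.mkdir(parents=True, exist_ok=True)

    # Positional arguments, in the order given in the README:
    # input file, batches folder, batch size.
    vcf_batcher.py_extract_variants_to_batches(
        str(input_file),
        str(out_dir),
        batch_size,
    )

    # Assumption: every regular file in the folder is a batch.
    return sorted(p for p in out_dir.iterdir() if p.is_file())
```

The returned paths can then be handed to whatever does the parallel work, for example `concurrent.futures.ProcessPoolExecutor.map` with a worker function of your own, or a job scheduler on a cluster.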

License

The software is licensed under the MIT License.

Dependencies

~8–16MB
~207K SLoC