#bioinformatics-sequence #random #sequence #nucleotide #bioinformatics

bin+lib nucgen

A simple tool and library for generating random nucleotide sequences

3 unstable releases

0.2.0 Apr 23, 2025
0.1.2 Jan 29, 2025
0.1.1 Dec 30, 2024

#388 in Biology


388 downloads per month
Used in 3 crates

MIT license

16KB
301 lines

nucgen

A fast, simple, and configurable nucleotide generator for testing bioinformatics tools with random FASTA and FASTQ files.

All nucleotides {A,C,T,G} are generated randomly with equal probability.
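As an illustration of what "equal probability" means in practice, here is a minimal, std-only sketch of uniform base sampling. It is not nucgen's implementation: the `SplitMix64` generator and the `fill_uniform` function are hypothetical names introduced for this example, and the real crate may draw its randomness quite differently. The idea is that two random bits select one of four bases, so each base is chosen with probability 1/4.

```rust
/// Minimal SplitMix64 generator (illustrative only; an assumption,
/// not the generator nucgen uses).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64 pseudo-random bits.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

const NUCLEOTIDES: [u8; 4] = *b"ACTG";

/// Hypothetical helper: replace the contents of `buf` with `len` bases,
/// each drawn uniformly from {A, C, T, G}.
pub fn fill_uniform(rng: &mut SplitMix64, buf: &mut Vec<u8>, len: usize) {
    buf.clear();
    buf.reserve(len);
    let mut bits = 0u64;
    let mut remaining: u32 = 0; // 2-bit chunks still unused in `bits`
    for _ in 0..len {
        if remaining == 0 {
            // One 64-bit draw supplies 32 bases.
            bits = rng.next_u64();
            remaining = 32;
        }
        // The low two bits index one of the four bases.
        buf.push(NUCLEOTIDES[(bits & 0b11) as usize]);
        bits >>= 2;
        remaining -= 1;
    }
}
```

Because the sketch is deterministic for a given seed, calling `fill_uniform` twice with generators created from the same seed yields identical sequences, which is the property that makes seeded runs reproducible.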

Installation

cargo install nucgen

Usage (CLI)

All options are configurable from the command line.

You can see the available options by running:

nucgen --help

Examples

Generate 10,000 reads of length 100bp in FASTQ format and write them to stdout:

nucgen -n 10000 -l 100 -fq

Generate a paired-end dataset of 100 reads with R1 length 30 and R2 length 50, written as gzip-compressed FASTA files:

nucgen -n 100 -l 30 -L 50 -fa reads_R1.fasta.gz reads_R2.fasta.gz

Seed the random number generator with a specific value:

nucgen -n 100 -l 100 -fq -S 42

Usage (Library)

Add nucgen as a dependency in your Cargo.toml:

cargo add nucgen

You can use the Sequence struct to generate random nucleotide sequences:

use std::io::Cursor;

use nucgen::{Sequence, write_fasta};

// Create an in-memory cursor to write the output to
let mut out = Cursor::new(Vec::new());

// Initialize the random number generator
let mut rng = rand::thread_rng();

// Initialize the sequence struct
let mut seq = Sequence::new();

// Generate 100 random nucleotides into the sequence
seq.fill_buffer(&mut rng, 100);

// Write the sequence to the output cursor
write_fasta(&mut out, 0, seq.bytes())?;

// Generate another 100 random nucleotides
seq.fill_buffer(&mut rng, 100);

// Write the second sequence to the output cursor
write_fasta(&mut out, 1, seq.bytes())?;
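For readers unfamiliar with the record layouts being written, the following std-only sketch shows roughly what writing a FASTA or FASTQ record to any `Write` target involves. The helpers `write_fasta_record` and `write_fastq_record`, the `seq.{idx}` header naming, and the constant quality character are all assumptions made for illustration; they are not nucgen's API and its actual output may differ.

```rust
use std::io::{self, Cursor, Write};

/// Hypothetical helper: write one FASTA record (header line, then sequence).
pub fn write_fasta_record<W: Write>(out: &mut W, idx: usize, seq: &[u8]) -> io::Result<()> {
    writeln!(out, ">seq.{idx}")?;
    out.write_all(seq)?;
    out.write_all(b"\n")
}

/// Hypothetical helper: write one FASTQ record (header, sequence, `+`
/// separator, quality line). Every base gets the same placeholder quality.
pub fn write_fastq_record<W: Write>(
    out: &mut W,
    idx: usize,
    seq: &[u8],
    qual: u8,
) -> io::Result<()> {
    writeln!(out, "@seq.{idx}")?;
    out.write_all(seq)?;
    out.write_all(b"\n+\n")?;
    // The quality line must be exactly as long as the sequence.
    out.write_all(&vec![qual; seq.len()])?;
    out.write_all(b"\n")
}

/// Write one record of each kind into an in-memory buffer and return the bytes.
pub fn demo() -> io::Result<Vec<u8>> {
    let mut out = Cursor::new(Vec::new());
    write_fasta_record(&mut out, 0, b"ACGT")?;
    write_fastq_record(&mut out, 1, b"GGCA", b'I')?;
    Ok(out.into_inner())
}
```

Taking a generic `W: Write` means the same helpers work with an in-memory `Cursor`, a file, or stdout, which mirrors how the library example above writes into a cursor.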

Dependencies

~7MB
~84K SLoC