### 4 releases (2 breaking)

0.3.1 | Jul 12, 2024 |
---|---|

0.3.0 | Jul 12, 2024 |

0.2.0 | Jul 9, 2024 |

0.1.0 | Jul 8, 2024 |

#**56** in Biology

**94** downloads per month

**MIT**license

13KB

153 lines

# fasta-stats

Compute simple descriptive statistics on a FASTA file

## Usage

`Simple descriptive statistics on ``FASTA` `(`biological sequence`)` data
Usage`:` fasta`-`stats `[``OPTIONS``]` `[``FILE``]`
Arguments`:`
`[``FILE``]`
Options`:`
`-`m`,` `-``-`median
`-`d`,` `-``-`stddev
`-`s`,` `-``-`sample `<``SAMPLE``>`
`-``-`hint `<``SIZE_HINT``>`
`-`h`,` `-``-`help Print help
`-`V`,` `-``-`version Print version

By default, this uses a streaming approach to compute mean, min, max, and count. Minimal memory should be required.

If the

or `median`

flags are present, more memory will be required as streaming isn't possible. In order to minimize memory usage, the `stddev`

argument can be specified; it is interpreted as "1 in n", as in, if `sample`

is provided, then an expected 1 in 100 samples will be stored in a vector for purposes of these calculations. Larger values of `--sample 100`

`sample`

will result in lower memory usage but less-accurate computations.This simple program expects to read FASTA data either on STDIN or from a named file, and will output the total number of sequences, as well as the min, max, mean, and optionally median and standard deviation, of the sequence lengths. If you have a compressed FASTA file, you can pipe it through

or `zcat`

to decompress it on the fly.`gunzip`

#### Dependencies

~20MB

~355K SLoC