0.1.0 Nov 28, 2021

MIT license

samplr

samplr is a CLI tool for randomly sampling data: it generates a fixed-size sample of input lines in which every line has the same probability of being selected.

Installation

Source

Requires Rust to be installed.

git clone https://github.com/SteadBytes/sample.git
cd sample
cargo install --path .

Examples

Sample 15 lines from a file:

sample -n 15 things.txt

Sample 15 lines from standard input:

sample -n 15 < things.txt

Sample 15 lines from multiple files:

sample -n 15 things.txt other_things.txt

Sampling Algorithm

samplr uses a reservoir sampling algorithm to generate fixed-size samples from an input stream of unknown length, so the input never has to be held in memory in full. For more details, see the implementation and the linked blog article.
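To illustrate the idea, here is a minimal sketch of one common reservoir sampling variant (often called Algorithm R). It is not taken from samplr's source, and samplr may use a different variant. The names `reservoir_sample` and `XorShift64` are hypothetical, and the small hand-rolled generator only keeps the example dependency-free; a real tool would more likely use a crate such as `rand`.

```rust
/// Tiny xorshift64 generator, included only so the sketch needs no crates.
/// Assumption for illustration: not what samplr uses, not for serious use.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // Xorshift state must never be zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..bound`. The modulo adds a tiny bias, acceptable here.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Algorithm R: keep the first `n` items, then let the item at 0-based
/// index `i` replace a random reservoir slot with probability n / (i + 1).
fn reservoir_sample<T, I>(items: I, n: usize, rng: &mut XorShift64) -> Vec<T>
where
    I: IntoIterator<Item = T>,
{
    let mut reservoir: Vec<T> = Vec::with_capacity(n);
    if n == 0 {
        return reservoir;
    }
    for (i, item) in items.into_iter().enumerate() {
        if i < n {
            // Fill phase: the reservoir is not full yet.
            reservoir.push(item);
        } else {
            // Pick j uniformly from 0..=i; the item survives only if j < n.
            let j = rng.below(i as u64 + 1) as usize;
            if j < n {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}

fn main() {
    let mut rng = XorShift64::new(42);
    // Stand-in for input lines; the stream length is never needed up front.
    let lines = (1..=100).map(|i| format!("line {}", i));
    for line in reservoir_sample(lines, 15, &mut rng) {
        println!("{}", line);
    }
}
```

Only `n` items are stored at any time, which is why this family of algorithms suits streams of unknown length. If the input has fewer than `n` items, the sketch simply returns all of them.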

Development

Tests

Run unit tests:

cargo test

Run all tests (including potentially CPU intensive statistical tests):

cargo test --all-features --release
