
4 releases

0.1.3 Mar 3, 2024
0.1.2 Mar 3, 2024
0.1.1 Mar 3, 2024
0.1.0 Mar 3, 2024

Custom license


RRRS: Rust(ic) Rapid Random Sampler

Welcome to RRRS, a rapid, hyper-optimized CSV random sampling tool designed with performance and efficiency at its core. Crafted meticulously in Rust, RRRS offers an unparalleled solution for extracting random data samples from CSV files swiftly and effortlessly.

🤨 Why RRRS

Born out of the frustrating, repetitive process of sampling from unwieldy or enormous CSV files during my time at Washington University in St. Louis, RRRS (Rust(ic) Rapid Random Sampler) is more than just a tool; it's a perhaps slightly redundant but fun mission to over-optimize the all-too-familiar chore of data sampling. As a student in data-heavy courses, I was constantly bogged down by the existing workflow: importing massive datasets into spreadsheet software, waiting for them to load, and then struggling with plugins or scripts to extract the samples I needed. It was clear there had to be a better way. So, instead of doing my homework, I worked on this.

Enter RRRS. Developed with the speed and efficiency of Rust, RRRS is my answer to those frustrating hours. It's designed to make random sampling from large CSV files not just faster, but a seamless part of your workflow. This tool is for anyone who's ever felt this nuisance, turning what was once a bottleneck into a smooth, efficient process. With RRRS, I'm excited to share a solution that helped me and is now here to support data enthusiasts and professionals alike in their analytical endeavors.

🚀 Features

  • Rapid Random Sampling: Quickly extract random samples from large CSV files.
  • Hyper-Optimized Performance: Leveraging Rust's powerful system-level capabilities for maximum speed.
  • User-Friendly: Simple command-line interface to easily specify input and output.
  • Flexibility: Choose how many rows to sample each time you run the tool.
  • Cross-Platform Compatibility: Runs on macOS and Linux; Windows is supported through the Windows Subsystem for Linux.
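
To make the "rapid random sampling" idea concrete, here is a minimal sketch of reservoir sampling (Algorithm R), one common way to pick a fixed number of rows in a single pass without loading the whole file. It is an illustration only, not necessarily how RRRS's own sampling logic works. The names `SplitMix64` and `sample_rows` are hypothetical, the small generator is there only so the sketch needs no external crates, and treating the first line as a header is an assumption.

```rust
use std::io::{self, BufRead};

/// Tiny SplitMix64 generator, used only to keep this sketch dependency-free.
/// It is a stand-in, not the generator RRRS uses.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns a value in 0..bound (bound must be non-zero).
    /// The slight modulo bias is acceptable for an illustration.
    pub fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Reservoir sampling (Algorithm R): sets the first line aside as the header,
/// then keeps up to `k` data rows chosen uniformly at random in one pass.
pub fn sample_rows<R: BufRead>(
    reader: R,
    k: usize,
    rng: &mut SplitMix64,
) -> io::Result<(Option<String>, Vec<String>)> {
    let mut lines = reader.lines();
    // Assumption of this sketch: the first line is the CSV header.
    let header = lines.next().transpose()?;
    let mut reservoir: Vec<String> = Vec::with_capacity(k);

    for (i, line) in lines.enumerate() {
        let line = line?;
        if i < k {
            // Fill the reservoir with the first k rows.
            reservoir.push(line);
        } else {
            // Row i replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = line;
            }
        }
    }
    Ok((header, reservoir))
}
```

Memory use stays proportional to the sample size rather than the file size. Note that splitting on lines is a simplification: quoted CSV fields that contain newlines would need a real CSV parser.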

🛠 Usage

To get started with RRRS, run:

rrrs -i <input_file_path> -o <output_file_path>

Upon execution, RRRS will prompt you to enter the number of rows to sample at random from your CSV file. The output is a new CSV file named after the original file, with a suffix indicating the number of sampled rows (e.g., slogan_data-100). This file is saved in the execution path or in the output directory you specify.
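
The naming rule described above can be sketched in a few lines of Rust. This is an assumption-laden illustration rather than RRRS's actual code: the function name `sampled_output_path` is hypothetical, and keeping the input file's extension on the output is a guess, since the example above shows only the stem.

```rust
use std::path::{Path, PathBuf};

/// Builds `<out_dir>/<input stem>-<rows>[.<ext>]`, e.g. `slogan_data-100.csv`.
/// Preserving the original extension is an assumption of this sketch.
pub fn sampled_output_path(input: &Path, out_dir: &Path, rows: usize) -> PathBuf {
    // Fall back to a generic stem if the input path has no file name.
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| String::from("sample"));
    let mut name = format!("{}-{}", stem, rows);
    if let Some(ext) = input.extension() {
        name.push('.');
        name.push_str(&ext.to_string_lossy());
    }
    out_dir.join(name)
}
```

For example, calling it with `data/slogan_data.csv`, an output directory of `out`, and 100 rows yields a path ending in `slogan_data-100.csv` inside `out`.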

📂 Directory Structure

Understand the organization of RRRS with the following directory structure:

rrrs/
├── Cargo.toml              # Project manifest
├── src/                    # Source files
│   ├── main.rs             # Entry point
│   ├── library.rs          # Library code
│   ├── args.rs             # Argument parsing
│   └── library/            # Library code
│       ├── sampler_ops/        # Sampling operations
│       │   └── sampler_ops.rs      # Sampling logic
│       └── csv_ops/            # CSV operations
│           ├── csv_loader.rs   # CSV loading functionality
│           └── csv_writer.rs   # CSV writing functionality
└── tests/                  # Automated tests
    ├── args_tests.rs       # Tests for argument parsing
    ├── csv_loader_tests.rs # Tests for CSV loading
    ├── sampler_tests.rs    # Tests for sampling logic
    └── csv_writer_tests.rs # Tests for CSV writing

📚 Getting Started

macOS and Linux

To use RRRS, you need Rust installed on your machine. If you don't have it yet, install it with the following command:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

For more information, refer to the official Rust installation guide.

Once Rust is installed, install RRRS with the following command:

cargo install rrrs

Windows

Note: RRRS is not yet supported on Windows. However, you can still use it by installing the Windows Subsystem for Linux.

Building from Source

To build RRRS from source, clone the repository and build it with the following commands (note that this is primarily for development purposes):

# Cloning over SSH requires an SSH key registered with GitHub
git clone git@github.com:ethan-wickstrom/rrrs.git
cd rrrs
cargo build --release
# Copying into /usr/local/bin may require sudo on some systems
cp target/release/rrrs /usr/local/bin

🤝 Contributing

Contributions to RRRS are warmly welcomed. Feel free to open an issue or submit a pull request, whether it's bug reports, feature requests, or code contributions. Please refer to the contributing guidelines for more details.

📝 License

RRRS is open-sourced under the Apache-2.0 license. See the LICENSE file for more details.
