#sorting #algorithm #external #external-sort

bin+lib ext-sort

rust external sort algorithm implementation

5 releases

0.1.4 Apr 16, 2023
0.1.3 Feb 25, 2023
0.1.2 Jan 11, 2022
0.1.1 Jan 9, 2022
0.1.0 Jan 8, 2022

#204 in Algorithms

Download history 254/week @ 2023-12-11 352/week @ 2023-12-18 254/week @ 2023-12-25 332/week @ 2024-01-01 237/week @ 2024-01-08 244/week @ 2024-01-15 259/week @ 2024-01-22 246/week @ 2024-01-29 297/week @ 2024-02-05 179/week @ 2024-02-12 290/week @ 2024-02-19 359/week @ 2024-02-26 327/week @ 2024-03-04 348/week @ 2024-03-11 527/week @ 2024-03-18 873/week @ 2024-03-25

2,101 downloads per month
Used in 5 crates (4 directly)

Unlicense

47KB
993 lines

Crates.io License Test Status Documentation

Rust external sort

ext-sort is a rust external sort algorithm implementation.

External sorting is a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory (RAM) of a computer and instead must be resided in slower external memory, usually a hard disk drive. Sorting is achieved in two passes. During the first pass it sorts chunks of data that each fit in RAM, during the second pass it merges the sorted chunks together. For more information see External Sorting.

Overview

ext-sort supports the following features:

  • Data agnostic: it supports all data types that implement serde serialization/deserialization by default, otherwise you can implement your own serialization/deserialization mechanism.
  • Serialization format agnostic: the library uses MessagePack serialization format by default, but it can be easily substituted by your custom one if MessagePack serialization/deserialization performance is not sufficient for your task.
  • Multithreading support: multi-threaded sorting is supported, which means data is sorted in multiple threads utilizing maximum CPU resources and reducing sorting time.
  • Memory limit support: memory limited sorting is supported. It allows you to limit sorting memory consumption (memory-limit feature required).

Basic example

Activate memory-limit feature of the ext-sort crate on Cargo.toml:

[dependencies]
ext-sort = { version = "^0.1.4", features = ["memory-limit"] }
use std::fs;
use std::io::{self, prelude::*};
use std::path;

use bytesize::MB;
use env_logger;
use log;

use ext_sort::{buffer::mem::MemoryLimitedBufferBuilder, ExternalSorter, ExternalSorterBuilder};

fn main() {
    env_logger::Builder::new().filter_level(log::LevelFilter::Debug).init();

    let input_reader = io::BufReader::new(fs::File::open("input.txt").unwrap());
    let mut output_writer = io::BufWriter::new(fs::File::create("output.txt").unwrap());

    let sorter: ExternalSorter<String, io::Error, MemoryLimitedBufferBuilder> = ExternalSorterBuilder::new()
        .with_tmp_dir(path::Path::new("./"))
        .with_buffer(MemoryLimitedBufferBuilder::new(50 * MB))
        .build()
        .unwrap();

    let sorted = sorter.sort(input_reader.lines()).unwrap();

    for item in sorted.map(Result::unwrap) {
        output_writer.write_all(format!("{}\n", item).as_bytes()).unwrap();
    }
    output_writer.flush().unwrap();
}

Dependencies

~4–16MB
~190K SLoC