extsort

External sorting (i.e. on-disk sorting) capability on arbitrarily sized iterators

9 unstable releases

0.5.0 Mar 25, 2024
0.4.2 Feb 5, 2021
0.4.0 Dec 23, 2020
0.3.0 Feb 8, 2020
0.1.3 Dec 8, 2018

7,070 downloads per month
Used in 7 crates (4 directly)

Apache-2.0

31KB
544 lines

Exposes external sorting (i.e. on-disk sorting) capability on arbitrarily sized iterators, even if the content the iterator generates doesn't fit in memory. Once sorting completes, it returns a new iterator over the sorted items.
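External sorting is classically done in two phases: fill a bounded in-memory buffer, sort it, and spill it to disk as a sorted "run"; then merge all runs with a k-way merge that keeps only one item per run in memory. The sketch below illustrates that general technique with the standard library only. It is not extsort's implementation: the functions `external_sort`, `write_run` and `read_next`, the fixed `u32` element type, the callback-based output and the temp-file layout are all assumptions made for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::{Path, PathBuf};

/// Phase 1 helper (hypothetical): sort one in-memory chunk and spill it to disk
/// as a run of little-endian u32 values. The chunk is cleared for reuse.
fn write_run(chunk: &mut Vec<u32>, dir: &Path, index: usize) -> io::Result<PathBuf> {
    chunk.sort_unstable();
    let path = dir.join(format!("run-{index}.bin"));
    let mut writer = BufWriter::new(File::create(&path)?);
    for value in chunk.iter() {
        writer.write_all(&value.to_le_bytes())?;
    }
    writer.flush()?;
    chunk.clear();
    Ok(path)
}

/// Read the next value of a run, or `None` once the run is exhausted.
fn read_next(reader: &mut BufReader<File>) -> io::Result<Option<u32>> {
    let mut buf = [0u8; 4];
    match reader.read_exact(&mut buf) {
        Ok(()) => Ok(Some(u32::from_le_bytes(buf))),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}

/// Sort `input` while buffering at most `max_in_memory` values at a time,
/// handing the sorted values to `sink` one by one.
fn external_sort<I, F>(input: I, max_in_memory: usize, mut sink: F) -> io::Result<()>
where
    I: IntoIterator<Item = u32>,
    F: FnMut(u32),
{
    assert!(max_in_memory > 0, "max_in_memory must be positive");
    let dir = std::env::temp_dir().join(format!("extsort-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;

    // Phase 1: fill a bounded buffer, sort it, spill it as a run; repeat.
    let mut runs = Vec::new();
    let mut chunk = Vec::with_capacity(max_in_memory);
    for value in input {
        chunk.push(value);
        if chunk.len() == max_in_memory {
            runs.push(write_run(&mut chunk, &dir, runs.len())?);
        }
    }
    if !chunk.is_empty() {
        runs.push(write_run(&mut chunk, &dir, runs.len())?);
    }

    // Phase 2: k-way merge. The min-heap holds one (value, run index) pair per
    // run, so it grows with the number of runs, not with the input size.
    let mut readers = Vec::with_capacity(runs.len());
    let mut heap = BinaryHeap::new();
    for (i, path) in runs.iter().enumerate() {
        let mut reader = BufReader::new(File::open(path)?);
        if let Some(v) = read_next(&mut reader)? {
            heap.push(Reverse((v, i)));
        }
        readers.push(reader);
    }
    while let Some(Reverse((value, i))) = heap.pop() {
        sink(value);
        // Refill the heap from the run that just supplied the smallest value.
        if let Some(next) = read_next(&mut readers[i])? {
            heap.push(Reverse((next, i)));
        }
    }

    drop(readers); // close the run files before deleting them
    fs::remove_dir_all(&dir)
}
```

With 1,000 reversed values and a 64-value buffer, this produces 16 runs and merges them back in order; the only values held at once are one chunk during phase 1 and one head value per run during phase 2.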

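To stay efficient for every item type, the crate doesn't handle serialization itself but leaves it to the user: each type to be sorted implements the Sortable trait's encode and decode methods, as in the example below.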
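Because serialization is left to the user, types with variable-length fields need an encoding that `decode` can read back unambiguously. A common approach is a length prefix, sketched below. To keep the sketch compilable without the crate, it defines a local trait `Encodable` that only mirrors the method shapes of `Sortable` in the example; `Encodable`, the `Record` type and the byte layout (little-endian u64 key, u32 length, then UTF-8 bytes) are hypothetical and not something extsort prescribes.

```rust
use std::io::{self, Read, Write};

/// Hypothetical local stand-in with the same method shapes as the `Sortable`
/// trait used in the example; defined here only so the sketch compiles alone.
trait Encodable: Sized {
    fn encode<W: Write>(&self, writer: &mut W) -> io::Result<()>;
    fn decode<R: Read>(reader: &mut R) -> io::Result<Self>;
}

/// Hypothetical record with a variable-length field; derived Ord sorts by key, then name.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Record {
    key: u64,
    name: String,
}

impl Encodable for Record {
    fn encode<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        writer.write_all(&self.key.to_le_bytes())?;
        let bytes = self.name.as_bytes();
        // Length prefix so `decode` knows how many bytes belong to `name`
        // (assumes names shorter than 4 GiB).
        writer.write_all(&(bytes.len() as u32).to_le_bytes())?;
        writer.write_all(bytes)
    }

    fn decode<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut key = [0u8; 8];
        reader.read_exact(&mut key)?;
        let mut len = [0u8; 4];
        reader.read_exact(&mut len)?;
        let mut name = vec![0u8; u32::from_le_bytes(len) as usize];
        reader.read_exact(&mut name)?;
        // Invalid UTF-8 is reported as an I/O error so it flows through `?`.
        let name = String::from_utf8(name)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        Ok(Record { key: u64::from_le_bytes(key), name })
    }
}
```

Records written back to back can then be read in sequence; once the input is exhausted, `decode` returns an `UnexpectedEof` error from `read_exact`.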

The sorter can optionally use rayon to sort the in-memory buffer in parallel. This is generally faster only when the buffer is large enough for the gains from parallelism to outweigh the overhead it adds.
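rayon provides parallel slice sorts such as `par_sort_unstable` on a work-stealing thread pool. To show the size trade-off without external crates, the sketch below is a std-only approximation using `std::thread::scope` (Rust 1.63+): small buffers are sorted sequentially, large ones are split into two halves sorted on two threads and then merged. The function `sort_buffer` and the `PARALLEL_THRESHOLD` value are arbitrary assumptions, not figures or code from extsort.

```rust
use std::thread;

/// Hypothetical cutoff below which spawning a thread is unlikely to pay off.
const PARALLEL_THRESHOLD: usize = 16_384;

/// Sort `buffer`, using a second thread only when the buffer is large enough.
fn sort_buffer(buffer: &mut Vec<u32>) {
    if buffer.len() < PARALLEL_THRESHOLD {
        buffer.sort_unstable(); // small buffer: thread overhead would dominate
        return;
    }
    let mid = buffer.len() / 2;
    {
        let (left, right) = buffer.split_at_mut(mid);
        // The scope joins the spawned thread before returning.
        thread::scope(|s| {
            s.spawn(move || left.sort_unstable());
            right.sort_unstable(); // sort the other half on the current thread
        });
    }
    // Merge the two sorted halves into a new vector.
    let (left, right) = buffer.split_at(mid);
    let mut merged = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            merged.push(left[i]);
            i += 1;
        } else {
            merged.push(right[j]);
            j += 1;
        }
    }
    merged.extend_from_slice(&left[i..]);
    merged.extend_from_slice(&right[j..]);
    *buffer = merged;
}
```

The sequential merge at the end is one reason a naive split like this only helps on large buffers; rayon's sorts avoid much of that cost, but the same rule of thumb about buffer size applies.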

Example

use byteorder::{LittleEndian, ReadBytesExt, WriteBytesExt};
use extsort::*;
use std::io::{Read, Write};

// Items are compared through Ord while sorting.
#[derive(Debug, Eq, PartialEq, Ord, PartialOrd)]
struct MyStruct(u32);

// Serialization is left to the user: encode and decode define how an item
// is written out and read back when it is spilled to disk.
impl Sortable for MyStruct {
    fn encode<W: Write>(&self, write: &mut W) -> std::io::Result<()> {
        write.write_u32::<LittleEndian>(self.0)?;
        Ok(())
    }

    fn decode<R: Read>(read: &mut R) -> std::io::Result<MyStruct> {
        read.read_u32::<LittleEndian>().map(MyStruct)
    }
}

fn main() {
    let sorter = ExternalSorter::new();
    // `map` already returns an iterator, so no `.into_iter()` is needed.
    let reversed_data = (0..1000).rev().map(MyStruct);
    let sorted_iter = sorter.sort(reversed_data).unwrap();
    // Each item is a std::io::Result<MyStruct>, since reading back from disk can fail.
    let sorted_data = sorted_iter.collect::<std::io::Result<Vec<MyStruct>>>().unwrap();

    let expected_data = (0..1000).map(MyStruct).collect::<Vec<MyStruct>>();
    assert_eq!(sorted_data, expected_data);
}
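The example collects every item into a `Vec`, which is convenient for the assertion but brings the whole sorted output back into memory. When the output is too large for that, the items (each an `io::Result`) can be consumed one at a time, stopping at the first error. The helper `write_sorted` below is a hypothetical std-only sketch; it uses `u32` items for brevity, whereas in the example each item would be an `io::Result<MyStruct>`.

```rust
use std::io::{self, Write};

/// Stream `io::Result<u32>` items to `out`, one per line, without collecting
/// them; returns how many items were written or the first error encountered.
fn write_sorted<I, W>(items: I, out: &mut W) -> io::Result<usize>
where
    I: IntoIterator<Item = io::Result<u32>>,
    W: Write,
{
    let mut count = 0;
    for item in items {
        let value = item?; // propagate the first read/decode error
        writeln!(out, "{value}")?;
        count += 1;
    }
    Ok(count)
}
```

Items written before an error stay written; whether that partial output is acceptable is up to the caller.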

Dependencies

~3–12MB
~151K SLoC