#sorting #external #data-stream #merge #merge-sort

external_sort

Provides the ability to perform external sorts on structs, which allows for rapid sorting of large data streams.

6 releases

0.1.2 Jul 3, 2019
0.1.1 Jun 25, 2018
0.0.3 Jun 19, 2018

#1297 in Algorithms

MIT license

14KB
200 lines

external_sort

Provides the ability to perform external sorts on structs, which allows for rapid sorting of extremely large data streams.
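For readers new to the technique, the general idea of an external sort can be sketched with the standard library alone. The sketch below is not this crate's implementation: the function names (`external_sort_u32`, `write_run`, `parse`), the one-value-per-line text format, and the run file names are all assumptions made for illustration. It sorts fixed-size chunks in memory, spills each chunk to disk as a sorted run, and then merges the runs with a min-heap.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

// Hypothetical helper: sort the buffer, write it as one sorted run, clear it.
fn write_run(buf: &mut Vec<u32>, dir: &Path, idx: usize) -> io::Result<PathBuf> {
    buf.sort_unstable();
    let path = dir.join(format!("run_{}.txt", idx));
    let mut out = BufWriter::new(File::create(&path)?);
    for v in buf.iter() {
        writeln!(out, "{}", v)?;
    }
    out.flush()?;
    buf.clear();
    Ok(path)
}

// Hypothetical helper: parse one line of a run file back into a u32.
fn parse(s: &str) -> io::Result<u32> {
    s.trim()
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

// Sort `input` while holding at most `chunk_len` values in the sort buffer.
// The result is collected into a Vec only to keep the sketch short; a real
// external sorter would hand back a lazy iterator over the merged runs.
fn external_sort_u32(
    input: impl Iterator<Item = u32>,
    chunk_len: usize,
    dir: &Path,
) -> io::Result<Vec<u32>> {
    assert!(chunk_len > 0, "chunk_len must be at least 1");
    fs::create_dir_all(dir)?;
    let mut runs: Vec<PathBuf> = Vec::new();
    let mut buf: Vec<u32> = Vec::with_capacity(chunk_len);

    // Phase 1: spill a sorted run every time the buffer fills up.
    for v in input {
        buf.push(v);
        if buf.len() == chunk_len {
            let idx = runs.len();
            runs.push(write_run(&mut buf, dir, idx)?);
        }
    }
    if !buf.is_empty() {
        let idx = runs.len();
        runs.push(write_run(&mut buf, dir, idx)?);
    }

    // Phase 2: k-way merge. The heap holds the next value of every run.
    let mut readers = Vec::new();
    for p in &runs {
        readers.push(BufReader::new(File::open(p)?).lines());
    }
    let mut heap = BinaryHeap::new();
    for (i, r) in readers.iter_mut().enumerate() {
        if let Some(line) = r.next() {
            heap.push(Reverse((parse(&line?)?, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        // Refill the heap from the run the popped value came from.
        if let Some(line) = readers[i].next() {
            heap.push(Reverse((parse(&line?)?, i)));
        }
    }

    // Close the run files before deleting them.
    drop(readers);
    for p in &runs {
        fs::remove_file(p)?;
    }
    Ok(out)
}
```

Calling `external_sort_u32(vec![5, 2, 1, 3, 4].into_iter(), 2, &dir)` with a scratch directory `dir` spills three runs and merges them back in ascending order.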

Usage

Add this to your Cargo.toml:

[dependencies]
external_sort = "^0.1.1"

and this to your crate root:

extern crate external_sort;

Examples

The following shows using external_sort to sort a vector of simple structs.

Note that your struct must implement Ord and Clone, as well as the serde Serialize and Deserialize traits. Additionally, in order for external_sort to track its memory buffer usage, your struct must be able to report its size (via external_sort::ExternallySortable).

extern crate external_sort;
#[macro_use]
extern crate serde_derive;

use external_sort::{ExternalSorter, ExternallySortable};

#[derive(Serialize, Deserialize, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Num {
    the_num: u32,
}

impl Num {
    fn new(num: u32) -> Num {
        Num { the_num: num }
    }
}

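impl ExternallySortable for Num {
    // Report the size of one Num in bytes (a single u32) so that the
    // sorter can track its memory buffer usage.
    fn get_size(&self) -> u64 {
        4
    }
}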

fn main() {
    let unsorted = vec![
        Num::new(5),
        Num::new(2),
        Num::new(1),
        Num::new(3),
        Num::new(4),
    ];
    let sorted = vec![
        Num::new(1),
        Num::new(2),
        Num::new(3),
        Num::new(4),
        Num::new(5),
    ];

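    // 16 is the amount of data to keep in memory, measured in the same
    // units that get_size() reports (bytes here, so room for four Nums).
    let external_sorter = ExternalSorter::new(16, None);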
    let iter = external_sorter.sort(unsorted.into_iter()).unwrap();
    for (idx, i) in iter.enumerate() {
        assert_eq!(i.unwrap().the_num, sorted[idx].the_num);
    }
}

If your struct is unable to report its size, simply return 1 from get_size(), and then pass the number of objects (rather than bytes) that the ExternalSorter should keep in memory when calling ExternalSorter::new().
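The effect of returning 1 from get_size() can be illustrated with a small standalone sketch. Everything in it is hypothetical: `ReportsSize` is a local stand-in for external_sort::ExternallySortable so that the code compiles without the crate, and `chunk_by_budget` only assumes that a sorter flushes its buffer once the summed sizes reach the budget. How the crate actually decides when to flush may differ.

```rust
// Hypothetical stand-in for external_sort::ExternallySortable, redefined
// locally so this sketch has no dependencies.
trait ReportsSize {
    fn get_size(&self) -> u64;
}

// A type whose size in bytes is known: one u32.
struct Num(u32);

impl ReportsSize for Num {
    fn get_size(&self) -> u64 {
        4
    }
}

// A type whose heap size is awkward to report, so every item counts as 1.
struct Line(String);

impl ReportsSize for Line {
    fn get_size(&self) -> u64 {
        1
    }
}

// Assumed flushing rule: close the current chunk as soon as the summed
// get_size() values reach `budget`.
fn chunk_by_budget<T: ReportsSize>(items: Vec<T>, budget: u64) -> Vec<Vec<T>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut used = 0u64;
    for item in items {
        used += item.get_size();
        current.push(item);
        if used >= budget {
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Under this assumed rule, a budget of 16 with `Num` means 16 bytes, so each chunk holds four items, while a budget of 2 with `Line` means two objects per chunk regardless of how long each string is.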

Dependencies

~1.1–2.2MB
~42K SLoC