## lazysort

Lazy sorting for iterators

### 20 releases

Uses old Rust 2015

 0.2.1 Oct 16, 2018 Nov 8, 2017 Feb 3, 2016 Nov 22, 2015 Nov 30, 2014

#857 in Algorithms

Used in 12 crates (11 directly)

MIT/Apache

21KB
486 lines

# Lazysort

Adds a method to iterators that returns a sorted iterator over the data. The sorting is achieved lazily using a quicksort algorithm.

Available via crates.io.

## Usage

``````extern crate lazysort;

use lazysort::Sorted;

use lazysort::SortedBy;

use lazysort::SortedPartial;
``````

The `Sorted` trait adds a method `sorted` to all `Iterator<T: Ord>` which returns an iterator over the same data in default order.

The `SortedBy` trait adds a method `sorted_by` to all `Iterator<T>` which returns an iterator over the same data ordered according to the provided closure/function of type `Fn(&T, &T) -> Ordering`

The `SortedPartial` trait adds two methods `sorted_partial_first` and `sorted_partial_last` to all `Iterator<T: PartialOrd>` which returns an iterator over the same data in the default order. The difference between the two is whether non-comparable values go first or last in the results.

For example:

``````let data: Vec<uint> = vec![9, 1, 3, 4, 4, 2, 4];
for x in data.iter().sorted() {
println!("{}", x);
}
``````

Will print: 1, 2, 3, 4, 4, 4, 9

A more complex example. Sort strings by length, then in default string order:

``````let before: Vec<&str> = vec!["a", "cat", "sat", "on", "the", "mat"];
before.iter().sorted_by(|a, b| {
match a.len().cmp(&b.len()) {
Equal => a.cmp(b),
x => x
}
})
``````

This returns an iterator which yields: `a`, `on`, `cat`, `mat`, `sat`, `the`.

## Implementation details and performance

The algorithm is essentially the same as described in my blog post using a lazy sort as an example of Clojure's lazy sequences. But made to fit in with Rust's iterators.

The full sequence from the parent iterator is read, then each call to `next` returns the next value in the sorted sequence. The sort is done element-by-element so the full order is only realised by iterating all the way through to the end.

The algorithm is the quicksort, but depth-first; upon each call to `next` it does the work necessary to find the next item then pauses the state until the next call to `next`.

To test performance we compare it against sorting the full vector, using the `sort` function from the standard library, and also against `std::collections::BinaryHeap`.

First we compare what happens when sorting the entire vector:

``````test benches::c_heap_bench     ... bench:   3,703,166 ns/iter (+/- 454,189)
test benches::c_lazy_bench     ... bench:   3,961,047 ns/iter (+/- 603,083)
test benches::c_standard_bench ... bench:   3,093,873 ns/iter (+/- 430,401)
``````

There are differences between the three, and not surprisingly the built-in sort is fastest.

These benchmarks are for sorting 50,000 random `uint`s in the range 0 <= x < 1000000. Run `cargo bench` to run them.

So what's the point of lazy sorting? As per the linked blog post, they're useful when you do not need or intend to need every value; for example you may only need the first 1,000 ordered values from a larger set.

Comparing the lazy approach `data.iter().sorted().take(x)` vs a standard approach of sorting a vector then taking the first `x` values gives the following.

The first 1,000 out of 50,000:

``````test benches::a_heap_bench     ... bench:     366,767 ns/iter (+/- 55,393)
test benches::a_lazy_bench     ... bench:     171,923 ns/iter (+/- 52,784)
test benches::a_standard_bench ... bench:   3,055,734 ns/iter (+/- 348,407)
``````

The lazy approach is quite a bit faster; this is due to the 50,000 only being sorted enough to identify the first 1,000, the rest remain unsorted. `BinaryHeap` is also quite fast, for the same reason.

The first 10,000 out of 50,000:

``````test benches::b_heap_bench     ... bench:   1,126,774 ns/iter (+/- 156,833)
test benches::b_lazy_bench     ... bench:     993,954 ns/iter (+/- 208,188)
test benches::b_standard_bench ... bench:   3,054,598 ns/iter (+/- 285,970)
``````

The lazy approach is still faster in this situation.