14 releases
Latest: 0.1.14 (Sep 19, 2021); previous: 0.1.13 (Sep 19, 2021)
14KB, 196 lines
A Disjoint-Set data structure (aka Union-Find w/ Rank)
What is Union-Find?
Suppose you have a collection $S$ of elements $e_1, e_2, \ldots, e_n$, and wish to group them into different collections using the operations:
- "put $e_i$ and $e_j$ into the same group" (union),
- "give me a representative of the group $e_i$ belongs to" (find).
A Union-Find data structure stores the underlying groups efficiently and implements exactly this API.
Note: The variant implemented uses Path Compression to further improve the performance.
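To make the two operations concrete, here is a minimal sketch of a disjoint-set forest over the indices 0..n, using union by rank together with path compression. It is not the crate's implementation; the type `DisjointSet` and its methods are hypothetical names chosen for illustration.

```rust
/// A minimal union-find over the elements 0..n (a sketch, not reunion's code).
struct DisjointSet {
    parent: Vec<usize>, // parent[i] == i means i is a representative
    rank: Vec<u8>,      // upper bound on the height of the tree rooted at i
}

impl DisjointSet {
    /// Every element starts in its own singleton group.
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the representative of x's group, compressing the path on the way.
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Path compression: point every node on the path directly at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merges the groups of a and b; returns false if they were already together.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        // Union by rank: hang the shallower tree under the deeper one.
        match self.rank[ra].cmp(&self.rank[rb]) {
            std::cmp::Ordering::Less => self.parent[ra] = rb,
            std::cmp::Ordering::Greater => self.parent[rb] = ra,
            std::cmp::Ordering::Equal => {
                self.parent[rb] = ra;
                self.rank[ra] += 1;
            }
        }
        true
    }
}

fn main() {
    let mut ds = DisjointSet::new(7);
    ds.union(2, 1);
    ds.union(4, 3);
    ds.union(6, 5);
    ds.union(1, 5);
    assert_eq!(ds.find(2), ds.find(6)); // 1, 2, 5, 6 now share a representative
    assert_ne!(ds.find(2), ds.find(3)); // 3 and 4 form a separate group
    println!("ok");
}
```

Returning `bool` from `union` is a choice made for this sketch: it tells the caller whether two groups were actually merged, which is convenient for the applications that follow.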
(Some) Applications
- Detect cycles in an undirected graph: Given a graph $G$, process its edges one at a time. For an edge $(e_i, e_j)$ whose endpoints already share a group representative, a path between them already exists, so adding the edge closes a cycle, which cannot happen in an acyclic graph. Otherwise, put the endpoints into the same group (the same connected component).
- Number of connected components in a graph: Given a graph $G$ in which every vertex starts in its own group, put the endpoints of each edge into the same group. Once all edges have been processed, the number of groups formed is the number of connected components in $G$.
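Both applications fit in a few lines. The sketch below assumes an undirected graph whose vertices are numbered 0..n and are given as an edge list; `Dsu`, `has_cycle`, and `count_components` are hypothetical names, and `Dsu` is a compact restatement of union by rank with path compression rather than reunion's API.

```rust
// Hypothetical compact union-find: union by rank plus path compression.
struct Dsu {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), rank: vec![0; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let p = self.parent[x];
            let root = self.find(p); // recursive path compression
            self.parent[x] = root;
        }
        self.parent[x]
    }

    /// Returns true only if two different groups were merged.
    fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        if self.rank[ra] < self.rank[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        if self.rank[ra] == self.rank[rb] {
            self.rank[ra] += 1;
        }
        true
    }
}

/// An undirected graph has a cycle iff some edge joins two vertices already in one group.
fn has_cycle(n: usize, edges: &[(usize, usize)]) -> bool {
    let mut dsu = Dsu::new(n);
    edges.iter().any(|&(u, v)| !dsu.union(u, v))
}

/// Each successful union merges two groups, so n - (successful unions) groups remain.
fn count_components(n: usize, edges: &[(usize, usize)]) -> usize {
    let mut dsu = Dsu::new(n);
    let merges = edges.iter().filter(|&&(u, v)| dsu.union(u, v)).count();
    n - merges
}

fn main() {
    let forest = [(0, 1), (1, 2), (3, 4)];
    assert!(!has_cycle(5, &forest));
    assert_eq!(count_components(5, &forest), 2); // {0, 1, 2} and {3, 4}

    let triangle = [(0, 1), (1, 2), (2, 0)];
    assert!(has_cycle(3, &triangle));
    assert_eq!(count_components(6, &triangle), 4); // {0, 1, 2}, {3}, {4}, {5}
    println!("ok");
}
```

Counting successful unions avoids a final pass over all vertices: isolated vertices are never merged, so they are automatically counted as their own components.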
Some interesting lecture notes regarding Union-Find.
Usage
Setup
In Cargo.toml, add this crate as a dependency:
[dependencies]
reunion = { version = "0.1" }
API
Example 1
Task: Create a UnionFind data structure of arbitrary size that contains usize as its elements.
Then, union a few elements and capture the state of the data structure after that.
Solution:
use reunion::{UnionFind, UnionFindTrait};
use std::collections::HashSet;

fn main() {
    // Create a UnionFind data structure of arbitrary size that contains subsets of usizes.
    let mut uf = UnionFind::<usize>::new();
    println!("Initial state: {}", &uf);
    println!("All elements form their own group (singletons).");
    println!("{:?}", uf.subsets());

    uf.union(2, 1);
    println!("After combining the groups that contain 2 and 1: {}", &uf);
    uf.union(4, 3);
    println!("After combining the groups that contain 4 and 3: {}", &uf);
    uf.union(6, 5);
    println!("After combining the groups that contain 6 and 5: {}", &uf);

    // Expected partition so far: {1, 2}, {3, 4}, {5, 6}.
    let mut hs1 = HashSet::new();
    hs1.insert(1);
    hs1.insert(2);
    let mut hs2 = HashSet::new();
    hs2.insert(3);
    hs2.insert(4);
    let mut hs3 = HashSet::new();
    hs3.insert(5);
    hs3.insert(6);

    let mut subsets = uf.subsets();
    assert_eq!(subsets.len(), 3);
    assert!(subsets.contains(&hs1));
    assert!(subsets.contains(&hs2));
    assert!(subsets.contains(&hs3));

    // Merging the group of 1 with the group of 5 leaves {1, 2, 5, 6} and {3, 4}.
    uf.union(1, 5);
    println!("After combining the groups that contain 1 and 5: {}", &uf);
    subsets = uf.subsets();
    assert_eq!(subsets.len(), 2);
    hs3.extend(&hs1);
    assert!(subsets.contains(&hs3));
    assert!(subsets.contains(&hs2));

    // Calling `find` on a clone still leaves it equal to the original.
    let mut uf_clone = uf.clone();
    uf_clone.find(2);
    assert_eq!(&uf, &uf_clone);
    println!("{}", &uf);

    // It is possible to iterate over the subsets.
    for partition in uf {
        println!("{:?}", partition);
    }
}
Example 2
Task: Create a UnionFind data structure with capacity for at least 10 elements, containing u16 as its elements.
Note: The capacity only helps reduce the number of memory reallocations required, so specifying it is entirely optional.
Solution:
use reunion::{UnionFind, UnionFindTrait};
fn main() {
    // Create a UnionFind data structure with capacity pre-allocated for 10 u16 elements.
    let mut uf2 = UnionFind::<u16>::with_capacity(10);
    println!("{}", uf2);
}
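Example 1 relies on the structure accepting arbitrary elements without a fixed size, while Example 2 only pre-allocates space. One way such a structure could work is sketched below, assuming elements are mapped to internal slots through a `HashMap` and added lazily the first time they are mentioned. `GenericUnionFind`, `slot`, `root`, and this `subsets` are hypothetical; reunion may be organized differently.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::Hash;

/// Hypothetical generic union-find: elements become singletons the first time
/// they are mentioned, so no size has to be fixed up front.
struct GenericUnionFind<T> {
    index: HashMap<T, usize>, // element -> slot
    elems: Vec<T>,            // slot -> element
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl<T: Hash + Eq + Clone> GenericUnionFind<T> {
    /// Pre-allocates room for `cap` elements; this avoids reallocations but is not a limit.
    fn with_capacity(cap: usize) -> Self {
        GenericUnionFind {
            index: HashMap::with_capacity(cap),
            elems: Vec::with_capacity(cap),
            parent: Vec::with_capacity(cap),
            rank: Vec::with_capacity(cap),
        }
    }

    /// Returns the slot of `x`, inserting it as a new singleton if unseen.
    fn slot(&mut self, x: T) -> usize {
        if let Some(&i) = self.index.get(&x) {
            return i;
        }
        let i = self.elems.len();
        self.index.insert(x.clone(), i);
        self.elems.push(x);
        self.parent.push(i);
        self.rank.push(0);
        i
    }

    /// Representative slot of slot `i`, with path compression.
    fn root(&mut self, i: usize) -> usize {
        let mut r = i;
        while self.parent[r] != r {
            r = self.parent[r];
        }
        let mut cur = i;
        while self.parent[cur] != r {
            let next = self.parent[cur];
            self.parent[cur] = r;
            cur = next;
        }
        r
    }

    fn union(&mut self, a: T, b: T) {
        let (sa, sb) = (self.slot(a), self.slot(b));
        let (mut ra, mut rb) = (self.root(sa), self.root(sb));
        if ra == rb {
            return;
        }
        if self.rank[ra] < self.rank[rb] {
            std::mem::swap(&mut ra, &mut rb); // union by rank
        }
        self.parent[rb] = ra;
        if self.rank[ra] == self.rank[rb] {
            self.rank[ra] += 1;
        }
    }

    /// Groups every known element by its representative.
    fn subsets(&mut self) -> Vec<HashSet<T>> {
        let mut groups: HashMap<usize, HashSet<T>> = HashMap::new();
        for i in 0..self.elems.len() {
            let r = self.root(i);
            groups.entry(r).or_default().insert(self.elems[i].clone());
        }
        groups.into_values().collect()
    }
}

fn main() {
    let mut uf = GenericUnionFind::<u16>::with_capacity(10);
    uf.union(2, 1);
    uf.union(4, 3);
    uf.union(1, 4);
    uf.union(9, 9); // mentioning an element on its own makes it a singleton
    let subsets = uf.subsets();
    assert_eq!(subsets.len(), 2);
    assert!(subsets.contains(&HashSet::from([1, 2, 3, 4])));
    assert!(subsets.contains(&HashSet::from([9])));
    println!("ok");
}
```

Because unseen elements are inserted on demand, the capacity passed to `with_capacity` is only a hint: growing past it triggers a reallocation, not an error.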
Performance
Benchmark
DIY
To benchmark on your machine:
- Clone this repository.
- Run cargo bench.
You should see some output like this:
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013497126s, Time per operation: 285ns
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013323348s, Time per operation: 285ns
#Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.012333206s, Time per operation: 285ns
...
Big Merge (20, 10000) time: [1.0175 s 1.0190 s 1.0205 s]
change: [-0.4773% -0.2721% -0.0647%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) high mild
3 (3.00%) high severe
...
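For a quick std-only measurement outside `cargo bench`, a timing loop like the one below prints the same kind of figures (#Find, #Union, time per operation). The workload is an assumption: it merges $2^{20}$ singletons with $2^{20} - 1$ unions at doubling strides and then calls `find` once per element. It is not the crate's benchmark, so its numbers will not match the output above, and `Dsu` and `run` are hypothetical names. Build with `--release` for meaningful timings.

```rust
use std::time::Instant;

// Hypothetical compact union-find: union by rank plus path compression.
struct Dsu {
    parent: Vec<usize>,
    rank: Vec<u8>,
}

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu { parent: (0..n).collect(), rank: vec![0; n] }
    }

    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root; // path compression
            cur = next;
        }
        root
    }

    fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        if self.rank[ra] < self.rank[rb] {
            std::mem::swap(&mut ra, &mut rb); // keep the deeper tree as the root
        }
        self.parent[rb] = ra;
        if self.rank[ra] == self.rank[rb] {
            self.rank[ra] += 1;
        }
        true
    }
}

/// Hypothetical workload: merge 2^k singletons into one group, then call `find`
/// once per element. Returns (#Union, #Find, number of groups left).
/// The finds performed inside `union` are not counted.
fn run(k: u32) -> (usize, usize, usize) {
    let n = 1usize << k;
    let mut dsu = Dsu::new(n);
    let mut unions = 0;
    // Join neighbouring blocks at doubling strides: n/2 + n/4 + ... + 1 = n - 1 unions.
    let mut stride = 1;
    while stride < n {
        let mut i = 0;
        while i + stride < n {
            if dsu.union(i, i + stride) {
                unions += 1;
            }
            i += 2 * stride;
        }
        stride *= 2;
    }
    let (mut finds, mut groups) = (0, 0);
    for x in 0..n {
        if dsu.find(x) == x {
            groups += 1; // x is its own representative, i.e. a root
        }
        finds += 1;
    }
    (unions, finds, groups)
}

fn main() {
    let start = Instant::now(); // also includes allocation time
    let (unions, finds, groups) = run(20);
    let elapsed = start.elapsed();
    let total = unions + finds;
    assert_eq!(groups, 1);
    println!(
        "#Find: {}, #Union: {}, #Total: {}, Time: {:?}, Time per operation: {}ns",
        finds, unions, total, elapsed, elapsed.as_nanos() / total as u128
    );
}
```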
Summary
On an AMD Ryzen 9 3900X 12-Core Processor (with many other processes running), working with a UnionFind of size $2^{20}$, a total of 3,545,725 operations take roughly 1 second, or about 285 ns per operation. This is expected, because the amortized time per operation is effectively $O(1)$: strictly, it is $O(\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function, which grows so slowly that it can be treated as a constant in practice.
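The figures in the first benchmark line can be checked by hand: the counts add up, the union count equals the number of merges needed to join $2^{20}$ singletons into a single group (each successful union removes exactly one group), and dividing the time by the total gives the reported per-operation cost. Reading the union count this way is an inference from the numbers, not something the benchmark output states.

```latex
\begin{aligned}
\#\text{Total} &= \#\text{Find} + \#\text{Union} = 2\,497\,150 + 1\,048\,575 = 3\,545\,725,\\
\#\text{Union} &= 1\,048\,575 = 2^{20} - 1,\\
\frac{\text{Time}}{\#\text{Total}} &= \frac{1.013497126\ \text{s}}{3\,545\,725} \approx 2.858 \times 10^{-7}\ \text{s} \approx 285.8\ \text{ns}.
\end{aligned}
```

The printed 285ns appears to be this value truncated to whole nanoseconds. For any $n$ that could fit in memory, $\alpha(n) \le 4$, which is why the bound is treated as a constant.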
Dependencies: ~0.3–1MB, ~21K SLoC