18 releases (5 breaking)
0.6.0 | Apr 22, 2024 |
---|---|
0.5.1 | Mar 8, 2024 |
0.4.0 | Mar 8, 2024 |
0.3.0 | Dec 7, 2023 |
0.1.9 | Apr 26, 2019 |
#343 in Data structures
Used in bytebraise
77KB
1.5K
SLoC
nested_intervals
This crate deals with interval sets which are lists of Ranges that may be both overlapping and nested.
The implementation is based on nested containment lists as proposed by Alekseyenko et al. 2007, which offers the same big-O complexity s interval trees (O(n * log(n)) construction, O(n + m) queries). The construction of the query data structure is lazy and only happens the first time a method relying on it is called.
Each interval has a vec of u32 ids attached, which allows linking back the results to other data structures.
Full documentation at docs.rs Source at GitHub
Example
Code example:
fn test_example() {
let intervals = vec![0..20, 15..30, 50..100];
let mut interval_set = IntervalSet::new(&intervals);
assert_eq!(interval_set.ids, vec![vec![0], vec![1], vec![2]]); // automatic ids, use new_with_ids otherwise
let hits = interval_set.query_overlapping(10..16);
assert_eq!(hits.intervals, [0..20, 15..30]);
let merged = hits.merge_hull();
assert_eq!(merged.intervals, [0..30]);
assert_eq!(merged.ids, vec![vec![0,1]]);
}
Functionality
-
check for overlap with a query range -> has_overlap
-
query for overlapping with a query range -> query_overlapping
-
query for overlapping with a query set -> filter_to_overlapping_multiple
-
query for non-overlapping with a query set -> filter_to_non_overlapping
-
check for internally overlapping -> any_overlapping
-
check for internally nested -> any_nested
-
remove duplicate intervals (start&stop!)-> remove_duplicates
-
remove duplicate intervals and complain about non-duplicate overlapping -> [remove_duplictes ](https://docs.rs/nested_intervals/struct.IntervalSet.html#method.remove_duplictes & any_overlapping
-
remove empty intervals -> remove_empty
-
merge internally overlapping by joining them -> merge_hull
-
merge internally overlapping by dropping them -> merge_drop
-
split internally overlapping into non-overlapping subsets (ie. [10..100, 20..30] becomes [10..20, 20..30, 30..100] -> merge_split
-
invert an interval set (given outer borders) -> invert
-
find the interval with the closest start -> find_closest_start
-
find the interval with the closest start to the left of a point -> find_closest_start_left
-
find the interval with the closest start to the right of a point -> find_closest_start_right
-
calculate the units covered by the intervals -> covered_units
-
find the mean size of the intervals -> mean_interval_size
-
build the union of n interval sets -> union
-
substract two interval sets -> filter_to_non_overlapping
-
keep only those overlapping with n other sets -> filter_to_overlapping_k_others
-
remove those overlapping with more than n other sets -> filter_to_non_overlapping_k_others
Not (yet) supported
We currently can not
-
find the interval with the closest end
-
find the interval with the closest end to the left of a point //going be expensive O(n/2)
-
find the interval with the closest end to the right of a point //going be expensiv O(n/2)
-
intersect two interval sects (ie. covered units in both sets)
-
intersect more than two interval sects (ie. covered units in multiple sets, possibly applying a 'k' threshold)
-
merge internally overlapping by intersecting them? What does than even mean for nested sets?
Changelog
- 0.6.0 - Fixed a bug with has_overlap, which would produce false negatives if intervals with identical starts were in the dataset.
Dependencies
~1MB
~18K SLoC