rust-catalog
A "file-backed" map, which inserts keys and values into a file in O(n) time, and gets the values in O(log-n) time using binary search and file seeking. For now, it only supports insertion and getting of (hashable) keys and values that implement the Display
and FromStr
traits (i.e., those which can be converted to string and parsed back from string).
See the module documentation for more information.
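In practice, the Display/FromStr requirement just means a type has to round-trip through a string. A minimal sketch of such a type (the Point struct below is purely illustrative and not part of the crate):

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;

// A value type that qualifies for storage: it can be written out via
// Display and parsed back via FromStr.
#[derive(Debug, PartialEq)]
struct Point { x: i64, y: i64 }

impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{},{}", self.x, self.y)
    }
}

impl FromStr for Point {
    type Err = ParseIntError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut parts = s.splitn(2, ',');
        Ok(Point {
            x: parts.next().unwrap_or("").parse()?,
            y: parts.next().unwrap_or("").parse()?,
        })
    }
}

fn main() {
    let p = Point { x: 3, y: -7 };
    let as_text = p.to_string();                  // "3,-7" — the form written to the file
    let parsed: Point = as_text.parse().unwrap(); // recovered again on lookup
    assert_eq!(p, parsed);
}
```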
Usage
Note that this was an experiment, and so use it at your own risk!
Add the following to your Cargo.toml:

```toml
[dependencies]
catalog = "0.1.2"
```
Have a look at the detailed example for the precise usage.
lib.rs:
The HashMap and BTreeMap in the standard library offer very good performance when it comes to inserting and getting stuff, but they're memory killers. If the "stuff" gets large - say, a trillion (10¹²) entries - then we're in trouble, as we'd need gigs of RAM to hold the data. Moreover, once the program quits, all the hard-earned stuff gets deallocated, and we'd have to re-insert it all over again.

HashFile deals with this specific problem. It makes use of a BTreeMap for storing the keys and values, so until it reaches the defined capacity, it offers the same performance as the BTreeMap. However, once (and whenever) it reaches the capacity, it flushes the stuff to a file (the necessary parameters can be defined in its methods).
Hence, at any given moment, the upper limit for the memory eaten by this thing is set by its capacity. This gives us good control over the space-time trade-off. But, the flushing will take O(2n) time, depending on the processor and I/O speed, as it does things on the fly with the help of iterators.
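A rough sketch of that buffering idea, in plain std-only Rust (the FileBackedMap type, its field names, and the tab-separated line format are illustrative assumptions, not the crate's implementation):

```rust
use std::collections::BTreeMap;
use std::fs::OpenOptions;
use std::io::{BufWriter, Write};

struct FileBackedMap {
    buf: BTreeMap<String, String>, // in-memory buffer, bounded by `capacity`
    capacity: usize,
    path: String,                  // backing file
}

impl FileBackedMap {
    // Inserts go into the in-memory BTreeMap; once the buffer hits the
    // configured capacity, everything is flushed to the backing file.
    fn insert(&mut self, key: String, value: String) -> std::io::Result<()> {
        self.buf.insert(key, value);
        if self.buf.len() >= self.capacity {
            self.flush()?;
        }
        Ok(())
    }

    // Flushing walks the map once in key order and writes each entry as a
    // "key<TAB>value" line. A full implementation would also merge with what
    // is already in the file so the file stays sorted; that pass is omitted here.
    fn flush(&mut self) -> std::io::Result<()> {
        let file = OpenOptions::new().create(true).append(true).open(&self.path)?;
        let mut out = BufWriter::new(file);
        for (key, value) in &self.buf {
            writeln!(out, "{}\t{}", key, value)?;
        }
        out.flush()?;
        self.buf.clear();
        Ok(())
    }
}
```

Buffering in a BTreeMap means the in-memory entries come out already in key order, so writing them (or merging them with an already-sorted file) is a single linear pass.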
After the final manual flush, the file can be stored and moved around, and since lookups use binary search over the file, values can be obtained in O(log n) time whenever they're required (depending on the seeking speed of the drive). For example, if an average seek takes around 0.3 ms, then a file containing a trillion values demands about 40 seeks in the worst case (log₂(10¹²) ≈ 40), which translates to roughly 12 ms per lookup.
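A minimal sketch of such a "binary search + seek" lookup over a file of sorted key/value lines (std-only illustration, not the crate's code; the tab-separated layout is carried over from the sketch above, and it assumes the file is sorted by the keys' string form):

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Seek, SeekFrom};

// Each probe is one seek plus a short read, so a lookup touches the
// drive O(log n) times, never loading the whole file into memory.
fn lookup(path: &str, key: &str) -> std::io::Result<Option<String>> {
    let file = File::open(path)?;
    let len = file.metadata()?.len();
    let mut reader = BufReader::new(file);

    let (mut lo, mut hi) = (0u64, len);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;

        // Find the first line starting at or after `mid`: seek just before
        // it and discard the (possibly partial) line we land in.
        let line_start = if mid == 0 {
            reader.seek(SeekFrom::Start(0))?;
            0
        } else {
            reader.seek(SeekFrom::Start(mid - 1))?;
            let mut partial = String::new();
            mid - 1 + reader.read_line(&mut partial)? as u64
        };

        let mut line = String::new();
        let n = reader.read_line(&mut line)? as u64;
        if n == 0 {
            // No complete line starts at or after `mid`: search the left half.
            hi = mid;
            continue;
        }

        let mut parts = line.trim_end().splitn(2, '\t');
        match parts.next().unwrap_or("") {
            k if k == key => return Ok(parts.next().map(|v| v.to_string())),
            k if k < key => lo = line_start + n, // probe key sorts before ours
            _ => hi = mid,                       // probe key sorts after ours
        }
    }
    Ok(None)
}
```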
This kind of "search and seek" is already being used by databases. But, the system is simply an unnecessary complication if you just want a table with a zillion rows and only two columns (a key and a value).