#batch #iterator #swap #file #byte #temporary-files

swapvec

A Vector swapping to disk after exceeding a given length

9 unstable releases (3 breaking)

new 0.4.2 Nov 29, 2023
0.4.1 Nov 14, 2023
0.3.0 May 15, 2023
0.2.0 Apr 23, 2023
0.1.3 Apr 14, 2023

#177 in Compression


756 downloads per month
Used in cozo

MIT license

25KB
463 lines

SwapVec

A vector which swaps to disk when exceeding a certain length.

Useful if you do not want to use a queue, but would rather collect all data first and then consume it.

Imagine multiple threads slowly producing giant vectors of data, which are later passed to a single consumer.

Or a multi-gigabyte CSV upload to an HTTP server, where you want to validate every line during the upload without starting a database transaction right away or keeping everything in memory.
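The core idea can be sketched with the standard library alone. `SpillVec` below is a hypothetical illustration of the concept, not the crate's implementation: it buffers elements in memory until a threshold, then spills everything to a temporary file (cleanup of that file is omitted here; the real crate handles it).

```rust
use std::fs::{File, OpenOptions};
use std::io::{Read, Seek, SeekFrom, Write};

/// Hypothetical sketch: a vector of u64 that swaps to disk
/// after exceeding a given length.
struct SpillVec {
    threshold: usize,
    memory: Vec<u64>,
    file: Option<File>,
}

impl SpillVec {
    fn new(threshold: usize) -> Self {
        Self { threshold, memory: Vec::new(), file: None }
    }

    /// Pushing may perform IO, so it returns a Result.
    fn push(&mut self, value: u64) -> std::io::Result<()> {
        if let Some(file) = self.file.as_mut() {
            // Already swapped: append directly to the file.
            return file.write_all(&value.to_le_bytes());
        }
        self.memory.push(value);
        if self.memory.len() > self.threshold {
            // Threshold exceeded: move buffered elements to a temp file.
            let path = std::env::temp_dir().join("spillvec_demo.bin");
            let mut file = OpenOptions::new()
                .read(true).write(true).create(true).truncate(true)
                .open(path)?;
            for v in self.memory.drain(..) {
                file.write_all(&v.to_le_bytes())?;
            }
            self.file = Some(file);
        }
        Ok(())
    }

    /// Read everything back, either from memory or from disk.
    fn into_values(self) -> std::io::Result<Vec<u64>> {
        match self.file {
            Some(mut file) => {
                // Rewind and decode the file contents.
                file.seek(SeekFrom::Start(0))?;
                let mut bytes = Vec::new();
                file.read_to_end(&mut bytes)?;
                Ok(bytes
                    .chunks_exact(8)
                    .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
                    .collect())
            }
            None => Ok(self.memory),
        }
    }
}

fn main() -> std::io::Result<()> {
    let mut v = SpillVec::new(4);
    for i in 0..10 {
        v.push(i)?; // spills to disk after the fifth element
    }
    let values = v.into_values()?;
    assert_eq!(values, (0..10).collect::<Vec<u64>>());
    println!("read back {} values", values.len());
    Ok(())
}
```

The real crate generalizes this to any `T: Serialize + Deserialize + Clone`, writes in batches, and adds checksums; the sketch only shows the spill-after-threshold mechanism.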

Features

  • Multiplatform (Linux, Windows, MacOS)
  • Creates temporary file only after exceeding threshold
  • Works on T: Serialize + Deserialize + Clone
  • Temporary file is removed even when the program is terminated
  • Checksums to guarantee integrity
  • Can be moved across threads

Limitations

  • Because most actions may perform IO, they are wrapped in a Result
  • Currently, no "start swapping after n MiB" mode is implemented
    • This would require element-wise size calculation, since elements may own heap data (e.g. String)
  • Compression currently does not compress; the option exists only to keep the API stable
  • No async support (yet)
  • When pushing elements or consuming iterators, SwapVec is "write only"
  • Only forward iteration
    • The iterator can be reset, though
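The forward-only restriction mirrors reading a file: you can only move forwards, but you can rewind to the start. A std-only illustration of that pattern (this is not the crate's API, just the underlying idea):

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    // A file-backed reader is consumed forwards only...
    let mut reader = Cursor::new(vec![1u8, 2, 3]);
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    assert_eq!(buf, [1, 2, 3]);

    // ...but it can be reset by seeking back to the start.
    reader.seek(SeekFrom::Start(0))?;
    let mut again = Vec::new();
    reader.read_to_end(&mut again)?;
    assert_eq!(again, [1, 2, 3]);
    println!("reset and re-read {} bytes", again.len());
    Ok(())
}
```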

Examples

Basic Usage

use swapvec::SwapVec;
let iterator = 0..9; // ranges are already iterators
let mut much_data = SwapVec::default();
// Starts using disk for big iterators
much_data.consume(iterator).unwrap();
for value in much_data.into_iter() {
    println!("Read back: {}", value.unwrap());
}

Demo

Currently there is only one simple example. It performs some basic operations and reports metrics such as the number of batches and bytes written to the temporary file. Run it with

cargo run --example demo

Dependencies

~0.7–12MB
~108K SLoC