### 5 releases

Uses old Rust 2015

0.1.4 | Sep 21, 2017 |
---|---|

0.1.3 | Jul 13, 2016 |

0.1.2 | Jun 20, 2016 |

0.1.1 | Jun 13, 2016 |

0.1.0 | Jun 11, 2016 |

#**735** in Rust patterns

**94** downloads per month

Used in **2** crates
(via mayda_codec)

**MIT/Apache**

31KB

697 lines

# mayda

is a Rust library to compress integer arrays (all primitive integer
types are supported). The design favors decompression speed and the ability to
index the compressed array over the compression ratio, on the principle that
the runtime penalty for using compressed arrays should be as small as possible.`mayda`

This crate provides three variations on a single compression algorithm. The

type can decompress around six billion `Uniform`

s per second, or 24
GiB/s of decompressed integers, on a 2.6 GHz Intel Core i7-6700HQ processor
(see below for specifics). The `u32`

and `Monotone`

types decompress
at a little less than half the speed, but can have much better compression
ratios depending on the distribution of the integers. Overall performance is
comparable to the fastest (known) libraries in any language.`Unimodal`

Compiling

requires a `mayda`**nightly** compiler and CPU support for the SSE2
instruction set (any Intel or AMD processor manufactured after 2003). The
compiler version is further specified in

for reproducibility.
The basic approach is described in Zukowski2006 and Lemire2015.`./rust-toolchain`

### Documentation

The module documentation provides further examples and some more detailed descriptions of the algorithms involved.

### Usage

Add this to your

:`Cargo .toml`

`[``dependencies``]`
`mayda ``=` `"`0.2`"`

and this to your crate root:

`extern` `crate` mayda`;`

### Example: encoding and decoding

The

struct is recommended for general purpose integer compression.
Use the `Uniform`

trait to encode and decode the array.`Encode`

`extern` `crate` mayda`;`
`use` `mayda``::``{`Uniform`,` Encode`}``;`
`fn` `main``(``)`` ``{`
`//` Integers in some interval of length 255, require one byte per integer
`let` input`:` `Vec``<``u32``>` `=` `(``0``..``128``)``.``map``(``|``x``|` `(`x `*` `73``)` `%` `181` `+` `307``)``.``collect``(``)``;`
`let` `mut` uniform `=` `Uniform``::`new`(``)``;`
uniform`.``encode``(``&`input`)``.``unwrap``(``)``;`
`let` i_bytes `=` `std``::``mem``::`size_of_val`(`input`.``as_slice``(``)``)``;`
`let` u_bytes `=` `std``::``mem``::`size_of_val`(`uniform`.``storage``(``)``)``;`
`//` 128 bytes for encoded entries, 12 bytes of overhead
`assert_eq!``(`i_bytes`,` `512``)``;`
`assert_eq!``(`u_bytes`,` `140``)``;`
`let` output`:` `Vec``<``u32``>` `=` uniform`.``decode``(``)``;`
`assert_eq!``(`input`,` output`)``;`
`}`

### Example: indexing

Use the

and `Access`

traits to index the compressed array. This is
similar to `AccessInto`

, but returns a vector instead of a slice.`Index`

`extern` `crate` mayda`;`
`use` `mayda``::``{`Uniform`,` Encode`,` Access`}``;`
`fn` `main``(``)`` ``{`
`//` All primitive integer types supported
`let` input`:` `Vec``<``isize``>` `=` `(``-``64``..``64``)``.``collect``(``)``;`
`let` `mut` uniform `=` `Uniform``::`new`(``)``;`
uniform`.``encode``(``&`input`)``.``unwrap``(``)``;`
`let` val`:` `isize` `=` uniform`.``access``(``64``)``;`
`assert_eq!``(`val`,` `0``)``;`
`//` Vector is returned to give ownership to the caller
`let` range`:` `Vec``<``isize``>` `=` uniform`.``access``(``..``10``)``;`
`assert_eq!``(`range`,` `(``-``64``..``-``54``)``.``collect``::``<``Vec``<``_``>``>``(``)``)``;`
`}`

### Performance

Consider the following three vectors:

`extern` `crate` rand`;`
`use` `rand``::``distributions``::``{`IndependentSample`,` Range`,` Normal`}``;`
`let` `mut` rng `=` `rand``::`thread_rng`(``)``;`
`let` length`:` `usize` `=` `1024``;`
`//` `input1` contains random integers
`let` `mut` input1`:` `Vec``<``u32``>` `=` `Vec``::`with_capacity`(`length`)``;`
`let` range `=` `Range``::`new`(``0``,` `1024``)``;`
`for` `_` `in` `0``..`length `{`
input1`.``push``(`range`.``ind_sample``(``&``mut` rng`)``)``;`
`}`
`//` `input2` contains monotone increasing integers
`let` `mut` input2`:` `Vec``<``u32``>` `=` input1`.``clone``(``)``;`
input2`.``sort``(``)``;`
`//` `input3` contains Gaussian distributed integers with outliers
`let` `mut` input3`:` `Vec``<``u32``>` `=` `Vec``::`with_capacity`(`length`)``;`
`let` gaussian `=` `Normal``::`new`(``4086.``,` `128.``)``;`
`for` `_` `in` `0``..`length `{`
input3`.``push``(`gaussian`.``ind_sample``(``&``mut` rng`)` `as` `u32``)``;`
`}`
`let` index `=` `Range``::`new`(``0``,` length`)``;`
`let` outliers `=` `Range``::`new`(``0``,` `std``::``u32``::``MAX``)``;`
`for` `_` `in` `0``..``(`length `/` `16``)` `{`
input3`[`index`.``ind_sample``(``&``mut` rng`)``]` `=` outliers`.``ind_sample``(``&``mut` rng`)``;`
`}`

The performance of the

, `Uniform`

and `Monotone`

types for
these three vectors on a 2.6 GHz Intel Core i7-6700HQ processor is given below.
Encoding and decoding speeds using `Unimodal`

are reported in billions of
integers per second, time required to index the last entry is reported in
nanoseconds, and compression is reported as bits per integer. Native encoding
and decoding speed measurements perform a memcpy. The Shannon entropy is a
reasonable target for the bits per integer.`decode_into`

For

the Shannon entropy is 10.00. `input1`

is preferrable in every
respect for the general case.`Uniform`

Encode (BInt/s) | Decode (BInt/s) | Index (ns) | Bits/Int | |
---|---|---|---|---|

Uniform | 1.28 | 5.75 | 26 | 10.63 |

Monotone | 1.34 | 2.49 | 63 | 32.63 |

Unimodal | 0.21 | 2.01 | 59 | 11.16 |

Native | 26.26 | 26.26 | 0 | 32 |

For

the Shannon entropy is 10.00, but the additional structure is used
by `input2`

to improve compression.`Monotone`

Encode (BInt/s) | Decode (BInt/s) | Index (ns) | Bits/Int | |
---|---|---|---|---|

Uniform | 1.23 | 5.88 | 26 | 8.00 |

Monotone | 1.42 | 2.42 | 69 | 3.63 |

Unimodal | 0.24 | 2.07 | 27 | 8.19 |

Native | 26.26 | 26.26 | 0 | 32 |

For

the Shannon entropy is 12.46, but compression is difficult due to
the presence of outliers. `input3`

gives the most robust compression.`Unimodal`

Encode (BInt/s) | Decode (BInt/s) | Index (ns) | Bits/Int | |
---|---|---|---|---|

Uniform | 1.26 | 6.10 | 26 | 32.63 |

Monotone | 1.35 | 2.49 | 65 | 32.63 |

Unimodal | 0.18 | 1.67 | 61 | 12.50 |

Native | 26.26 | 26.26 | 0 | 32 |

## License

is licensed under either of`mayda`

- Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or https://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

### Contributors

Francis Lalonde