### 6 releases

0.2.3 | May 15, 2021 |
---|---|

0.2.2 | Feb 27, 2021 |

0.1.1 | Feb 11, 2021 |

0.1.0 | Jan 30, 2021 |

#**1129** in Algorithms

Used in rscompress

**MIT**license

27KB

443 lines

# rscompress

A compression library in Rust with focus on scientific data.

## Disclaimer

This is a rewrite and merge of several compression algorithms developed during my time as a phd student:

- https://github.com/ucyo/huffman
- https://github.com/ucyo/pzip-cli
- https://github.com/ucyo/pzip-bwt
- https://github.com/ucyo/pzip-redux
- https://github.com/ucyo/pzip-huffman
- https://github.com/ucyo/pzip
- https://github.com/ucyo/rust-compress
- https://github.com/ucyo/adaptive-lossy-compression
- https://github.com/ucyo/information-spaces
- https://github.com/ucyo/cframework
- https://github.com/ucyo/xor-and-residual-calculation
- https://github.com/ucyo/climate-data-analysis

The dissertation can be downloaded from https://doi.org/10.5445/IR/1000105055

## Architecture

The library is split into one base and four supporting libraries. The base library orchestrates the supporting libraries. All compression algorithms follow the same basic structure:

- Decorrelate data using transformations
- Approximate the data, if lossy compression is needed
- Code the data

Additionally, check if each step executed as expected.

` ``+``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``+` lossless `+``-``-``-``-``-``-``-``-``-``-``+`
`|` `|` `|` `|`
Start `+``-``-``-``-``-``->` `|` Transformation `|` `+``-``-``-``-``-``-``-``-``-``-``-``->` `|` Coding `|` `+``-``-``-``-``-``->` End
`|` `|` `|` `|`
`+``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``+` `+``-``-``-``-``-``-``-``-``-``-``+`
`+` `^`
`|` `|`
`|` lossy `|`
`|` `|`
v `|`
`|`
`+``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``+` `|`
`|` `|` `|`
`|` Approximation `|` `+``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``+`
`|` `|`
`+``-``-``-``-``-``-``-``-``-``-``-``-``-``-``-``+`

This library will follow the same principles.

### Transformations

Transformations are algorithms which represent the same information using a different alphabet.
Good transformation algorithms eliminate redundant information in the data.
A mathematical function can be seen as a transformation of a series of data.
The series

can expressed as `1` `1` `2` `3` `5` `8` `13` `21` `..`

.
We mapped the information represented in alphabet A (integers) to an alphabet B (letters + integers) which is more compact.
It is important to note that all transformations must have two properties:`f``(`x`)` `=` `f``(`x`-``1``)` `+` `f``(`x`-``2``)`

- Applying a transformation algorithm to data, does not loose information.
- All transformation algorithms are reversible, such that the original representation can be reconstructed from the new alphabet.

### Approximations

Approximations are algorithms which loose information for the sake of better compression.
Given a threshold

(this can be absolute or relative), the algorithm maps the data from alphabet A to B with an information lose within the expected threshold.
An example for an approximation is the `theta`

operator known from primary school e.g. `~``=`

.
Approximations have the following properties:`1``/``3` `~``=` `0.``3`

- Applying an approximation algorithm to data, results in information loss.
- Approximation algorithms are not reversible.
- The information loss is guaranteed to be within the threshold
`theta`

### Codings

Codings are algorithms where the actual compression happens. The information is being saved on disk as compact as possible. Examples are Huffman or Arithmetic coding.

### Checksums

Checksums are algorithms to check the integrity of the data at each step e.g. Adler-32.

#### Dependencies

~255–530KB

~10K SLoC