#archive #reed-solomon #backup #ec-seq-box #data-recovery

bin+lib blkar

Multithreaded archiver offering bit rot protection and sector level recoverability

17 stable releases (5 major)

7.2.7 Nov 6, 2019
7.2.5 Jul 29, 2019
6.0.1 Apr 22, 2019
5.0.0 Apr 11, 2019
2.2.0 Dec 25, 2018

#20 in #reed-solomon

MIT license

755KB
19K SLoC

blockyarchive

Documentation

Blockyarchive/blkar (pronounced "bloc-kar") is a multithreaded archiver written in Rust that offers bit rot protection, and makes it easier to recover archived data from failing storage devices.

How does it work?

blkar encodes your data into SeqBox and EC-SeqBox archives. Both formats facilitate data recovery, but only EC-SeqBox provides data repair capability.

What are SeqBox and EC-SeqBox?

SeqBox is a single-file archive format designed by Marco Pontello. It facilitates sector-level data recovery when file system metadata is corrupted or missing, while the archive itself still exists as a normal file on the file system. Please visit the official SeqBox repo for the original implementation and technical details.
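
The idea behind sector-level recovery can be illustrated with a small sketch: if every block carries its own signature, file UID, and sequence number, blocks can be found by scanning raw storage at block-aligned offsets and then put back in order, without consulting any file system metadata. The block layout, block size, and function names below are hypothetical and simplified for illustration (there is no checksum, for instance); the actual layout is defined in the SBX format specification.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified block layout used only in this sketch:
//   bytes 0..3   signature
//   bytes 3..9   file UID
//   bytes 9..13  sequence number (big-endian u32)
//   bytes 13..   payload (zero padded)
const BLOCK_SIZE: usize = 32;
const HEADER_LEN: usize = 13;
const SIGNATURE: &[u8; 3] = b"SBx";

/// Builds one block with the simplified layout described above.
fn make_block(uid: [u8; 6], seq: u32, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= BLOCK_SIZE - HEADER_LEN);
    let mut block = vec![0u8; BLOCK_SIZE];
    block[0..3].copy_from_slice(SIGNATURE);
    block[3..9].copy_from_slice(&uid);
    block[9..13].copy_from_slice(&seq.to_be_bytes());
    block[HEADER_LEN..HEADER_LEN + payload.len()].copy_from_slice(payload);
    block
}

/// Scans a raw image at block-aligned offsets and returns the payloads of
/// all blocks that match `uid`, ordered by sequence number.
fn recover(image: &[u8], uid: [u8; 6]) -> Vec<Vec<u8>> {
    let mut found: BTreeMap<u32, Vec<u8>> = BTreeMap::new();
    for chunk in image.chunks_exact(BLOCK_SIZE) {
        // Skip anything that is not a block of the archive we are looking for.
        if chunk[0..3] != SIGNATURE[..] || chunk[3..9] != uid[..] {
            continue;
        }
        let seq = u32::from_be_bytes([chunk[9], chunk[10], chunk[11], chunk[12]]);
        // Keep the first copy seen for each sequence number.
        found
            .entry(seq)
            .or_insert_with(|| chunk[HEADER_LEN..].to_vec());
    }
    found.into_values().collect()
}

fn main() {
    let uid = [0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x01];
    // Simulate a disk image where blocks are fragmented, out of order,
    // and mixed with unrelated data.
    let mut image = Vec::new();
    image.extend(make_block(uid, 2, b"second"));
    image.extend(vec![0xAAu8; BLOCK_SIZE]); // unrelated sector
    image.extend(make_block(uid, 1, b"first"));

    for payload in recover(&image, uid) {
        println!("{}", String::from_utf8_lossy(&payload).trim_end_matches('\0'));
    }
}
```

Because the scan relies only on data stored inside each block, it works the same whether the blocks are contiguous, fragmented, or out of order.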

Error-correcting SeqBox (or EC-SeqBox for short) is an extended version of SeqBox developed by Darren Ldl for this project, introducing forward error correction via Reed-Solomon erasure code.
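
To give a feel for what an erasure code does, here is a minimal sketch that uses a single XOR parity shard. This is deliberately not Reed-Solomon: XOR parity can rebuild only one missing shard per set, whereas Reed-Solomon generalises the same idea to several parity shards and several losses. The function names are hypothetical and are not part of blkar.

```rust
/// Computes a single parity shard as the byte-wise XOR of all data shards.
fn xor_parity(shards: &[Vec<u8>]) -> Vec<u8> {
    assert!(!shards.is_empty(), "at least one shard is required");
    let len = shards[0].len();
    let mut parity = vec![0u8; len];
    for shard in shards {
        assert_eq!(shard.len(), len, "all shards must have equal length");
        for (p, b) in parity.iter_mut().zip(shard) {
            *p ^= *b;
        }
    }
    parity
}

/// Rebuilds at most one missing shard (marked as `None`) from the surviving
/// shards and the parity shard. Returns false if more than one is missing.
fn repair(shards: &mut [Option<Vec<u8>>], parity: &[u8]) -> bool {
    let missing: Vec<usize> = shards
        .iter()
        .enumerate()
        .filter(|(_, s)| s.is_none())
        .map(|(i, _)| i)
        .collect();

    match missing.as_slice() {
        [] => true,
        [idx] => {
            // XOR of the parity with every surviving shard yields the lost one.
            let mut rebuilt = parity.to_vec();
            for shard in shards.iter().flatten() {
                for (r, b) in rebuilt.iter_mut().zip(shard) {
                    *r ^= *b;
                }
            }
            shards[*idx] = Some(rebuilt);
            true
        }
        _ => false,
    }
}

fn main() {
    let data = vec![b"abcd".to_vec(), b"efgh".to_vec(), b"ijkl".to_vec()];
    let parity = xor_parity(&data);

    // Lose the middle shard, then rebuild it from the rest.
    let mut damaged: Vec<Option<Vec<u8>>> = data.iter().cloned().map(Some).collect();
    damaged[1] = None;

    let ok = repair(&mut damaged, &parity);
    println!("repaired: {}, shard 1 restored: {}", ok, damaged[1] == Some(data[1].clone()));
}
```

The key property shared with Reed-Solomon is that the position of the loss must be known (an erasure), which is why each block needs its own integrity check to tell good blocks from damaged ones.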

Blockyarchive/blkar was formerly known as rust-SeqBox/rsbx.

Features overall

  • Data recovery that does not depend on file system metadata (sector level recovery)
    • This allows data recovery even when data is fragmented and out of order
  • Supports error correction (via Reed-Solomon erasure code) for EC-SeqBox
  • Supports burst (sector) error resistance for EC-SeqBox
    • This is done via an interleaving block arrangement scheme, which mainly addresses the data repair limitation of the simple archive design
    • More complex archive designs such as PAR2 can repair burst errors without any extra arrangement scheme, but they are also vastly more complex than EC-SeqBox
  • Multithreaded
    • Many operations in the everyday workflow are written to take advantage of multi-core CPUs for high performance
  • JSON mode
    • Outputs information in JSON format instead of human readable text, allowing easy integration with scripts
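
The burst error resistance described in the list above can be sketched as follows. The example assumes a generic round-robin interleave, where consecutive positions in the archive belong to different block sets; the exact arrangement used by EC-SeqBox is defined in the SBX format specification, and the function names and parameters here are hypothetical.

```rust
use std::ops::Range;

/// Maps (set index, index within set) to a position in the interleaved layout:
/// consecutive positions cycle through the block sets.
fn interleaved_pos(set: usize, idx: usize, num_sets: usize) -> usize {
    idx * num_sets + set
}

/// Counts, for each block set, how many of its blocks fall inside a burst of
/// damaged positions, for either the sequential or the interleaved layout.
fn damage_per_set(
    burst: Range<usize>,
    num_sets: usize,
    set_size: usize,
    interleaved: bool,
) -> Vec<usize> {
    let mut hits = vec![0; num_sets];
    for set in 0..num_sets {
        for idx in 0..set_size {
            let pos = if interleaved {
                interleaved_pos(set, idx, num_sets)
            } else {
                // Sequential layout: each set occupies one contiguous run.
                set * set_size + idx
            };
            if burst.contains(&pos) {
                hits[set] += 1;
            }
        }
    }
    hits
}

fn main() {
    // 4 block sets of 5 blocks each, with a burst damaging positions 6..10.
    let sequential = damage_per_set(6..10, 4, 5, false);
    let interleaved = damage_per_set(6..10, 4, 5, true);
    println!("sequential layout:  {:?}", sequential);
    println!("interleaved layout: {:?}", interleaved);
}
```

In the sequential layout the whole burst lands in one block set, which may exceed what that set's parity can repair. With interleaving, the same burst is spread so that each set loses at most one block, provided the burst is no longer than the number of sets.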

Limitations

  • Only a single file is supported for encoding, as SeqBox and EC-SeqBox are both single-file archive formats
    • However, blkar may still be usable when you have multiple files, as blkar supports taking input from stdin during encoding, and also supports outputting to stdout during decoding
    • This means if you have an archiver that supports bundling and unbundling on the fly with pipes, like tar, you can combine the use of the archiver and blkar into one encoding and decoding step

Getting started

Installation

blkar is available via AUR, GitHub releases, or cargo:

cargo install blkar

Usage guides & screencasts & other resources

The wiki contains comprehensive guides and resources.

Goals and status

As blkar is to be used largely as a backup utility, security/robustness of the code will be prioritised over apparent performance.

This project has reached its intended feature completeness, so no active development of new features will occur. However, the project is still actively looked after: I will respond to PRs, issues, and emails, consider feature requests, and respond quickly to bug reports.

In other words, this is a completed project with respect to its original scope, but it is not abandoned.

Comparison to the original SeqBox implementation/design

Changelog

SBX format (EC-SeqBox is also specified in this document)

blkar specs

Contributions

Contributions are welcome. Note that by submitting contributions, you agree to license your work under the same license used by this project as stated in the LICENSE file.

Acknowledgement

I would like to thank Marco (the official SeqBox author) for discussing and clarifying aspects of his project, and for providing test data during the development of osbx. I would also like to thank him for his feedback on the numbering of the error-correction-enabled ECSBX versions (versions 17, 18, 19).

I would like to thank Ming for his feedback on the documentation, UX design, and several other general aspects of the osbx project, most of whose designs are carried over to blkar, and for his further feedback on this project.

The design of the readable rate in progress report text is copied from Arch Linux pacman's progress bar design.

The design of block set interleaving arrangement in RS enabled versions is heavily inspired by Thanassis Tsiodras's design of RockFAT. The interleaving provides resistance against burst sector errors.

Donation

Note: Donation will NOT fuel development of new features. As mentioned above, this project is meant to be stable, well tested and well maintained, but normally I am not actively adding new features to it.

If blockyarchive has been useful to you, and you would like to donate to me for the development effort, you can donate through here.

License

Libcrc code

The crcccitt code is translated from the C implementation in libcrc and is under the same MIT License as used by libcrc and as stated in the libcrc source code. The license text of crcccitt.c is copied over to crc-ccitt/build.rs, crc-ccitt/src/lib.rs, build.rs and src/crc_ccitt.rs as well.

Official SeqBox code

The following files in the tests folder are copied from the official SeqBox repo and are under its license, which is MIT as of the time of writing:

  • tests/SeqBox/*

All remaining files are distributed under the MIT license as stated in the LICENSE file.

Dependencies

~10MB
~180K SLoC