0.4.0 (1 unstable release) | Jun 16, 2021 | Used in datman | 160KB, 3.5K SLoC
山 (yama): deduplicated heap repository
Note: this README has not yet been updated to match reality…
yama [-w|--with [user@host:]path] [--with-encrypted true|false]
Backup Profiles
Remotes
In yama.toml, you can configure remotes:
[remote.bob]
encrypted = true
host = "bobmachine.xyz"
user = "bob"
path = "/home/bob/yama"
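To illustrate how these tables map onto data, here is a minimal, std-only Rust sketch that reads [remote.NAME] tables into a struct. The Remote struct and the parse_remotes function are hypothetical, not yama's API, and the parser understands only the simple key = "string" and key = true|false lines shown above; a real implementation would more likely use a TOML crate.

```rust
use std::collections::HashMap;

/// One [remote.NAME] table (hypothetical struct; fields mirror the example keys).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Remote {
    pub encrypted: bool,
    pub host: String,
    pub user: String,
    pub path: String,
}

/// Minimal std-only reader for [remote.NAME] tables. It understands only
/// `key = "string"` and `key = true|false` lines; other tables are skipped.
pub fn parse_remotes(src: &str) -> HashMap<String, Remote> {
    let mut remotes: HashMap<String, Remote> = HashMap::new();
    let mut current: Option<String> = None;
    for raw in src.lines() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') && line.ends_with(']') {
            // Entering a new table: remember it only if it is `remote.<name>`.
            current = line[1..line.len() - 1]
                .trim()
                .strip_prefix("remote.")
                .map(str::to_string);
            if let Some(name) = &current {
                remotes.entry(name.clone()).or_default();
            }
            continue;
        }
        if let (Some(name), Some((key, value))) = (current.as_ref(), line.split_once('=')) {
            let value = value.trim();
            let text = value.trim_matches('"').to_string();
            let remote = remotes.get_mut(name).expect("table was inserted above");
            match key.trim() {
                "encrypted" => remote.encrypted = value == "true",
                "host" => remote.host = text,
                "user" => remote.user = text,
                "path" => remote.path = text,
                _ => {} // unknown keys are ignored in this sketch
            }
        }
    }
    remotes
}
```

Applied to the bob table above, parse_remotes yields a single Remote named "bob" with encrypted set to true and host "bobmachine.xyz".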
Subcommands
check: Check repository for consistency
Verifies that the full repository satisfies the following consistency constraints:
- all chunks have the correct hash
- all pointers have a valid structure, recursively
Usage: yama check [--gc]
Both the total space occupied and the space occupied by unused chunks are reported.
If --gc is specified, unused chunks will be removed.
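The two constraints and the --gc behaviour amount to a mark-and-sweep pass. The Rust sketch below assumes a hypothetical in-memory model (a map from chunk id to bytes plus child ids, and a map from pointer name to root chunk id) and uses std's DefaultHasher as a stand-in for BLAKE256, since the README does not describe yama's on-disk pointer format. Chunk, Report, chunk_id and check are illustrative names only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};

/// Hypothetical chunk id. std's 64-bit DefaultHasher stands in for BLAKE256.
pub type ChunkId = u64;

pub fn chunk_id(data: &[u8]) -> ChunkId {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Hypothetical in-memory chunk: its bytes plus the ids of chunks it refers to.
pub struct Chunk {
    pub data: Vec<u8>,
    pub children: Vec<ChunkId>,
}

#[derive(Debug)]
pub struct Report {
    pub bad_hashes: Vec<ChunkId>, // stored id does not match the recomputed hash
    pub dangling: Vec<ChunkId>,   // referenced but missing
    pub total_bytes: usize,
    pub unused_bytes: usize,
}

/// Verify hashes, mark everything reachable from any pointer, then report
/// (and, with `gc`, sweep) the chunks nothing refers to.
pub fn check(
    chunks: &mut HashMap<ChunkId, Chunk>,
    pointers: &HashMap<String, ChunkId>,
    gc: bool,
) -> Report {
    let mut bad_hashes: Vec<ChunkId> = chunks
        .iter()
        .filter(|(id, c)| chunk_id(&c.data) != **id)
        .map(|(id, _)| *id)
        .collect();
    bad_hashes.sort();

    // Mark: depth-first walk from every pointer root.
    let mut reachable: HashSet<ChunkId> = HashSet::new();
    let mut dangling = Vec::new();
    let mut stack: Vec<ChunkId> = pointers.values().copied().collect();
    while let Some(id) = stack.pop() {
        if !reachable.insert(id) {
            continue; // already visited
        }
        match chunks.get(&id) {
            Some(c) => stack.extend(c.children.iter().copied()),
            None => dangling.push(id),
        }
    }
    dangling.sort();

    let total_bytes: usize = chunks.values().map(|c| c.data.len()).sum();
    let unused: Vec<ChunkId> = chunks
        .keys()
        .filter(|id| !reachable.contains(*id))
        .copied()
        .collect();
    let unused_bytes: usize = unused.iter().map(|id| chunks[id].data.len()).sum();

    // Sweep: only when asked to, analogous to `yama check --gc`.
    if gc {
        for id in &unused {
            chunks.remove(id);
        }
    }
    Report { bad_hashes, dangling, total_bytes, unused_bytes }
}
```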
lsp: List tree pointers
Usage: yama lsp
rmp: Remove tree pointers
Usage: yama rmp pointer/path [--force]
If --force is not specified and the pointer is depended upon by another pointer, deletion is aborted with an error.
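The README does not say how one pointer comes to depend on another; a plausible case is a pointer stored with --differential against a parent. Under that assumption, the --force rule can be sketched as follows. The parent map and remove_pointer are hypothetical, not yama's API.

```rust
use std::collections::HashMap;

/// Hypothetical pointer table: name -> parent it was stored differentially
/// against (None for a full listing). Assumes "depends on" means "has as parent".
pub fn remove_pointer(
    pointers: &mut HashMap<String, Option<String>>,
    name: &str,
    force: bool,
) -> Result<(), String> {
    if !pointers.contains_key(name) {
        return Err(format!("no such pointer: {}", name));
    }
    // Collect every pointer whose parent is the one being removed.
    let mut dependents: Vec<&String> = pointers
        .iter()
        .filter(|(_, parent)| parent.as_deref() == Some(name))
        .map(|(child, _)| child)
        .collect();
    dependents.sort();
    if !dependents.is_empty() && !force {
        return Err(format!("{} is depended upon by {:?}", name, dependents));
    }
    // With force, dependents are left referring to a parent that no longer exists.
    pointers.remove(name);
    Ok(())
}
```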
store: Store tree into repository
Usage: yama store [--dry-run] [ssh://user@host]/path/to/dir pointer/path [--exclusions path/to/exclusions.txt] [--differential pointer/parent]
The pointer must not already exist; it will be created. If --differential is specified with an existing parent pointer, the directory listing is stored as a differential list relative to the parent.
The intention of this is to reduce the size of the directory list.
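One way to picture a differential listing: record only the entries that were added, changed or removed relative to the parent, and rebuild the full listing by applying those entries on top of the parent's. The sketch below assumes a listing is a map from path to a per-file fingerprint; Listing, Change, diff and apply are hypothetical names, and yama's real listing format may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical listing: path -> per-file fingerprint (e.g. a content hash).
pub type Listing = BTreeMap<String, u64>;

/// One entry of a differential listing.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Upsert(u64), // new or modified file
    Deleted,     // present in the parent, gone now
}

/// Entries of `current` that differ from `parent`, plus deletions.
pub fn diff(parent: &Listing, current: &Listing) -> BTreeMap<String, Change> {
    let mut out = BTreeMap::new();
    for (path, fp) in current {
        if parent.get(path) != Some(fp) {
            out.insert(path.clone(), Change::Upsert(*fp));
        }
    }
    for path in parent.keys() {
        if !current.contains_key(path) {
            out.insert(path.clone(), Change::Deleted);
        }
    }
    out
}

/// Rebuild the full listing from the parent plus the differential entries.
pub fn apply(parent: &Listing, changes: &BTreeMap<String, Change>) -> Listing {
    let mut out = parent.clone();
    for (path, change) in changes {
        match change {
            Change::Upsert(fp) => {
                out.insert(path.clone(), *fp);
            }
            Change::Deleted => {
                out.remove(path);
            }
        }
    }
    out
}
```

When most files are unchanged between runs, the differential map is much smaller than the full listing, which is the size saving described above.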
Exclusion lists
Exclusion lists use much the same format as .gitignore: one glob per line, naming files not to include, relative to the tree root.
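A simplified matcher in the spirit of that format is sketched below. It is an approximation under stated assumptions, not yama's implementation: * and ? do not cross '/', a pattern without '/' may match any path component, a pattern containing '/' is anchored at the tree root, and .gitignore features such as **, character classes, ! negation and directory-only trailing slashes are not handled. glob_match and is_excluded are hypothetical names.

```rust
/// Glob match where `*` matches any run of characters except '/' and `?`
/// matches one character except '/'. No `[abc]`, `**` or `!` support.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    fn go(p: &[char], t: &[char]) -> bool {
        match p.first().copied() {
            None => t.is_empty(),
            // '*': match nothing, or swallow one non-separator char and retry.
            Some('*') => go(&p[1..], t) || (!t.is_empty() && t[0] != '/' && go(p, &t[1..])),
            Some('?') => !t.is_empty() && t[0] != '/' && go(&p[1..], &t[1..]),
            Some(c) => !t.is_empty() && t[0] == c && go(&p[1..], &t[1..]),
        }
    }
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    go(&p, &t)
}

/// Is `rel_path` (relative to the tree root) excluded by the list?
/// A match on a directory also excludes everything beneath it.
pub fn is_excluded(exclusions: &str, rel_path: &str) -> bool {
    let parts: Vec<&str> = rel_path.split('/').filter(|s| !s.is_empty()).collect();
    exclusions
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .any(|pat| {
            let pat = pat.trim_end_matches('/');
            if pat.contains('/') {
                // Anchored: test the path and each of its ancestors.
                let pat = pat.trim_start_matches('/');
                (1..=parts.len()).any(|n| glob_match(pat, &parts[..n].join("/")))
            } else {
                // Bare name: may match any single component.
                parts.iter().any(|part| glob_match(pat, part))
            }
        })
}
```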
extract: Extract file(s) from repository
Usage: yama extract [--dry-run] pointer/path[:path] [ssh://user@host]/path/to/local/dir[/]
If no path is specified, the root / is extracted. A trailing slash on the target means that the file will be extracted as a child of the specified directory.
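The source and target conventions can be sketched as two small functions. parse_source and destination are hypothetical helpers; the sketch assumes that the first ':' separates the pointer from the path, and it handles only local targets, not ssh:// ones.

```rust
use std::path::{Path, PathBuf};

/// Split `pointer[:path]`; a missing or empty path means the root "/".
pub fn parse_source(spec: &str) -> (&str, &str) {
    match spec.split_once(':') {
        Some((pointer, path)) if !path.is_empty() => (pointer, path),
        Some((pointer, _)) => (pointer, "/"),
        None => (spec, "/"),
    }
}

/// Where the extracted file or directory lands. With a trailing slash on
/// `target`, the source's last component is placed inside `target`;
/// otherwise `target` itself becomes the extracted file or directory.
pub fn destination(source_path: &str, target: &str) -> PathBuf {
    match (target.ends_with('/'), Path::new(source_path).file_name()) {
        (true, Some(name)) => Path::new(target).join(name),
        // No trailing slash, or the source is the root "/" (no file name).
        _ => PathBuf::from(target),
    }
}
```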
remote: Run operations on a remote repository
Usage: yama remote ssh://user@host/path/to/repo <subcommand>
remote store: Store local tree into remote repository
Usage is identical to yama store, except that the path to store must be local.
remote extract: Extract remote repository into local tree
Usage is identical to yama extract, except that the target path must be local.
slave: Remote-controlled yama
Communicates over stdin/stdout to perform specified operations. Used when a yama command involves SSH.
Repository Storage Details
Pointers are stored in pointers.lmdb and chunks are stored in chunks.lmdb.
If exclusion files are to be used on a recurring basis, they are expected to be kept in the same directory as the repository.
Chunks are compressed with zstd. A zstd dictionary must first be trained and placed in the repository root as zstd.dict.
This dictionary file must not be lost or altered once any chunks have been compressed with it. Doing so would compromise the integrity of the entire repository.
Chunks are hashed with BLAKE256. Before a chunk is deduplicated away, its xxHash is also calculated so that a hash collision can be detected; a detected collision aborts the backup. Collisions are expected never to happen, but we cannot be certain of that.
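The deduplicate-or-abort rule can be sketched as a chunk store keyed by the primary hash that also remembers a second, independent checksum for each chunk. Rust's standard library has neither BLAKE nor xxHash, so the sketch substitutes DefaultHasher (SipHash) for BLAKE256 and 64-bit FNV-1a for xxHash; ChunkStore and put are hypothetical names, and compression is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE256 (not in std): 64-bit SipHash via DefaultHasher.
fn primary_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Stand-in for xxHash: 64-bit FNV-1a, an independent second checksum.
fn secondary_hash(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

#[derive(Debug, PartialEq)]
pub enum Stored {
    New,
    Deduplicated,
}

#[derive(Default)]
pub struct ChunkStore {
    // primary hash -> (secondary checksum, chunk bytes)
    chunks: HashMap<u64, (u64, Vec<u8>)>,
}

impl ChunkStore {
    /// Store a chunk, deduplicating on the primary hash. If the primary hash
    /// is already present but the secondary checksum differs, that is a
    /// collision and the backup should abort.
    pub fn put(&mut self, data: &[u8]) -> Result<(u64, Stored), String> {
        let id = primary_hash(data);
        let check = secondary_hash(data);
        // Copy the stored checksum out so the map is not borrowed below.
        let existing = self.chunks.get(&id).map(|(sum, _)| *sum);
        match existing {
            Some(sum) if sum == check => Ok((id, Stored::Deduplicated)),
            Some(_) => Err(format!("hash collision on chunk {:016x}; aborting backup", id)),
            None => {
                self.chunks.insert(id, (check, data.to_vec()));
                Ok((id, Stored::New))
            }
        }
    }

    pub fn len(&self) -> usize {
        self.chunks.len()
    }
}
```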
Remote Protocol Details
- Compression is performed on the host where the data resides.
- Only required chunks are compressed and sent across the SSH connection.
- There needs to be some mechanism to offer, decline and accept chunks, without buffers overflowing and bringing hosts down.
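One simple way to meet the last requirement is a windowed offer/accept exchange over bounded queues. In the Rust sketch below, a thread stands in for the remote end and std's sync_channel provides the bounded buffers; the message types, the window parameter and transfer itself are hypothetical, since the README does not specify yama's wire protocol.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical messages; the README does not specify yama's actual protocol.
enum ToRemote {
    Offer(u64),         // "I have chunk <id>, do you want it?"
    Data(u64, Vec<u8>), // bytes for an accepted chunk
    Done,
}
enum FromRemote {
    Accept(u64),
    Decline(u64),
}

/// Send only the chunks the remote lacks. At most `window` offers are
/// unanswered at any time and both channels are bounded, so neither side
/// queues an unbounded amount of data. Returns (bytes sent, ids remote has).
pub fn transfer(local: Vec<(u64, Vec<u8>)>, remote_has: HashSet<u64>, window: usize) -> (usize, HashSet<u64>) {
    let window = window.max(1); // a zero window could never send an offer
    let (to_remote, remote_rx) = sync_channel::<ToRemote>(window);
    let (to_local, local_rx) = sync_channel::<FromRemote>(window);

    // Simulated remote end, running on its own thread.
    let remote = thread::spawn(move || {
        let mut have = remote_has;
        while let Ok(msg) = remote_rx.recv() {
            match msg {
                ToRemote::Offer(id) => {
                    let reply = if have.contains(&id) { FromRemote::Decline(id) } else { FromRemote::Accept(id) };
                    if to_local.send(reply).is_err() {
                        break;
                    }
                }
                // A real remote would decompress and store the bytes here.
                ToRemote::Data(id, _bytes) => {
                    have.insert(id);
                }
                ToRemote::Done => break,
            }
        }
        have
    });

    let mut chunks: HashMap<u64, Vec<u8>> = local.into_iter().collect();
    let ids: Vec<u64> = chunks.keys().copied().collect();
    let (mut next, mut in_flight, mut sent_bytes) = (0, 0, 0);
    while next < ids.len() || in_flight > 0 {
        if next < ids.len() && in_flight < window {
            to_remote.send(ToRemote::Offer(ids[next])).expect("remote hung up");
            next += 1;
            in_flight += 1;
        } else {
            // Window full (or nothing left to offer): wait for one answer.
            if let FromRemote::Accept(id) = local_rx.recv().expect("remote hung up") {
                let bytes = chunks.remove(&id).expect("accepted an id we offered");
                sent_bytes += bytes.len();
                to_remote.send(ToRemote::Data(id, bytes)).expect("remote hung up");
            }
            in_flight -= 1;
        }
    }
    to_remote.send(ToRemote::Done).expect("remote hung up");
    (sent_bytes, remote.join().expect("remote thread panicked"))
}
```

Because at most window offers are unanswered at once, the reply queue can never fill up, so the two sides cannot deadlock on each other's full buffers.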
Processor Details
Other notes
zstd --train FILEs -o zstd.dict
- Candidate size:
find ~/Programming -size -4k -size +64c -type f -exec grep -Iq . {} \; -printf "%s\n" | jq -s 'add'
- Want to sample:
find ~/Programming -size -4k -size +64c -type f -exec grep -Iq . {} \; -exec cp {} -t /tmp/d/ \;
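du -sh
find > file.list
wc -l < file.list
→ gives the number of lines. Then train on a random sample of 4242 files:
shuf -n 4242 file.list | xargs -x zstd --train -o zstd.dict
This chokes if it receives a filename containing a space, because plain xargs splits its input on whitespace; just re-run until you get a working set, or use GNU xargs -d '\n' so that only newlines separate filenames.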
Dependencies: ~63MB, ~847K SLoC