Archival tool for scheduler job scripts and accompanying files.
Note that the master branch here may be running ahead of the latest release on crates.io. During development, we sometimes rely on dependencies that have not yet released a version with the features we use.
This version is what we test against in CI.
sarchive requires the path to the scheduler's main spool directory to be specified, as well as a cluster name.
For Slurm, the directory to watch is defined as the `StateSaveLocation` in the Slurm configuration.
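As an illustration, the relevant setting in `slurm.conf` might look as follows (the path is an example, not a recommendation):

```
# slurm.conf (excerpt)
# sarchive watches the job script directories under this location
StateSaveLocation=/var/spool/slurm
```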
sarchive offers various backends. The basic `file` backend writes a copy of the job scripts and associated files to a directory on a mounted filesystem. We also have limited support for sending job information to Elasticsearch or for producing to a Kafka topic. We briefly discuss these backends below.
The file backend is activated using the `file` subcommand. Note that we do not support using multiple subcommands (i.e., backends) at this moment.
For file archival, sarchive requires the path to the archive's top directory, i.e., where you want to store the backup scripts and accompanying files. The archive can be further divided into subdirectories per period:
- year: YYYY, by providing `--period=yearly`
- month: YYYYMM, by providing `--period=monthly`
- day: YYYYMMDD, by providing `--period=daily`

Each of these directories is created upon file archival if it does not yet exist. This allows for easily tarring old(er) directories you still wish to keep around, but probably no longer immediately need for user support.
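For example, an older daily directory can be compressed and taken offline once it is no longer needed for day-to-day user support. A minimal sketch (the archive path and file names are hypothetical; we set up a throwaway directory so the example is self-contained):

```shell
# Sketch: compress and remove an older day directory from the job archive.
# In production ARCHIVE_DIR would be something like /var/backups/slurm/job-archive.
ARCHIVE_DIR="$(mktemp -d)"
mkdir -p "$ARCHIVE_DIR/20200101"                           # a daily subdirectory as created by sarchive
echo '#!/bin/bash' > "$ARCHIVE_DIR/20200101/job_123.script"  # a hypothetical archived job script

cd "$ARCHIVE_DIR"
tar czf 20200101.tar.gz 20200101   # keep a compressed copy
rm -rf 20200101                    # drop the online directory
```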
```
sarchive --cluster huppel -s /var/spool/slurm file --archive /var/backups/slurm/job-archive
```
If you want to maintain the job script archive on another machine and/or make it easily searchable, use the Elasticsearch backend. The shipped data structure contains a timestamp along with the job script and potentially other relevant information (at the scheduler's discretion).
We do not yet support SSL/TLS or authentication with the ES backend.
```
sarchive --cluster huppel -s /var/spool/slurm elasticsearch --host myelastic.mydomain --index slurm-job-archive
```
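Once indexed, the documents can be inspected through the usual Elasticsearch REST API; a hypothetical query against the index above (the port is an assumption, 9200 being the Elasticsearch default):

```
curl "http://myelastic.mydomain:9200/slurm-job-archive/_search?pretty"
```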
You can ship the job scripts as messages to Kafka.
```
./sarchive --cluster huppel -l /var/log/sarchive.log -s /var/spool/slurm/ kafka --brokers mykafka.mydomain:9092 --topic slurm-job-archival
```
Support for SSL and SASL is available through the `--ssl` and `--sasl` options. Both of these expect a comma-separated list of options to pass to the underlying Kafka library.
- Multithreaded, watching one dir per thread, so no need for hierarchical watching.
- Separate processing thread to ensure swift draining of the inotify event queues.
- Clean log rotation when SIGHUP is received.
- Experimental support for clean termination on receipt of SIGTERM or SIGINT: job events that have already been seen are still processed, to minimise potential loss when restarting the service.
- Output to a file in a hierarchical directory structure
- Output to Elasticsearch
- Output to Kafka
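The SIGHUP-driven log rotation above pairs naturally with logrotate. A minimal sketch, assuming the log path from the Kafka example and that sarchive runs under systemd (the unit name is hypothetical; `systemctl kill` delivers the signal to the unit's main process):

```
# /etc/logrotate.d/sarchive (hypothetical)
/var/log/sarchive.log {
    weekly
    rotate 8
    compress
    postrotate
        systemctl kill -s HUP sarchive.service
    endscript
}
```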
We provide a build script to generate an RPM using the `cargo-rpm` tool. You may tailor the spec file (listed under the `.rpm` directory) to fit your needs. The RPM includes a unit file so sarchive can be started as a service by systemd. This file should also be changed to fit your requirements and local configuration.