14 stable releases

1.2.3 Aug 25, 2023
1.2.2 Sep 22, 2022
1.2.1 Jun 20, 2022
1.0.8 May 31, 2022

GPL-3.0+

Filespooler: CLI & Library for Sequential, Distributed, POSIX-style job queue processing

Introduction

Filespooler is a Unix-style tool that facilitates local or remote command execution, complete with stdin capture, and integrates easily with various other tools. I will decode what that means below. For now, here's a brief Filespooler feature list:

  • It can easily use tools such as S3, Dropbox, Syncthing, NNCP, ssh, UUCP, USB drives, CDs, etc. as transport.
    • Translation: you can use basically anything that is a filesystem as a transport
  • It can use arbitrary decoder command pipelines (eg, zcat, zstdcat, gpg, age, etc) to pre-process stored packets.
  • It can send and receive packets by pipes.
  • Its storage format is simple on-disk files with locking.
  • It supports one-to-one and one-to-many configurations.
  • Locking is unnecessary when writing new jobs to the queue, and many arbitrary tools (eg, Syncthing, Dropbox, etc) can safely write directly to the queue without any assistance.
  • Queue processing is strictly ordered based on the order on the creation machine, even if job files are delivered out of order to the destination.
  • stdin can be piped into the job creation tool, and piped to a later executor at process time on a remote machine.
  • The file format is lightweight; less than 100 bytes overhead unless large extra parameters are given.
  • The queue format is lightweight; having 1000 different queues on a Raspberry Pi would be easy.
  • Processing is stream-based throughout; arbitrarily-large packets are fine and sizes in the TB range are no problem.
  • The Filespooler command, fspl, is extremely lightweight, consuming less than 10MB of RAM on x86_64.
  • Filespooler has extensive documentation.

Filespooler consists of a command-line tool (fspl) for interacting with queues and a Rust library that fspl is built on. Nearly all of the logic lives in the library; main.rs for fspl is just a few lines long.

Use Cases

Imagine for a moment that you want to send incremental backups from one machine to your backup server. You might run something like this:

tar --incremental -cSpf - ... | ssh backupsvr tar -xvSpf - -C /backups

That will work when all is good. But when the network between the two machines drops, now what? Probably data loss. What we want is a way to reliably execute things, in order, with reordering in case of out-of-order data. This turns out to be useful in many situations: Git repository syncing, backups, etc.

Now, say you do something like this:

tar --incremental -cSpf - ... | fspl prepare -s ~/statefile -i - > ~/syncedpath/fspl-`uuid`.fspl

At this point, a tool like Syncthing or Dropbox will sync this syncedpath to the ~/queue/jobs/ directory under the queue on the backup server. Now you can run this (from cron, systemd, etc) on the backup server:

fspl queue-process -q ~/queue tar -- -xvSpf - -C /backups

Boom. Done.

queue-process will (by default) delete jobs that finish successfully. It will keep track of which jobs have been completed and process them in order.
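
To make the ordering guarantee concrete, here is a minimal shell sketch of the general idea: process a job only when it carries the next expected sequence number, and leave anything that arrived early for a later run. This is an illustration only, not fspl's implementation. The file names (`<seq>.job`, `nextseq`) and the `process_in_order` function are hypothetical, and `cat` stands in for running the real command.

```shell
#!/bin/sh
# Illustrative sketch only: NOT fspl's actual on-disk format or logic.
# Hypothetical layout: $queue/jobs/<seq>.job plus a $queue/nextseq counter.

process_in_order() {
    queue=$1
    # Start at 1 if no progress has been recorded yet.
    next=$(cat "$queue/nextseq" 2>/dev/null || echo 1)
    # Keep going only while the next expected job is actually present.
    while [ -f "$queue/jobs/$next.job" ]; do
        cat "$queue/jobs/$next.job"       # stand-in for running the real command
        rm "$queue/jobs/$next.job"        # delete the job once it has succeeded
        next=$((next + 1))
        echo "$next" > "$queue/nextseq"   # record progress for the next run
    done
}

# Demo: job 3 is delivered before job 2.
q=$(mktemp -d) && mkdir "$q/jobs"
echo "job one"   > "$q/jobs/1.job"
echo "job three" > "$q/jobs/3.job"
process_in_order "$q"    # runs job 1 only; job 3 waits for job 2
echo "job two"   > "$q/jobs/2.job"
process_in_order "$q"    # now runs job 2, then job 3
rm -r "$q"
```

Because the counter is written only after a job finishes, rerunning the function from cron is harmless: it resumes at the first job that has not yet completed.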

Copyright

Copyright (C) 2022 John Goerzen <jgoerzen@complete.org>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
