4 releases

0.1.4 Sep 28, 2023
0.1.2 Mar 19, 2021
0.1.1 Aug 28, 2019
0.1.0 May 24, 2019

#141 in Command line utilities

GPL-3.0-or-later

35KB
592 lines

unionfarm

This is a small utility for managing symlink farms. It takes a "farm" directory and any number of "data" directories, and creates (or updates) the union (or overlay) of the data directories in the farm directory by placing symlinks to data directories.

It is similar to

  • union mounts (overlay/overlayfs) -- but works without system privileges; it is not live, but in exchange it can err out on duplicate files rather than silently picking the highest-ranking one

  • (x)stow -- but that is buggy with symlink farms as sources (cf. https://sourceforge.net/p/xstow/bugs/8/). Unlike stow, this always takes a full list of to-be-installed data directories (called "packages" in stow), removes files that have vanished from the sources, and errs out on files not associated with any source (or shows warnings about them, depending on command line flags).

Example

$ tree my-photos
my-photos
├── 2018/
│   └── Rome/
│       └── ...
└── 2019/
    └── Helsinki/
        └── DSCN2305.jpg

Assume you have a collection of photos as above, and want to see them overlaid with a friend's photos:

$ tree ~friend/photos
/home/friend/photos
├── 2018/
│   └── Amsterdam/
│       └── ...
└── 2019/
    └── Helsinki/
        └── DSC_0815.jpg

With unionfarm, you can create a shared view on them:

$ unionfarm all-photos my-photos ~friend/photos
$ tree all-photos
all-photos
├── 2018/
│   ├── Amsterdam -> /home/friend/photos/2018/Amsterdam/
│   └── Rome -> ../../my-photos/2018/Rome/
└── 2019/
    └── Helsinki/
        ├── DSC_0815.jpg -> /home/friend/photos/2019/Helsinki/DSC_0815.jpg
        └── DSCN2305.jpg -> ../../../my-photos/2019/Helsinki/DSCN2305.jpg
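
The layout above can be reproduced with a small in-memory model. This is only a sketch of the merging rule as it reads from this document, not code from unionfarm: the function name `plan_union`, the representation of a data directory as a list of file paths, and the error text are all assumptions made for illustration. An entry provided by exactly one data directory becomes a single symlink; a directory provided by several is created in the farm and merged one level further down; a file provided by several is reported as an error.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model: every data source is a list of file paths such as
/// "2019/Helsinki/DSCN2305.jpg". The result maps each farm-relative path
/// that should become a symlink to the index of the source it points into.
fn plan_union(sources: &[Vec<&str>]) -> Result<BTreeMap<String, usize>, String> {
    let mut plan = BTreeMap::new();
    // Directory prefixes still to be merged; "" stands for the farm root.
    let mut todo = vec![String::new()];

    while let Some(prefix) = todo.pop() {
        // child name -> (sources that provide it, whether any provides a file)
        let mut children: BTreeMap<String, (BTreeSet<usize>, bool)> = BTreeMap::new();
        for (idx, files) in sources.iter().enumerate() {
            for file in files {
                if let Some(rest) = file.strip_prefix(prefix.as_str()) {
                    let (name, is_file) = match rest.split_once('/') {
                        Some((dir, _)) => (dir, false),
                        None => (rest, true),
                    };
                    let slot = children.entry(name.to_string()).or_default();
                    slot.0.insert(idx);
                    slot.1 |= is_file;
                }
            }
        }

        for (name, (owners, any_file)) in children {
            let path = format!("{prefix}{name}");
            if owners.len() == 1 {
                // Only one source has this entry: one symlink covers it all.
                plan.insert(path, *owners.iter().next().unwrap());
            } else if any_file {
                // Several sources provide the same file: refuse to pick one.
                return Err(format!("duplicate entry: {path}"));
            } else {
                // Several sources provide this directory: merge its contents.
                todo.push(format!("{path}/"));
            }
        }
    }
    Ok(plan)
}
```

The sketch does not model the special case in which a single data directory is linked directly from the farm itself.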

Implementation

This tries to be

  • correct,
  • easy to maintain, and
  • efficient

in that order. Correctness means two things: not deleting anything that cannot plausibly have been an entry in a removed data source (i.e. it only removes symlinks whose targets end in their own path relative to the farm), and creating as few symlinks as possible (for example, when there used to be an a/ in two data sources and one data source's a/ gets removed, the a/ directory in the farm is removed and replaced by a single symlink).
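
The removal rule can be illustrated with a short path check. This is a minimal sketch under assumptions, not the program's actual code: the function name `plausibly_farm_link` and its signature are made up for this example, and it deliberately leaves out the case where the farm itself is a symlink to a single data directory.

```rust
use std::path::Path;

/// Returns true if `target` (what the symlink points to) ends in the path of
/// `link` relative to `farm`, so the link could plausibly have been created
/// for some data directory. Only whole path components are compared.
fn plausibly_farm_link(farm: &Path, link: &Path, target: &Path) -> bool {
    match link.strip_prefix(farm) {
        // An empty relative path would match every target, so it is
        // rejected in this simplified sketch.
        Ok(rel) if !rel.as_os_str().is_empty() => target.ends_with(rel),
        // The link is not inside the farm at all.
        _ => false,
    }
}
```

With the example above, a link at all-photos/2018/Rome pointing to ../../my-photos/2018/Rome/ passes the check, while a link at the same place pointing to an unrelated path would be left alone.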

Ease of maintenance over efficiency means that no effort is made to use file system mechanisms not yet in the standard library, such as using the statx system call, passing around file descriptors to allow openat (if that would work at all in the presence of large directories), or parallelizing execution.

This program does not recurse on the stack, but keeps a to-do list of unfinished paths. It keeps the number of file system access operations to the reasonable minimum needed to fulfill its task.
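
The to-do list approach can be sketched with the standard library alone. The function below is an illustration of the general technique, not an excerpt from unionfarm; the name `list_entries` and the choice to collect all non-directory entries are assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists every non-directory entry below `root` without recursing on the
/// stack: directories that still need a visit are kept in a to-do list.
fn list_entries(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut todo = vec![root.to_path_buf()];
    let mut found = Vec::new();

    while let Some(dir) = todo.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            // DirEntry::file_type does not follow symlinks, so a symlink to
            // a directory is recorded as an entry rather than entered.
            if entry.file_type()?.is_dir() {
                todo.push(entry.path());
            } else {
                found.push(entry.path());
            }
        }
    }

    // Sort for a stable result; read_dir order is platform dependent.
    found.sort();
    Ok(found)
}
```

Because the pending work lives in a heap-allocated vector, the depth of the directory tree does not affect stack usage.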

When the program runs to completion successfully and neither the farm nor the data directories were changed since its start, a second invocation will not cause any file system writes. A second invocation may cause file system writes if the first was not successful (e.g. aborted due to the presence of an unidentified file in the farm), as the sequence of file operations is not necessarily deterministic. (The implementation may be changed in the future to spool up such errors and only abort when nothing else is left to do.)

The program provides detailed logging on demand, and meaningful messages on errors (including ones stemming from changes to the file system during its run time) rather than plainly panicking on them.

Caveats

  • The farm directory must not be given with a trailing slash. This is usually not an issue, but a trailing slash breaks the program's ability to make the farm point directly to the single present data directory, or to remove such a symlink.

This project is published on Codeberg at https://codeberg.org/chrysn/unionfarm; that also hosts an issue tracker and automated tests.

It was written by chrysn <chrysn@fsfe.org>, and is published under the terms of GPLv3+.

Dependencies

~4.5–6MB
~108K SLoC