5 unstable releases
|0.3.1||Jan 10, 2022|
|0.3.0||Oct 24, 2020|
|0.2.0||Sep 22, 2020|
|0.1.1||Aug 3, 2020|
|0.1.0||Jul 2, 2020|
Syncs folders into IPFS's mutable file system (the directory structure accessible through ipfs files …).
Despite the name, FTP2MFS is capable of reading from FTP, HTTP directory listings, rsyncd, and the local filesystem.
Beware: ftp2mfs can easily mirror several hundred GB in a few hours and keep that data in sync with upstream on the local IPFS node. A single IPFS node is not necessarily able to keep that much content available to the network; see hyperprov for an explanation and a solution.
You need to create a configuration file that specifies what to sync, e.g. minimally:
    # FTP server to read from
    source: ftp://ftp.jaist.jp/pub/Linux/ArchLinux/core/os/x86_64/
    # Beware that many FTP servers don't do well with IPv6 (Or maybe the FTP library ftp2mfs uses doesn't).
    # If in doubt, specify the server by IP address, or change /etc/gai.conf to prefer IPv4, or …
    # MFS folder to write to. The actual mirrored files will be written to $target/data
    target: /some-archlinux-core-mirror
Save the file as ftp2mfs-cfg1 and run ftp2mfs --config ftp2mfs-cfg1 (or replace ftp2mfs with cargo run -- if you did not cargo install this).
FTP2MFS will first materialize the files in some folder in /temp, and make a copy at $target once it successfully completes the sync.
If the operation fails, restarting with the same configuration file should resume the sync instead of starting over.
Note that source folder structures are fully explored before any copying/downloading is done, so it may take a while for any files to be copied.
All source types have their advantages and drawbacks. Building nice mirrors is difficult.
HTTP directory listings
- (+) The only source type that can be encrypted/secured.
- (-) There is no defined format for listing directories, so the directory structure is guessed from HTML: links ending with / are treated as directories, all others as files. Links that do not point to direct children of the current directory are ignored.
- (-) One HEAD request has to be sent for each file to see if it is up to date. This is somewhat inefficient.
- (0) It might be nice to parse modification dates and file sizes from HTML directory listings.
FTP
- (+) Proper standardized protocol for listing directories and transferring files.
- (-) FTPS is not supported (the underlying library supports it, so it would be easy to implement).
- (-) Many FTP servers don't support the MLSD command. To find out whether a file is up to date, FTP2MFS then has to send one CWD, one MDTM, and one SIZE command per file, sequentially. This is incredibly inefficient.
- (0) Support may be added for parsing some of the most common human-readable directory listing formats.
rsyncd
- (+) Efficient sync with full metadata.
- (-) Self-baked implementation of a protocol with no standardization and little documentation.
- (-) Not secured.
Local filesystem
- (+) The only implementation where syncing is trivial and thus probably correctly implemented.
- (-) You'll have the files twice, once on your filesystem and once in your IPFS repo (no filestore/urlstore support).
- (0) Might be interesting in combination with FUSE.
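The guessing rule for HTML directory listings described above can be sketched roughly as follows. This is an illustration only, not the code FTP2MFS uses: the Entry type and the classify function are hypothetical names, and the sketch simplifies by rejecting every absolute URL or path, even one that happens to point at a direct child.

```rust
/// What a link in an HTML directory listing is taken to be (hypothetical type).
#[derive(Debug, PartialEq)]
enum Entry {
    Dir(String),
    File(String),
}

/// Classify one href from a directory listing (hypothetical helper).
/// Returns None for links that are not direct children of the current directory.
fn classify(href: &str) -> Option<Entry> {
    // Simplification: absolute URLs, absolute paths, query links (e.g. sort
    // links) and fragments are never treated as children in this sketch.
    if href.is_empty()
        || href.starts_with('/')
        || href.contains("://")
        || href.contains('?')
        || href.contains('#')
    {
        return None;
    }
    // A trailing slash marks a directory, anything else is a file.
    let (name, is_dir) = match href.strip_suffix('/') {
        Some(stripped) => (stripped, true),
        None => (href, false),
    };
    // A remaining slash means the link goes deeper than one level;
    // "." and ".." point at the directory itself or its parent.
    if name.is_empty() || name.contains('/') || name == "." || name == ".." {
        return None;
    }
    Some(if is_dir {
        Entry::Dir(name.to_string())
    } else {
        Entry::File(name.to_string())
    })
}

fn main() {
    for href in ["core.db", "x86_64/", "../", "/pub/", "a/b.txt", "?C=M;O=A"] {
        println!("{:?} -> {:?}", href, classify(href));
    }
}
```

Only the first two links in main are kept; the parent link, the absolute path, the nested path, and the sort link are all dropped.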
Full configuration format
    # The server to copy from
    source: ftp://…|rsync://…|…
    # Reprieve period for which files will be kept in MFS after deletion on server
    reprieve: "1 month"
    # Ignore glob patterns when listing files on server (gitignore style)
    ignore:
      # Example: Whitelist only folders a and b
      - "/*"
      - "!a"
      - "!b"
    # setting --log-level=debug may help with debugging ignore rules
    # FTP credentials, defaults to anonymous, can be overwritten from command line
    user: anonymous
    pass: email@example.com # Many FTP servers ask you to provide your e-mail address as password
    # Path on MFS where the finalized sync result is kept
    target: /publish-me
    # Path on MFS where files are written to during sync - defaults to /temp/$hash_of_config
    workdir: /temp/foobar
    # Maximum render depth for self-referential symlinks (e.g. a -> .)
    max_symlink_cycle: 0
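To see why the example ignore list acts as a whitelist, it may help to walk through how gitignore-style rules are evaluated: patterns are checked in order, a leading ! negates, and the last matching pattern wins. The sketch below covers only a tiny subset (top-level names, "/*", and literal names). The is_ignored function is a hypothetical name and not the matcher FTP2MFS uses.

```rust
/// Decide whether a top-level entry is ignored (hypothetical helper).
/// Supported subset of gitignore rules: "/*" matches every top-level name,
/// a literal name matches exactly that name, a leading '!' negates,
/// and the last matching pattern wins.
fn is_ignored(patterns: &[&str], name: &str) -> bool {
    let mut ignored = false;
    for pattern in patterns {
        // Split off the negation marker, if any.
        let (negated, body) = match pattern.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *pattern),
        };
        // Only top-level entries are considered, so a leading slash can be dropped.
        let body = body.trim_start_matches('/');
        if body == "*" || body == name {
            // Later patterns override earlier ones.
            ignored = !negated;
        }
    }
    ignored
}

fn main() {
    let patterns = ["/*", "!a", "!b"];
    for name in ["a", "b", "c"] {
        println!("{}: ignored = {}", name, is_ignored(&patterns, name));
    }
}
```

With these three patterns, "/*" first ignores every top-level entry, and "!a" and "!b" then re-include those two folders, so only a and b are mirrored.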