

Fetch data files from a URL, but only if needed. Verify contents via SHA256.

4 releases

0.1.6 Oct 20, 2022
0.1.5 Oct 11, 2022
0.1.4 Jun 29, 2022
0.1.3 Jun 29, 2022

Used in bed-reader




Fetch-Data checks a local data directory and then downloads needed files. It always verifies the local files and downloaded files via a hash.

Fetch-Data makes it easy to download large and small sample files. For example, here we download a genomics file from GitHub (if it has not already been downloaded). We then print the size of the now local file.

use fetch_data::{sample_file, FetchDataError};

fn main() -> Result<(), FetchDataError> {
    // Downloads small.fam only if it is not already in the local data directory.
    let path = sample_file("small.fam")?;
    println!("{}", std::fs::metadata(path)?.len()); // Prints 85
    Ok(())
}
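The "only if needed, always verified" behavior described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Fetch-Data's implementation: `fetch_if_needed` is a hypothetical name, and the hashing and downloading steps are passed in as closures (in practice, a SHA256 hasher and an HTTP client).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Ensure `path` holds content whose hash equals `expected_hash`.
///
/// `hash_file` and `download` are hypothetical stand-ins supplied by the
/// caller; they are not part of Fetch-Data's API.
/// Returns Ok(true) if a download happened and Ok(false) if the local copy was reused.
pub fn fetch_if_needed<H, D>(
    path: &Path,
    expected_hash: &str,
    hash_file: H,
    download: D,
) -> io::Result<bool>
where
    H: Fn(&Path) -> io::Result<String>,
    D: Fn(&Path) -> io::Result<()>,
{
    // Download only when no local copy exists.
    let downloaded = if path.exists() {
        false
    } else {
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        download(path)?;
        true
    };

    // Verify the file either way, so a corrupted local copy is also caught.
    let actual = hash_file(path)?;
    if actual.eq_ignore_ascii_case(expected_hash) {
        Ok(downloaded)
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!(
                "hash mismatch for {}: expected {}, got {}",
                path.display(),
                expected_hash,
                actual
            ),
        ))
    }
}
```

Calling the function twice with the same path downloads on the first call only. A hash mismatch is reported as an error on every call, whether the file was just downloaded or was already present.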


  • Thread-safe -- allowing it to be used with Rust's multithreaded testing framework.
  • Inspired by Python's popular Pooch and our PySnpTools filecache module.
  • Avoids async runtimes such as Tokio (by using ureq to download files via blocking I/O).
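Thread safety matters because `cargo test` runs tests in parallel, so two tests may ask for the same file at the same moment. The sketch below shows one way to make the check-then-download step atomic. `DownloadOnce` is a hypothetical type written for this illustration; it does not describe how Fetch-Data is implemented internally.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical cache that records which files were already "downloaded".
/// The Mutex makes the check-then-download step atomic across threads.
pub struct DownloadOnce {
    done: Mutex<HashSet<String>>,
}

impl DownloadOnce {
    pub fn new() -> Self {
        DownloadOnce {
            done: Mutex::new(HashSet::new()),
        }
    }

    /// Runs `download` only for the first caller that asks for `name`.
    /// Returns true if this call performed the download.
    pub fn fetch<F: FnOnce()>(&self, name: &str, download: F) -> bool {
        // Holding the lock while downloading keeps a second thread from
        // starting the same download; it waits, then sees the file as done.
        let mut done = self.done.lock().unwrap();
        if done.contains(name) {
            return false;
        }
        download();
        done.insert(name.to_string());
        true
    }
}
```

For simplicity, this sketch holds a single lock for the whole download, which also serializes downloads of different files. A finer-grained design would lock per file.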

Suggested Usage

You can set up FetchData in many ways. Here are the steps, followed by sample code, for one setup.

  • Create a registry.txt file containing a whitespace-delimited list of files and their hashes. (This is the same format as Pooch. See section Registry Creation for tips on creating this file.)

  • As shown below, create a global static FetchData instance that reads your registry.txt file. Give it:

    • the URL root from which to download the files
    • the name of an environment variable that specifies the local data directory in which to store the files
    • a qualifier, organization, and application, which are used to create a local data directory when the environment variable is not set. See ProjectDirs in the directories crate for details.
  • As shown below, define a public sample_file function that takes a file name and returns a Result containing the path to the downloaded file.

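use fetch_data::{ctor, FetchData, FetchDataError};
use std::path::{Path, PathBuf};

// `#[ctor]` (re-exported by fetch_data) runs this initializer at program start.
#[ctor]
static STATIC_FETCH_DATA: FetchData = FetchData::new(
    include_str!("../registry.txt"), // registry_contents (path is an assumption; adjust to your layout)
    "https://example.com/data/",     // url_root (placeholder; use your own URL root)
    "BAR_APP_DATA_DIR",              // env_key
    "com",                           // qualifier
    "Foo Corp",                      // organization
    "Bar App",                       // application
);

/// Download a data file.
pub fn sample_file<P: AsRef<Path>>(path: P) -> Result<PathBuf, FetchDataError> {
    STATIC_FETCH_DATA.fetch_file(path)
}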

You can now use your sample_file function to download your files as needed.
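The rule for choosing the local data directory (use the environment variable when it is set, otherwise fall back to a default) can be sketched with the standard library. The function names are hypothetical. In Fetch-Data the fallback is derived from the qualifier, organization, and application; here it is simply passed in.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Pick the data directory: the environment variable wins when it is set;
/// otherwise use the supplied fallback.
pub fn choose_data_dir(env_value: Option<OsString>, fallback: PathBuf) -> PathBuf {
    env_value.map(PathBuf::from).unwrap_or(fallback)
}

/// Convenience wrapper that reads the variable named `env_key`.
pub fn data_dir(env_key: &str, fallback: PathBuf) -> PathBuf {
    choose_data_dir(env::var_os(env_key), fallback)
}
```

Splitting the decision from the environment lookup keeps `choose_data_dir` easy to test without modifying process-wide environment variables.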

Registry Creation

You can create your registry.txt file in many ways. Here are the steps, followed by sample code, for one way to create it.

  • Upload your data files to the Internet.
    • For example, Fetch-Data puts its sample data files in tests/data, so they are uploaded to the crate's GitHub repository. On GitHub, the raw view of a data file shows the root URL for these files. In Cargo.toml, we keep these data files out of our crate via exclude = ["tests/data/*"].
  • As shown below, write code that
    • Creates a FetchData instance without registry contents.
    • Lists the files in your data directory.
    • Calls the gen_registry_contents method on your list of files. This method will download the files, compute their hashes, and create a string of file names and hashes.
  • Print this string, then manually paste it into a file called registry.txt.

use fetch_data::{dir_to_file_list, FetchData, FetchDataError};

fn main() -> Result<(), FetchDataError> {
    let fetch_data = FetchData::new(
        "",                          // registry_contents ignored
        "https://example.com/data/", // url_root (placeholder; use your own URL root)
        "BAR_APP_DATA_DIR",          // env_key
        "com",                       // qualifier
        "Foo Corp",                  // organization
        "Bar App",                   // application
    );
    let file_list = dir_to_file_list("tests/data")?;
    let registry_contents = fetch_data.gen_registry_contents(file_list)?;
    // Print the registry so it can be pasted into registry.txt.
    println!("{registry_contents}");
    Ok(())
}
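To make the registry format concrete, here is a small sketch that writes and reads registry text using only the standard library. It assumes the tokens simply alternate between file name and hash, separated by any whitespace. `parse_registry` and `format_registry` are hypothetical helpers, not functions from Fetch-Data.

```rust
use std::collections::HashMap;

/// Parse registry text into a map from file name to hash.
/// Assumes tokens alternate `file_name hash`, separated by any whitespace.
pub fn parse_registry(contents: &str) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    let mut tokens = contents.split_whitespace();
    while let Some(name) = tokens.next() {
        let hash = tokens
            .next()
            .ok_or_else(|| format!("no hash given for file '{}'", name))?;
        // A SHA256 digest is 32 bytes, i.e. 64 hexadecimal characters.
        if hash.len() != 64 || !hash.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("'{}' is not a SHA256 hex digest", hash));
        }
        map.insert(name.to_string(), hash.to_ascii_lowercase());
    }
    Ok(map)
}

/// Render (file name, hash) pairs as registry text, one pair per line.
pub fn format_registry(entries: &[(&str, &str)]) -> String {
    entries
        .iter()
        .map(|(name, hash)| format!("{} {}\n", name, hash))
        .collect()
}
```

Validating the digest length while parsing catches a truncated paste into registry.txt early, before any download is attempted.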


Notes

  • Feature requests and contributions are welcome.

  • Don't use our sample sample_file. Define your own sample_file that knows where to find your data files.

  • The FetchData instance need not be global and static. See FetchData::new for an example of a non-global instance.

  • Additional methods on the FetchData instance can fetch multiple files and can give the path to the local data directory.

  • You need not use a registry.txt file and FetchData instance. You can instead use the stand-alone function fetch to retrieve a single file with known URL, hash, and local path.

  • Additional stand-alone functions can download files and hash files.

  • Fetch-Data always does binary downloads to maintain consistent line endings across OSs.

  • The Bed-Reader genomics crate uses Fetch-Data.

  • To make FetchData work well as a static global, FetchData::new never fails. Instead, FetchData stores any error and returns it when the first call to fetch_file, etc., is made.

  • Debugging this crate under Windows can cause an "Oops! The debug adapter has terminated abnormally" error. This appears to be some kind of LLVM, Windows, NVIDIA(?) problem via ureq.

  • This crate follows Nine Rules for Elegant Rust Library APIs from Towards Data Science.
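The note above about FetchData::new never failing describes a deferred-error pattern: the constructor records any problem, and the first method call reports it. A minimal sketch of the pattern follows. `LazyFetcher` is a hypothetical stand-in that only builds paths; it performs no downloads and is not Fetch-Data's actual type.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a fetcher whose constructor cannot fail.
pub struct LazyFetcher {
    // Either the validated data directory or the error found during construction.
    state: Result<PathBuf, String>,
}

impl LazyFetcher {
    /// Never fails and never panics: a bad argument is recorded, not returned.
    pub fn new(data_dir: &str) -> Self {
        let state = if data_dir.is_empty() {
            Err("data directory must not be empty".to_string())
        } else {
            Ok(PathBuf::from(data_dir))
        };
        LazyFetcher { state }
    }

    /// The stored construction error, if any, surfaces here on first use.
    pub fn fetch_file<P: AsRef<Path>>(&self, name: P) -> Result<PathBuf, String> {
        match &self.state {
            Ok(dir) => Ok(dir.join(name)),
            Err(message) => Err(message.clone()),
        }
    }
}
```

Because `new` returns the value itself rather than a Result, it can be used directly as a static initializer. Callers still see the construction error through the normal Result of `fetch_file`.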

