5 releases

new 0.3.0 Nov 4, 2024
0.2.3 Jul 14, 2024
0.2.2-alpha May 28, 2024
0.2.0-alpha Apr 30, 2024
0.1.0 Apr 21, 2021

MIT license

molly—read xtc files, fast

A reader for the Gromacs xtc file format implemented in pure Rust.

molly tries to read and decompress the minimal number of bytes from disk. To this end, the library features extensive selection methods for frames within a trajectory and atoms within each frame. These selections allow for some exciting optimizations: only the necessary positions are decompressed and, in some cases, only a limited number of compressed bytes are read in the first place. This buffered reading is particularly powerful when a subset of positions at the start of each frame is selected from a large trajectory, and it can be very beneficial when disk read speed is poor, such as over networked file storage.
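To make the idea concrete, here is a toy sketch in Python of why selecting only the leading atoms saves work when positions are stored sequentially. It does not use molly or the real xtc compression; the `decode_positions` and `read_first_atoms` helpers and the integer "encoding" are made up purely for illustration.

```python
from itertools import islice


def decode_positions(encoded, counter):
    """Toy sequential decoder: turns one encoded item into one position.

    The division by 1000 stands in for real decompression work.
    """
    for item in encoded:
        counter["decoded"] += 1
        yield tuple(value / 1000.0 for value in item)


def read_first_atoms(encoded, n):
    """Decode only the first n positions and report how many were decoded."""
    counter = {"decoded": 0}
    # islice stops pulling from the generator once n items were produced,
    # so the remainder of the frame is never decoded.
    positions = list(islice(decode_positions(encoded, counter), n))
    return positions, counter["decoded"]


# A hypothetical frame of 10,000 atoms, stored as integer triples.
frame = [(i * 1000, i * 2000, i * 3000) for i in range(10_000)]
positions, decoded = read_first_atoms(frame, 161)
print(len(positions), "positions returned,", decoded, "items decoded")
```

In this toy model the cost scales with the number of selected atoms rather than with the size of the frame, which is the kind of saving described above.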

For convenient use in existing analysis tools, molly exposes a set of bindings that allow access to its functions from Python.

molly can also be installed as a command line tool for shortening and filtering xtc files. It supports the 1995 and 2023 magic numbers.

NOTE: molly is in a pretty stable state and is used in the wild. Please do take care and verify the results. Blind trust in any tool is irresponsible.

For any questions, feel free to get in contact with me.

Installation

Command line application

cargo install molly

Usage

With the molly command, xtc files can be filtered and shortened. Selections can be made on frames as well as the atoms within the frames.

  • Frames can be selected with the -f/--frame-selection option, using start:stop:step ranges, which operate much like ranges in Python.
  • The first n atoms can be selected with the -a/--atom-selection option.
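As a rough illustration of these selections, the following Python sketch mimics the start:stop:step behaviour with Python's own slice rules. It is not molly's parser: `select_frames` and `select_atoms` are hypothetical helpers, and the sketch assumes that an omitted field falls back to the Python default and that the stop value is exclusive.

```python
def select_frames(n_frames, selection):
    """Return the frame indices picked by a 'start:stop:step' string."""
    parts = selection.split(":")
    parts += [""] * (3 - len(parts))  # pad omitted trailing fields
    start, stop, step = (int(p) if p else None for p in parts)
    # slice.indices clamps the range to the available number of frames.
    return list(range(*slice(start, stop, step).indices(n_frames)))


def select_atoms(positions, n):
    """Keep only the first n positions of a frame."""
    return positions[:n]


frames = select_frames(1000, "100:600:2")
print(frames[:3], "...", frames[-1], f"({len(frames)} frames)")
print(select_frames(5, ":1"))
print(select_atoms(["a0", "a1", "a2", "a3"], 2))
```

Under these assumptions, `100:600:2` picks frames 100, 102, and so on up to 598, and `:1` picks only the first frame.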

Here is a short showcase of possible uses.

# List all options.
molly --help

# Print a summary of a trajectory to standard out.
molly --info big.xtc

# Trajectories can be filtered in a number of ways. Here are a few combinations.
# Select the 100th to the 600th frame in steps of two. From those, store only the first 161 atoms.
molly big.xtc out.xtc --frame-selection 100:600:2 --atom-selection 161
molly big.xtc out.xtc -f 100:600:2 -a 161  # With shorter arguments.

# Reverse a selection. Here we use it to select the last frame.
molly big.xtc last.xtc --reverse-frame-selection --frame-selection :1
molly big.xtc last.xtc -Rf :1  # With shorter arguments.

# Reverse a trajectory.
molly big.xtc rev.xtc --reverse

# For any of these filtering commands, the frame times and steps can be written to standard out.
molly big.xtc rev_last_ten.xtc -rRf :10 --steps --times
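One way to picture how the two reversal flags combine is the Python sketch below. It is an assumed model inferred from the examples above, not taken from molly's source: the hypothetical `pick` function counts the selection from the end of the trajectory when `reverse_selection` is set, and flips the order of the written frames when `reverse_output` is set.

```python
def pick(n_frames, start=None, stop=None, step=None,
         reverse_selection=False, reverse_output=False):
    """Return the frame indices in the order they would be written."""
    order = list(range(n_frames))
    if reverse_selection:
        order.reverse()  # apply the selection counting from the last frame
    chosen = order[slice(start, stop, step)]
    chosen.sort()  # restore the original trajectory order
    if reverse_output:
        chosen.reverse()  # write the frames back to front
    return chosen


# Analogous to `-Rf :1`: the last frame.
print(pick(100, stop=1, reverse_selection=True))
# Analogous to `-rRf :10`: the last ten frames, written in reverse.
print(pick(100, stop=10, reverse_selection=True, reverse_output=True))
```

If molly's actual behaviour differs from this model, for instance in how a step interacts with a reversed selection, the output of `molly --help` is the authority.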

As a library

To use molly in a Rust project, add it to the dependencies in your Cargo.toml, for example by running cargo add molly. molly is available on crates.io.

As a Python module

A Rust toolchain (cargo and the Rust compiler) is required for building the Python bindings. (The stable toolchain is sufficient.)

To install the module, run the following command. It will automatically clone the repository and install the Python library from the correct directory.

pip3 install 'git+https://git.sr.ht/~ma3ke/molly#egg=molly&subdirectory=bindings/python'

Alternatively, clone the repository, go into the bindings/python directory, and install it from there using pip.

git clone https://git.sr.ht/~ma3ke/molly
cd molly/bindings/python

# Perhaps you want to use/create a virtual environment first.
python3 -m venv venv
source venv/bin/activate

pip3 install .

The examples

A number of useful example programs can be found in the examples directory. Some of these can be used to benchmark against other xtc reader implementations, or to create test files.

NOTE: I'm leaving these here for the moment, but ultimately, I will remove or fundamentally change many of these examples.

In order to access these, clone the repository and build them.

git clone https://git.sr.ht/~ma3ke/molly
cd molly
cargo build --release --examples
target/release/examples/<name> [options]

Or directly run them using

cargo run --release --example <name>

Tests

The library includes a number of unit tests of internal mechanisms and integration tests (including comparisons against the values produced by other libraries). It is best to run the tests with the --release flag, since debug builds run rather slowly.

cargo test --release

Performance and benchmarks

Go ahead and run the provided benchmarks if you're interested!

cargo bench

Additionally, there are a couple of benchmark scripts lying around the repo. I may place their results into a neat table at a later point. For now, some things are still subject to change. Though we can see the broad shape of the performance story for molly, this is not yet the time for hard promises.

It looks like molly is around 2× faster than xdrf (the widely-used Gromacs implementation), and around 4× faster than the chemfiles implementation.

For the buffered implementations this gap is slightly less pronounced. When disk I/O is factored out, buffered reading is around 20% slower than unbuffered reading. But over very large trajectories where only a subset of positions from the top of each frame is selected, the advantage is considerable.


Marieke Westendorp, 2024
