1 unstable release

0.2.0-alpha Apr 30, 2024
0.1.0 Apr 21, 2021

MIT license

molly—read xtc files, fast

WARNING: This library is unfinished and has not been tested to a sufficient degree.

Please don't use it for any critical analyses. This repository cannot be considered public yet. Until otherwise noted, please inquire whether its address or contents may be shared; this is decided on a case-by-case basis.

This library is currently purposefully without a license to prevent dependence on its contents.

NOTE: For those who want to try to use this library, please point to the alpha branch. The alpha branch will be the more stable testing ground while still allowing some more bleeding-edge work to happen on main. Pinky-promise that alpha has no breaking changes in the short term.

For any questions, feel free to get in contact with me.

A reader for the Gromacs xtc file format implemented in pure Rust.

molly tries to read and decompress the minimal number of bytes. To this end, the library features extensive selection methods for frames within a trajectory and atoms within each frame. This selection allows for some exciting optimizations. Only the necessary positions are decompressed. Similarly, under some circumstances only a limited number of compressed bytes are read from disk in the first place. This is particularly powerful in applications where a subset of positions at the top (that is, the start) of each frame is selected in a large trajectory. Such buffered reading can be very beneficial when disk read speed is particularly poor, such as over networked file storage.
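To make the idea of reading only the needed bytes concrete, here is a minimal Python sketch. It does not use the xtc format or molly's API. It assumes a made-up, uncompressed toy layout (a fixed number of atoms per frame, three 32-bit floats per atom), and every name in it is hypothetical. It only shows how selecting the first few atoms of each frame lets a reader skip the rest of the frame.

```python
import io
import struct

N_ATOMS = 5          # atoms per frame in this toy format (assumption)
BYTES_PER_ATOM = 12  # three little-endian 32-bit floats: x, y, z


def write_toy_trajectory(n_frames):
    """Build an in-memory toy trajectory; x holds the frame, y the atom index."""
    buf = io.BytesIO()
    for frame in range(n_frames):
        for atom in range(N_ATOMS):
            buf.write(struct.pack("<3f", frame, atom, 0.0))
    buf.seek(0)
    return buf


def read_first_atoms(file, n_frames, n_selected):
    """Read only the first n_selected atoms of every frame, skipping the rest."""
    frame_size = N_ATOMS * BYTES_PER_ATOM
    wanted = n_selected * BYTES_PER_ATOM
    frames = []
    bytes_read = 0
    for frame in range(n_frames):
        file.seek(frame * frame_size)  # jump to the start of this frame
        chunk = file.read(wanted)      # read just the selected prefix
        bytes_read += len(chunk)
        frames.append(
            [struct.unpack_from("<3f", chunk, i * BYTES_PER_ATOM)
             for i in range(n_selected)]
        )
    return frames, bytes_read


traj = write_toy_trajectory(n_frames=4)
frames, bytes_read = read_first_atoms(traj, n_frames=4, n_selected=2)
total_bytes = 4 * N_ATOMS * BYTES_PER_ATOM
print(f"read {bytes_read} of {total_bytes} bytes")
```

In a real xtc file the positions are compressed and frames vary in size, so the actual bookkeeping is more involved than this fixed-stride toy.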

For convenient use in existing analysis tools, molly exposes a set of bindings that allow access to its functions from Python.

molly can also be installed as a command line tool for shortening and filtering xtc files.

Installation

Command line application

cargo install --git 'https://git.sr.ht/~ma3ke/molly'

Usage

With the molly command, xtc files can be filtered and shortened. Selections can be made on frames as well as the atoms within the frames.

  • Frames can be selected with the -f/--frame-selection option, using start:stop:step ranges, which operate much like ranges in Python.
  • The first n atoms can be selected with the -a/--atom-selection option.
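As a rough illustration of the start:stop:step notation, the following Python sketch maps a selection string onto frame indices using Python's own slice semantics. The helper name select_frames is made up for this example and is not part of molly. Since molly's ranges are described as operating much like Python's, details such as the handling of negative values may differ.

```python
def select_frames(n_frames, selection):
    """Return the frame indices picked by a 'start:stop:step' string.

    Hypothetical helper that follows Python slice semantics:
    empty fields fall back to the slice defaults.
    """
    parts = selection.split(":")
    parts += [""] * (3 - len(parts))  # pad missing fields, so ':1' works
    start, stop, step = (int(p) if p else None for p in parts)
    return list(range(n_frames))[slice(start, stop, step)]


every_other = select_frames(1000, "100:600:2")
first_only = select_frames(1000, ":1")

print(every_other[:3], len(every_other))
print(first_only)
```

Under these semantics the stop index is exclusive, so 100:600:2 on a 1000-frame trajectory picks 250 frames, starting at index 100 and ending at index 598.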

Here is a short showcase of possible uses.

# List all options.
molly --help

# Print a summary of a trajectory to standard out.
molly --info big.xtc

# Trajectories can be filtered in a number of ways. Here are a few combinations.
# Select the 100th to the 600th frame in steps of two. From those, store only the first 161 atoms.
molly big.xtc out.xtc --frame-selection 100:600:2 --atom-selection 161
molly big.xtc out.xtc -f 100:600:2 -a 161  # With shorter arguments.

# Reverse a selection. Here we use it to select the last frame.
molly big.xtc last.xtc --reverse-frame-selection --frame-selection :1
molly big.xtc last.xtc -Rf :1  # With shorter arguments.

# Reverse a trajectory.
molly big.xtc rev.xtc --reverse

# For any of these filtering commands, the frame times and steps can be written to standard out.
molly big.xtc rev_last_ten.xtc -rRf :10 --steps --times

As a library

To use molly in a Rust project, add this repository to the dependencies in your Cargo.toml.

[dependencies]
molly = { git = "https://git.sr.ht/~ma3ke/molly" }

As a Python module

cargo (Rust's build tool, which is installed together with the Rust compiler) is required for building the Python bindings. (The stable toolchain is sufficient.)

To install the module, run the following command. It will automatically clone the repository, switch to the alpha branch, and install the Python library from the correct directory.

pip3 install 'git+https://git.sr.ht/~ma3ke/molly@alpha#egg=molly&subdirectory=bindings/python'

Alternatively, clone the repository, go into the bindings/python directory, and install it from there using pip.

git clone https://git.sr.ht/~ma3ke/molly
cd molly/bindings/python

# Perhaps you want to use/create a virtual environment first.
python3 -m venv venv
source venv/bin/activate

pip3 install .
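After installation, a quick way to check that the module is visible to the active Python environment is sketched below. It assumes the module is importable under the name molly (taken from the egg name in the pip command above), which may not hold for every version of the bindings.

```python
import importlib.util

# Assumption: the bindings install a top-level module called "molly".
MODULE_NAME = "molly"

# find_spec returns None when no top-level module of that name can be found.
installed = importlib.util.find_spec(MODULE_NAME) is not None

if installed:
    print(f"{MODULE_NAME} is importable from this environment")
else:
    print(f"{MODULE_NAME} was not found; is the right virtual environment active?")
```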

The examples

A number of useful example programs can be found in the examples directory. Some of these can be used to benchmark against other xtc reader implementations, or to create test files.

NOTE: I'm leaving these here for the moment, but ultimately, I will remove or fundamentally change many of these examples.

In order to access these, clone the repository and build them.

git clone https://git.sr.ht/~ma3ke/molly
cd molly
cargo build --release --examples
target/release/examples/<name> [options]

Or directly run them using

cargo run --release --example <name>

Tests

The library includes a number of unit tests of internal mechanisms and integration tests (including comparisons against the values produced by other libraries). Note that it is desirable to run the tests with the --release flag, since debug builds run rather slowly.

cargo test --release

Performance and benchmarks

Go ahead and run the provided benchmarks if you're interested!

cargo bench

Additionally, there are a couple of benchmark scripts lying around the repo. I may place their results into a neat table at a later point. For now, some things are still subject to change. Though we can see the broad shape of the performance story for molly, this is not yet the time for hard promises.

It looks like molly is around 2× faster than xdrf (the widely-used Gromacs implementation), and around 4× faster than the chemfiles implementation.

For the buffered implementations this gap is slightly less pronounced. When disk I/O is factored out, buffered reading is around 20% slower than unbuffered reading. But over very large trajectories where only a subset of positions from the top of each frame is selected, the advantage is considerable.


Marieke Westendorp, 2024

Dependencies