5 releases
Version | Date
---|---
0.3.0 | Nov 4, 2024
0.2.3 | Jul 14, 2024
0.2.2-alpha | May 28, 2024
0.2.0-alpha | Apr 30, 2024
0.1.0 |
1.5MB, 1.5K SLoC
molly: read xtc files, fast
A reader for the Gromacs xtc file format implemented in pure Rust.
molly tries to read and decompress as few bytes as possible. To this end, the library offers extensive selection methods for frames within a trajectory and for atoms within each frame. These selections allow for some exciting optimizations: only the positions that are actually needed are decompressed. In some cases, only a limited number of compressed bytes is read from disk in the first place. This is particularly powerful when a subset of positions at the start of each frame is selected from a large trajectory. Such buffered reading can be very beneficial when disk reads are slow, for example on networked file storage.
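Why does selecting atoms at the start of a frame save work? A toy sketch may help. It does not use the real xtc compression, which packs integers into variable-width bit fields. Instead it assumes a made-up encoding in which each atom is stored as a delta from the previous atom. The property it shares with xtc is that positions are decoded sequentially, so the first n atoms can be recovered without touching the bytes that follow them. `encode_toy` and `decode_first_n` are hypothetical helpers, not part of molly's API.

```rust
use std::io::{self, Read};

/// Toy encoding, NOT the real xtc compression: each atom is stored as three
/// big-endian i32 deltas relative to the previous atom's position.
fn encode_toy(positions: &[[i32; 3]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(positions.len() * 12);
    let mut prev = [0i32; 3];
    for p in positions {
        for d in 0..3 {
            out.extend_from_slice(&p[d].wrapping_sub(prev[d]).to_be_bytes());
        }
        prev = *p;
    }
    out
}

/// Decode only the first `n` atoms, pulling bytes from `reader` on demand.
/// Each position depends only on the ones before it, so the bytes of the
/// atoms after `n` are never read from the reader.
fn decode_first_n<R: Read>(reader: &mut R, n: usize) -> io::Result<Vec<[i32; 3]>> {
    let mut positions = Vec::with_capacity(n);
    let mut prev = [0i32; 3];
    let mut word = [0u8; 4];
    for _ in 0..n {
        let mut p = [0i32; 3];
        for d in 0..3 {
            reader.read_exact(&mut word)?;
            p[d] = prev[d].wrapping_add(i32::from_be_bytes(word));
        }
        positions.push(p);
        prev = p;
    }
    Ok(positions)
}
```

Take a four-atom toy frame that encodes to 48 bytes. Decoding its first two atoms from a `std::io::Cursor` advances the cursor by only 24 bytes, and the remaining bytes are never read.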
For convenient use in existing analysis tools, molly exposes a set of bindings that allow access to its functions from Python.
molly can also be installed as a command line tool for shortening and filtering xtc files. It supports xtc frames with both the 1995 and the 2023 magic numbers.
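The magic number distinguishes the two header variants mentioned above. The sketch below assumes that each xtc frame begins with a 32-bit integer magic number, encoded big-endian as XDR requires, and classifies a frame from its first four bytes. `XtcMagic` and `frame_magic` are hypothetical names. Our understanding is that the 2023 value was introduced with GROMACS 2023 to lift a size limit of the older variant, but this README does not say so itself.

```rust
/// Known xtc frame magic numbers (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum XtcMagic {
    /// The classic value, 1995.
    Classic1995,
    /// The newer value, 2023 (assumed here to come from GROMACS 2023).
    New2023,
}

/// Interpret the first four bytes of a frame as a big-endian (XDR) i32 and
/// map it to a known magic number. Returns None for short input or an
/// unrecognized value.
fn frame_magic(frame: &[u8]) -> Option<XtcMagic> {
    if frame.len() < 4 {
        return None;
    }
    let word = [frame[0], frame[1], frame[2], frame[3]];
    match i32::from_be_bytes(word) {
        1995 => Some(XtcMagic::Classic1995),
        2023 => Some(XtcMagic::New2023),
        _ => None,
    }
}
```

In this sketch, an unknown magic number yields `None` rather than an error. A real reader would most likely stop there and report the file as corrupt or unsupported.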
NOTE: molly is in a pretty stable state and is used in the wild. Please do take care and verify the results. Blind trust in any tool is irresponsible.
For any questions, feel free to get in contact with me.
Installation
Command line application
cargo install molly
Usage
With the molly command, xtc files can be filtered and shortened. Selections can be made on frames as well as on the atoms within each frame.
- Frames can be selected with the -f/--frame-selection option, using start:stop:step ranges, which behave much like slices in Python.
- The first n atoms can be selected with the -a/--atom-selection option.
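The start:stop:step syntax can be sketched in Rust as follows. This is not molly's own parser. `FrameRange`, `parse_frame_range` and `selected_frames` are hypothetical names. The sketch assumes Python-like semantics: empty fields are open, stop is exclusive, and out-of-range bounds are clamped. Unlike Python, it deliberately leaves out negative indices.

```rust
/// Hypothetical representation of a `start:stop:step` frame selection.
/// Missing fields are `None`; as with Python slices, `stop` is exclusive.
#[derive(Debug, PartialEq)]
struct FrameRange {
    start: Option<usize>,
    stop: Option<usize>,
    step: usize,
}

fn parse_frame_range(s: &str) -> Result<FrameRange, String> {
    let parts: Vec<&str> = s.split(':').collect();
    if parts.len() > 3 {
        return Err(format!("too many ':' in {:?}", s));
    }
    // An empty or absent field becomes None; anything else must be a usize.
    let field = |i: usize| -> Result<Option<usize>, String> {
        match parts.get(i) {
            Some(p) if !p.is_empty() => p
                .parse::<usize>()
                .map(Some)
                .map_err(|e| format!("{:?}: {}", p, e)),
            _ => Ok(None),
        }
    };
    let step = field(2)?.unwrap_or(1);
    if step == 0 {
        return Err("step must be at least 1".to_string());
    }
    Ok(FrameRange { start: field(0)?, stop: field(1)?, step })
}

/// Indices of the frames selected from a trajectory with `n_frames` frames.
/// Bounds beyond the end are clamped, as Python does for slices.
fn selected_frames(r: &FrameRange, n_frames: usize) -> Vec<usize> {
    let start = r.start.unwrap_or(0).min(n_frames);
    let stop = r.stop.unwrap_or(n_frames).min(n_frames);
    (start..stop).step_by(r.step).collect()
}
```

Under these assumptions, 100:600:2 applied to a 1000-frame trajectory selects 250 frames, from index 100 up to index 598. The selection :1 picks only frame 0.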
Here is a short showcase of possible uses.
# List all options.
molly --help
# Print a summary of a trajectory to standard out.
molly --info big.xtc
# Trajectories can be filtered in a number of ways. Here are a few combinations.
# Select every second frame from index 100 up to (but not including) index 600. From those, store only the first 161 atoms.
molly big.xtc out.xtc --frame-selection 100:600:2 --atom-selection 161
molly big.xtc out.xtc -f 100:600:2 -a 161 # With shorter arguments.
# Reverse a selection. Here we use it to select the last frame.
molly big.xtc last.xtc --reverse-frame-selection --frame-selection :1
molly big.xtc last.xtc -Rf :1 # With shorter arguments.
# Reverse a trajectory.
molly big.xtc rev.xtc --reverse
# For any of these filtering commands, the frame times and steps can be written to standard out.
molly big.xtc rev_last_ten.xtc -rRf :10 --steps --times
As a library
To use molly in a Rust project, add it to the dependencies in your Cargo.toml.
Find molly on crates.io.
As a Python module
A Rust toolchain (cargo and the Rust compiler) is required for building the Python bindings. The stable toolchain is sufficient.
To install the module, run the following command. It will automatically clone the repository and install the Python library from the correct directory.
pip3 install 'git+https://git.sr.ht/~ma3ke/molly#egg=molly&subdirectory=bindings/python'
Alternatively, clone the repository, go into the bindings directory, and install it from there using pip.
git clone https://git.sr.ht/~ma3ke/molly
cd molly/bindings/python
# Perhaps you want to use/create a virtual environment first.
python3 -m venv venv
source venv/bin/activate
pip3 install .
The examples
A number of useful example programs can be found in the examples directory. Some of these can be used to benchmark against other xtc reader implementations, or to create test files.
NOTE: I'm leaving these here for the moment, but ultimately, I will remove or fundamentally change many of these examples.
In order to access these, clone the repository and build them.
git clone https://git.sr.ht/~ma3ke/molly
cd molly
cargo build --release --examples
target/release/examples/<name> [options]
Or directly run them using
cargo run --release --example <name>
Tests
The library includes unit tests of internal mechanisms as well as integration tests, including comparisons against the values produced by other libraries. It is best to run the tests with the --release flag, since debug builds run rather slowly.
cargo test --release
Performance and benchmarks
Go ahead and run the provided benchmarks if you're interested!
cargo bench
Additionally, there are a couple of benchmark scripts lying around the repo. I may collect their results in a neat table at a later point. For now, some things are still subject to change: though the broad shape of molly's performance is already visible, this is not yet the time for hard promises.
It looks like molly is around 2× faster than xdrf (the widely used Gromacs implementation) and around 4× faster than the chemfiles implementation.
For the buffered implementations, this gap is slightly less pronounced: when disk I/O is factored out, buffered reading is around 20% slower than unbuffered reading. However, for very large trajectories where only a subset of positions from the start of each frame is selected, the advantage of buffered reading is considerable.
Marieke Westendorp, 2024
Dependencies: ~5.5MB, ~151K SLoC