6 releases (3 breaking)

0.9.0 Feb 19, 2024
0.8.2 Sep 26, 2023
0.7.0 Sep 6, 2023
0.6.1 Aug 3, 2023

Apache-2.0 and CC-PDDC licenses

Reductionist

This project implements simple reductions on S3 objects containing numeric binary data. Because the reductions run inside the storage system, far less data has to be transferred to the end user, which leads to faster computations.
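The core idea can be sketched in a few lines of Rust. The object's raw bytes are decoded into typed values according to their byte order, then reduced in a single pass, so only a handful of numbers need to travel back to the client. This is a minimal sketch that assumes a flat buffer of 32-bit integers. `ByteOrder`, `decode_i32`, `reduce` and `Summary` are hypothetical names for illustration, not Reductionist's API, and the S3 download itself is omitted.

```rust
/// Byte order of the stored values (hypothetical type for this sketch).
#[derive(Clone, Copy)]
enum ByteOrder {
    Little,
    Big,
}

/// Decode a flat byte buffer into `i32` values using the given byte order.
/// Returns `None` if the length is not a multiple of 4 bytes.
fn decode_i32(bytes: &[u8], order: ByteOrder) -> Option<Vec<i32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    let values: Vec<i32> = bytes
        .chunks_exact(4)
        .map(|c| {
            let raw = [c[0], c[1], c[2], c[3]];
            match order {
                ByteOrder::Little => i32::from_le_bytes(raw),
                ByteOrder::Big => i32::from_be_bytes(raw),
            }
        })
        .collect();
    Some(values)
}

/// Results of several reductions computed over the decoded values.
#[derive(Debug, PartialEq)]
struct Summary {
    count: usize,
    min: Option<i32>,
    max: Option<i32>,
    sum: i64, // widened to i64 so the sum cannot overflow i32
}

fn reduce(values: &[i32]) -> Summary {
    Summary {
        count: values.len(),
        min: values.iter().copied().min(),
        max: values.iter().copied().max(),
        sum: values.iter().map(|&v| i64::from(v)).sum(),
    }
}

fn main() {
    // Stand-in for an S3 object body: four big-endian i32 values.
    let mut object_bytes = Vec::new();
    for v in [1i32, -2, 3, 40].iter() {
        object_bytes.extend_from_slice(&v.to_be_bytes());
    }
    let values = decode_i32(&object_bytes, ByteOrder::Big).expect("length is a multiple of 4");
    // Only this small summary would need to be sent back to the client.
    println!("{:?}", reduce(&values));
    // prints: Summary { count: 4, min: Some(-2), max: Some(40), sum: 42 }
}
```

Decoding with the wrong byte order would silently produce different numbers, which is why non-native endianness has to be handled explicitly rather than assumed.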

The work is funded by the ExCALIBUR project and is done in collaboration with the University of Reading.

Documentation for the Reductionist application is hosted on GitHub. Documentation for the source code is available on docs.rs.

Reductionist is a performant implementation of the active storage server. The original functional prototype, written in Python, is also available.

Note: The original S3 Active Storage project was renamed Reductionist to avoid confusion caused by overuse of the term Active Storage.

Features

Reductionist provides the following features:

  • HTTP(S) API with JSON request data
  • Access to data stored in S3-compatible storage
  • Basic numerical operations on multi-dimensional arrays (count, min, max, select, sum)
  • Calculations on a selection (slice) of an array
  • Calculations that account for missing data
  • Support for compressed data (Gzip, Zlib)
  • Support for filtered data (byte shuffle)
  • Support for data with non-native byte order (endianness)
  • Server resource (CPU, memory, files) management
  • Prometheus metrics
  • Tracing with an option to send data to Jaeger
  • Ansible-based containerised deployment

PyActiveStorage is a related Python library that performs reductions on numerical data in sources such as netCDF4 files. It can delegate computation to Reductionist when the data is stored in an S3-compatible object store.
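The selection and missing-data features listed above can be illustrated together. Below, a reduction visits only the selected elements and skips any that match a fill value. This is a simplified one-dimensional sketch. It assumes a selection is a (start, end, stride) triple like a Python slice and that missing data is marked by a single fill value. A multi-dimensional selection would apply one such triple per axis. `Slice` and `masked_count_sum` are hypothetical names, not Reductionist's request format.

```rust
/// A 1-D selection: indices start, start + stride, ... strictly below `end`.
/// Hypothetical type for this sketch.
struct Slice {
    start: usize,
    end: usize,
    stride: usize,
}

/// Count and sum the selected elements, skipping any equal to `missing_value`.
fn masked_count_sum(values: &[f64], sel: &Slice, missing_value: Option<f64>) -> (usize, f64) {
    assert!(sel.stride > 0, "stride must be positive");
    // Clamp the end so an over-long selection cannot index out of bounds.
    let end = sel.end.min(values.len());
    let mut count = 0;
    let mut sum = 0.0;
    for i in (sel.start..end).step_by(sel.stride) {
        let v = values[i];
        if missing_value != Some(v) {
            count += 1;
            sum += v;
        }
    }
    (count, sum)
}

fn main() {
    let data = [1.0, 2.0, -999.0, 4.0, 5.0, 6.0];
    // Select every second element: indices 0, 2, 4.
    let sel = Slice { start: 0, end: 6, stride: 2 };
    let (count, sum) = masked_count_sum(&data, &sel, Some(-999.0));
    println!("count = {}, sum = {}", count, sum);
    // prints: count = 2, sum = 6
}
```

Returning the count of valid elements alongside the sum is a deliberate choice. It lets a client derive quantities such as a mean without a second pass over the data.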

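The byte shuffle filter mentioned in the feature list rearranges bytes so that byte 0 of every element is stored first, then byte 1 of every element, and so on. Similar bytes end up adjacent, which usually compresses better. When reading, the steps run in reverse: decompress first, then unshuffle. This is a minimal sketch that assumes the common HDF5-style shuffle layout. It simply rejects buffers whose length is not a multiple of the element size. `shuffle` and `unshuffle` are illustrative functions, not Reductionist's code.

```rust
/// Apply a byte shuffle (assumed HDF5-style layout): byte `j` of element `i`
/// moves to position `j * n + i`, where `n` is the number of elements.
fn shuffle(data: &[u8], element_size: usize) -> Option<Vec<u8>> {
    if element_size == 0 || data.len() % element_size != 0 {
        return None;
    }
    let n = data.len() / element_size;
    let mut out = vec![0u8; data.len()];
    for i in 0..n {
        for j in 0..element_size {
            out[j * n + i] = data[i * element_size + j];
        }
    }
    Some(out)
}

/// Reverse the shuffle. A reader runs this step after decompression.
fn unshuffle(shuffled: &[u8], element_size: usize) -> Option<Vec<u8>> {
    if element_size == 0 || shuffled.len() % element_size != 0 {
        return None;
    }
    let n = shuffled.len() / element_size;
    let mut out = vec![0u8; shuffled.len()];
    for i in 0..n {
        for j in 0..element_size {
            out[i * element_size + j] = shuffled[j * n + i];
        }
    }
    Some(out)
}

fn main() {
    // Two 4-byte elements.
    let data = [1u8, 2, 3, 4, 5, 6, 7, 8];
    let shuffled = shuffle(&data, 4).unwrap();
    println!("{:?}", shuffled); // prints: [1, 5, 2, 6, 3, 7, 4, 8]
    let restored = unshuffle(&shuffled, 4).unwrap();
    assert_eq!(restored, data.to_vec());
}
```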
Contributing

See the contributor guide for information about contributing to Reductionist.

License

This project is licensed under the Apache-2.0 License.