bin+lib fastpasta

CLI for verifying or examining readout data from the ALICE detector

32 stable releases

1.22.0 May 23, 2024
1.21.0 Mar 10, 2024
1.20.0 Feb 10, 2024
1.18.0 Dec 28, 2023
1.13.0 Jul 26, 2023

MIT/Apache

fastPASTA

fast Protocol Analysis Scanner Tool for ALICE

For an exhaustive list of the data verification done via the check subcommand, see doc/checks_list.md.

Releases and associated changelogs can be found at releases or CHANGELOG.md.

See more examples, including advanced scenarios, in the examples document.

Looking for more details? See the Documentation for developers section.

Purpose

To verify or view curated content of the scanned raw binary data from the ALICE detector at CERN.

Demo

demo-gif

Quickstart

Prerequisite

The Rust toolchain is required to compile the binary. On Windows, download the rustup installer from the Rust website. On macOS, Linux, or another Unix-like OS, run

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

and follow the on-screen instructions.

Install via cargo (comes with Rust)

cargo install fastpasta

To update fastpasta, rerun cargo install fastpasta.

Add shell completions

Generate completion script for bash/zsh/fish/powershell/elvish with:

fastpasta --generate-completions <SHELL> > path/to/your/completion/scripts/_fastpasta

See help, including examples of use

fastpasta -h

Examples of use


lz4 -d input.raw -c | fastpasta --filter-link 3 | fastpasta view rdh
#        ^^^^                      ^^^^                       ^^^^
#       INPUT       --->          FILTER          --->        VIEW
# Decompressing with `lz4`

Piping is often optional, and avoiding it improves performance. For example, the following is equivalent to the previous example but saves significant I/O overhead by using one fewer pipe.

lz4 -d input.raw -c | fastpasta --filter-link 3 view rdh

Enable all generic checks: sanity (stateless) AND running (stateful)

fastpasta input.raw --filter-link 0 check all

Enable all sanity checks and include checks applicable to ITS only

fastpasta input.raw check sanity its --filter-link 0

Read from file -> view ITS readout frames with less

Generate ITS readout frame view

fastpasta input.raw view its-readout-frames | less

View only readout frames from link #3

fastpasta input.raw view its-readout-frames -f 3 | less
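
When these invocations are driven from a script, it can help to assemble the argument list in one place. The following is a minimal Python sketch, not part of fastPASTA: the helper name build_command is hypothetical, and it only uses the input file, the --filter-link option, and the subcommands shown in the examples above. It builds the command line but does not run it.

```python
import shlex
from typing import List, Optional


def build_command(
    input_file: str,
    subcommand: List[str],
    filter_link: Optional[int] = None,
) -> List[str]:
    """Assemble a fastpasta argument list in the order used in the examples above."""
    argv = ["fastpasta", input_file]
    if filter_link is not None:
        # Only add the filter when a link ID was requested.
        argv += ["--filter-link", str(filter_link)]
    argv += subcommand
    return argv


check_all = build_command("input.raw", ["check", "all"], filter_link=0)
view_frames = build_command("input.raw", ["view", "its-readout-frames"])

# Print shell-quoted command lines for inspection.
print(shlex.join(check_all))
print(shlex.join(view_frames))
```

The resulting list could be handed to subprocess.run if fastpasta is installed and on the PATH.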

Command flow

flowchart TD;
  start["fastpasta"] --> top_sub_cmd{"Check or view?"};

  top_sub_cmd -- "view" --> view_type{"Type of view"};
  view_type -- "rdh" --> view_rdh{{$ fastpasta view rdh}};
  view_type -- "its-readout-frames" --> view_ro_frames{{$ fastpasta view its-readout-frames}};
  view_type -- "its-readout-frames-data" --> view_ro_frames_data{{$ fastpasta view its-readout-frames-data}};

  top_sub_cmd -- "check" --> check_type{"Type of check"};

  check_type -- "sanity" --> check_sanity{"$ fastpasta check sanity
  or
  target system"};
  check_sanity -- "its" --> check_sanity_its{{$ fastpasta check sanity its}};

  check_type -- "all" --> check_all{"$ fastpasta check all
  or
  target system"};
  check_all -- "its" --> check_all_its{{$ fastpasta check all its}};
  check_all -- "its-stave" --> check_all_its_stave{{$ fastpasta check all its-stave}};

Customize checks

Config with custom checks

To perform very specific checks on the raw data, it is possible to supply a TOML file with the --checks-toml <PATH> option.

To get started, use the --generate-checks-toml flag to generate a template that shows which custom checks are available, along with descriptions and examples.

The generated TOML file will contain content like this:

# Number of Physics (PhT) Triggers expected in the data
# Example: 0, 10
#triggers_pht = None [ u32 ] # (Uncomment and set to enable this check)

To enable the check for 1 Physics Trigger in the raw data, edit the file like this:

# Number of Physics (PhT) Triggers expected in the data
# Example: 0, 10
triggers_pht = 1 # This data should contain 1 PhT trigger.

Finally, run fastPASTA as usual, e.g.

fastpasta check all its input-data.raw --checks-toml custom_checks.toml
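
If the template has to be edited automatically, for instance in CI, the uncomment-and-set step described above can be sketched in Python. This is an illustration only: the helper enable_check is hypothetical, and it assumes the commented-out line has exactly the shape shown in the generated template (a leading #, the key, then = None ...).

```python
import re


def enable_check(template: str, key: str, value: int) -> str:
    """Replace the commented-out '#key = None ...' line with 'key = value'."""
    # Matches one whole line such as:
    # #triggers_pht = None [ u32 ] # (Uncomment and set to enable this check)
    pattern = re.compile(rf"^#\s*{re.escape(key)}\s*=.*$", re.MULTILINE)
    updated, count = pattern.subn(lambda _match: f"{key} = {value}", template)
    if count != 1:
        raise ValueError(f"expected exactly one commented-out entry for {key!r}, found {count}")
    return updated


template = (
    "# Number of Physics (PhT) Triggers expected in the data\n"
    "# Example: 0, 10\n"
    "#triggers_pht = None [ u32 ] # (Uncomment and set to enable this check)\n"
)

print(enable_check(template, "triggers_pht", 1))
```

The descriptive comment lines are left untouched, so the edited file stays readable.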

Output comprehensive statistics (and input them for validation)

Output statistics

A large variety of statistics are collected during data analysis. These statistics can be written to file/stdout in JSON/TOML and could for example serve as input to a script that verifies these statistics further.

Example

Check everything applicable to ITS on stave level for the data in bin.raw, save stats as stats.json

fastpasta check all its-stave --output-stats stats.json --stats-format json bin.raw

Use statistics for data validation

The output statistics can also serve as the input to fastPASTA along with checks on some raw data, using the option --input-stats-file <file>. This will run a full comparison between the input stats and the stats collected during analysis, and output an error message for each mismatching value.

Example

Verify that analysis of bin.raw finds exactly the same stats as listed in stats.json.

fastpasta check all its-stave --input-stats-file stats.json bin.raw

Even if you are not 100% sure that all the stats are correct, you can run one analysis and use the output stats file as a reference in CI. This tells you whether the data output ever changes in terms of these statistics, which can hint that something has gone wrong (or confirm a correct change in behaviour).
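
To illustrate what a value-by-value comparison of two stats files involves, here is a small Python sketch that could also serve as a starting point for a script that verifies the statistics further. It is not fastPASTA's implementation, and the keys rdhs_seen and links are made up for the example, so they should not be read as the real stats schema.

```python
import json
from typing import Any, Iterator


def iter_mismatches(expected: Any, actual: Any, path: str = "") -> Iterator[str]:
    """Yield one message per value that differs between two parsed stats documents."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        # Walk the union of keys so missing entries on either side are reported.
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else str(key)
            if key not in actual:
                yield f"{child}: missing in collected stats"
            elif key not in expected:
                yield f"{child}: missing in reference stats"
            else:
                yield from iter_mismatches(expected[key], actual[key], child)
    elif expected != actual:
        yield f"{path or '<root>'}: expected {expected!r}, got {actual!r}"


# Hypothetical stats content, inlined so the example is self-contained.
reference = json.loads('{"rdhs_seen": 4, "links": [0, 3]}')
collected = json.loads('{"rdhs_seen": 5, "links": [0, 3]}')

for message in iter_mismatches(reference, collected):
    print(message)
```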

Error messages

Messages are formatted as follows:

MEMORY_OFFSET: [ERROR_CODE] ERROR_MESSAGE

Example of failed RDH sanity check

0xE450FFD: [E10] RDH sanity check failed: data_format = 255
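
Because every message follows this layout, reported errors are easy to post-process. The Python sketch below splits a line into its three parts. It assumes that the offset is always printed as 0x-prefixed hexadecimal and that the code is E followed by 2 to 4 digits; the names ReportedError and parse_error_line are illustrative.

```python
import re
from typing import NamedTuple, Optional

# Layout: MEMORY_OFFSET: [ERROR_CODE] ERROR_MESSAGE
# Assumption: hexadecimal offset with 0x prefix, code is "E" plus 2-4 digits.
ERROR_LINE = re.compile(r"^(0x[0-9A-Fa-f]+): \[(E\d{2,4})\] (.*)$")


class ReportedError(NamedTuple):
    offset: int
    code: str
    message: str


def parse_error_line(line: str) -> Optional[ReportedError]:
    """Return the parsed error, or None if the line does not match the layout."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    offset_text, code, message = match.groups()
    return ReportedError(int(offset_text, 16), code, message)


parsed = parse_error_line("0xE450FFD: [E10] RDH sanity check failed: data_format = 255")
print(parsed)
```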

Error codes

Error codes are unique and contain between 2 and 4 digits. The first digit signifies a category for the error. The following is a list of error codes and their meaning, where x is a placeholder for any digit 0-9.

  • [Ex0] - Sanity check
  • [E1x] - RDH
  • [E3x] - IHW
  • [E4x] - TDH
  • [E5x] - TDT
  • [E6x] - DDW0
  • [E7x] - Data word (even number: IB, odd number: OB). E70 is the sanity check for both IB and OB.
  • [E8x] - CDW
  • [E9xxx] - Errors from custom checks
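
The list above can be turned into a small lookup, for example to group errors by category in a report. This Python sketch encodes only the rules stated in the list; the function name describe_error_code is hypothetical, and codes that the list does not cover are reported as "unknown" rather than guessed.

```python
# Category names copied from the list above, keyed by the first digit.
WORD_CATEGORIES = {
    "1": "RDH",
    "3": "IHW",
    "4": "TDH",
    "5": "TDT",
    "6": "DDW0",
    "7": "Data word",
    "8": "CDW",
}


def describe_error_code(code: str) -> str:
    """Map a code such as 'E10' or '[E9001]' to a human-readable category."""
    text = code.strip("[]")
    digits = text[1:]
    if not text.startswith("E") or not digits.isdecimal() or not 2 <= len(digits) <= 4:
        raise ValueError(f"not a valid error code: {code!r}")
    if len(digits) == 4 and digits[0] == "9":
        return "Custom check"
    if len(digits) != 2:
        return "unknown"
    category = WORD_CATEGORIES.get(digits[0], "unknown")
    last = int(digits[1])
    if last == 0:
        # [Ex0] marks a sanity check; E70 covers both IB and OB.
        return f"{category} (sanity check)"
    if digits[0] == "7":
        # Data word codes: even number is IB, odd number is OB.
        return f"{category} ({'IB' if last % 2 == 0 else 'OB'})"
    return category


for example in ["E10", "E44", "E70", "E72", "E73", "E9001"]:
    print(example, "->", describe_error_code(example))
```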

Verbosity levels

  • 0: Errors
  • 1: Errors and warnings [default]
  • 2: Errors, warnings and info
  • 3: Errors, warnings, info and debug
  • 4: Errors, warnings, info, debug and trace

Running tests

Run the full test suite with:

cargo test

License

Apache 2.0 or MIT at your option.

Project status

Passively Maintained. There are no plans for new features, but the maintainer intends to respond to issues that get filed.

Benchmarks and comparisons

In the tables below fastPASTA is compared with rawdata-parser and decode.py in typical verification tasks. Hyperfine is used for benchmarking, with cache warmup.

| Tool | Command | Mean ± σ [s] | Min [s] | Max [s] |
|------|---------|--------------|---------|---------|
| fastPASTA | fastpasta input.raw check all | 0.195 ± 0.002 | 0.191 | 0.198 |
| rawdata-parser | ./rawdata-parser --skip-packet-counter-checks input.raw | 1.638 ± 0.066 | 1.575 | 1.810 |
| decode.py | python3 decode.py -i 20522 -f input.raw --skip_data | 94.218 ± 0.386 | 93.914 | 94.811 |

| Tool | Command | Mean ± σ [s] | Min [s] | Max [s] |
|------|---------|--------------|---------|---------|
| fastPASTA | fastpasta input.raw check all | 0.409 ± 0.004 | 0.402 | 0.417 |
| rawdata-parser | rawdata-parser input.raw | 3.068 ± 0.028 | 3.012 | 3.105 |
| decode.py | Verifying multiple links simultaneously is not supported | N/A | N/A | N/A |

| Tool | Command | Mean ± σ [s] | Min [s] | Max [s] |
|------|---------|--------------|---------|---------|
| fastPASTA | fastpasta input.raw check all its | 0.106 ± 0.002 | 0.103 | 0.111 |
| rawdata-parser | Verifying payloads is not supported | N/A | N/A | N/A |
| decode.py | python3 decode.py -i 20522 -f input.raw | 55.903 ± 0.571 | 54.561 | 56.837 |
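
To put the mean timings in relative terms, the Python snippet below divides each tool's mean by fastPASTA's mean for the same benchmark. The numbers are copied from the three benchmarks above, in order; the labels "benchmark 1" to "benchmark 3" are just placeholders for this example, and unsupported (N/A) entries are left out.

```python
from typing import Dict

# Mean run times in seconds, copied from the benchmarks above.
mean_seconds = {
    "benchmark 1": {"fastPASTA": 0.195, "rawdata-parser": 1.638, "decode.py": 94.218},
    "benchmark 2": {"fastPASTA": 0.409, "rawdata-parser": 3.068},
    "benchmark 3": {"fastPASTA": 0.106, "decode.py": 55.903},
}


def speedups(times: Dict[str, float]) -> Dict[str, float]:
    """Return how many times longer each other tool takes than fastPASTA."""
    baseline = times["fastPASTA"]
    return {tool: t / baseline for tool, t in times.items() if tool != "fastPASTA"}


for label, times in mean_seconds.items():
    for tool, factor in speedups(times).items():
        print(f"{label}: {tool} takes about {factor:.1f}x as long as fastPASTA")
```

These ratios compare means only and ignore the reported standard deviations.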

Need more performance?

The primary release profile of fastPASTA is already very fast, but if you absolutely need 10-20% more speed, a faster build profile exists that uses the experimental Rust nightly toolchain.

Background

The Rust compiler rustc does not yet provide access to all the features of its LLVM backend, and as of this writing the stable channel of Rust has no way to pass compiler flags to LLVM. The experimental nightly toolchain does allow passing flags directly to LLVM, so fastPASTA includes configuration for a release-nightly build profile that uses this to achieve more speed at the cost of compilation time and binary size.

The increased speed is mainly achieved by configuring a higher threshold for inlining, which increases speed but also compilation time, binary size, and, most crucially, cache pressure. The performance impact therefore depends heavily on the machine fastPASTA runs on: a better or larger CPU cache leads to a higher performance gain. With >1 GB of individual link data, performance on one particular CERN machine running CentOS Stream 8 increased by ~17%, as measured by hyperfine.

To install the nightly toolchain (and check your installation)

rustup toolchain install nightly
rustup run nightly rustc --version

Compile the optimized release-nightly experimental build profile

cargo +nightly build --profile release-nightly

Path to binary (relative to the project root): target/release-nightly/fastpasta

Documentation for developers

Just is used to run common commands in the project. Install it with cargo install just and check out the main justfile for the available recipes.

Running just in the project root will display the available (public) recipes.

To see how data is passed around at runtime, see data flow documentation.

For extensive documentation of source code see documentation or invoke cargo doc --open.
