1 unstable release

0.1.0 Feb 10, 2022

#491 in #machine-learning

Apache-2.0

26KB
587 lines

logreduce-tokenizer

Build:

python setup.py build
export PYTHONPATH=$(pwd)/build/lib

Bench:

python benches/bench.py

Test perf:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Build CLI:

RUSTFLAGS="-C target-cpu=native" cargo build --example logreduce-tokenizer-cli --release

lib.rs:

This library provides a tokenizer function for the logreduce project.

The goal is to replace varying words with fixed tokens (e.g. sha256://... is converted to %HASH).

Dependencies

~5.5MB
~100K SLoC