#ctc #ken-lm #beam-search

ctclib-pp

A collection of utilities related to CTC, designed to be fast and highly flexible, with perplexity scores for KenLM models

1 unstable release

0.2.0 Feb 23, 2023

#876 in Math


Used in ungoliant

MIT license

115KB
773 lines

ctclib


NOTE: This is currently under development.

A collection of utilities related to CTC, with the goal of being fast and highly flexible.

Features

  • CTC Decode
    • Greedy Decoder
    • Beam Search Decoder
    • Beam Search Decoder with KenLM
    • Beam Search Decoder with user-defined LM
  • Python bindings
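
To illustrate what a greedy CTC decoder computes, here is a minimal pure-Python sketch. It is not ctclib's implementation: the function name `greedy_ctc_decode` is hypothetical, and treating `"_"` as the blank token is an assumption made for this illustration.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(
    log_probs: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: str = "_",  # assumption: "_" marks the CTC blank token
) -> str:
    """Pick the best token per frame, collapse repeats, then drop blanks."""
    blank_id = vocab.index(blank)
    decoded: List[str] = []
    prev: Optional[int] = None
    for frame in log_probs:
        # Index of the highest-scoring token in this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit only when the token changes and is not the blank.
        if best != prev and best != blank_id:
            decoded.append(vocab[best])
        prev = best
    return "".join(decoded)
```

A blank between two identical tokens keeps them separate, so a frame-wise best path of `a a _ a b b _` decodes to `aab`.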

Installation

ctclib depends on kpu/kenlm, which in turn requires the following libraries. Install them before building ctclib:

  • Boost
  • Eigen3

For example, on Ubuntu (or another Debian-based distribution), you can install them by running the following command:

apt install libboost-all-dev libeigen3-dev

Use ctclib from Rust

Currently, ctclib isn't available on crates.io, but you can use it as a git dependency.

[dependencies]
ctclib = { version = "*", git = "https://github.com/agatan/ctclib" }

Use ctclib from Python

ctclib provides Python bindings, named pyctclib. Currently, pyctclib isn't available on PyPI, but you can install it as a git dependency. Make sure that cargo and libclang-dev are installed first.

pip install 'git+https://github.com/agatan/ctclib.git#egg=pyctclib&subdirectory=bindings/python'

Example

import pyctclib

decoder = pyctclib.BeamSearchDecoderWithKenLM(
    pyctclib.BeamSearchDecoderOptions(
        beam_size=100,
        beam_size_token=1000,
        beam_threshold=1,
        lm_weight=0.5,
    ),
    "/path/to/model.arpa",
    ["a", "b", "c", "_"],
)
# log_probs holds the model's log-probabilities; it is not defined in this snippet.
decoder.decode(log_probs)

# Alternatively, you can use a user-defined LM.
# See pyctclib.LMProtocol.
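
The example above passes `log_probs` to the decoder without showing where it comes from. As a hedged sketch, assuming the decoder expects one row of log-probabilities per time frame (the exact input type accepted by `decode` is not stated here), raw model scores can be normalized with a log-softmax. The names `log_softmax` and `logits` are illustrative and are not part of pyctclib.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores for one frame into log-probabilities."""
    # Subtract the maximum first so exp() cannot overflow.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


# Two frames of made-up scores over a 4-token vocabulary.
logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.1, 0.2, 3.0, 0.0],
]
log_probs = [log_softmax(frame) for frame in logits]
```

Each row of `log_probs` now exponentiates to a distribution that sums to 1. Whether pyctclib wants a nested list, a NumPy array, or another container is left open here.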

Dependencies

~1–9MB
~55K SLoC