### 11 releases

0.13.6 | Feb 19, 2024 |
---|---|

0.13.5 | Sep 18, 2023 |

0.13.3 | Aug 21, 2023 |

0.13.1 | Dec 6, 2022 |

0.11.5 | Jun 21, 2022 |

#**652** in Algorithms

**MIT/Apache**

1.5MB

**32K**
SLoC

# Rustfst

#### Rust

#### Python

Rust implementation of Weighted Finite States Transducers.

Rustfst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Weighted finite-state transducers are automata where each transition has an input label, an output label, and a weight. The more familiar finite-state acceptor is represented as a transducer with each transition's input and output label equal. Finite-state acceptors are used to represent sets of strings (specifically, regular or rational sets); finite-state transducers are used to represent binary relations between pairs of strings (specifically, rational transductions). The weights can be used to represent the cost of taking a particular transition.

FSTs have key applications in speech recognition and synthesis, machine translation, optical character recognition, pattern matching, string processing, machine learning, information extraction and retrieval among others. Often a weighted transducer is used to represent a probabilistic model (e.g., an n-gram model, pronunciation model). FSTs can be optimized by determinization and minimization, models can be applied to hypothesis sets (also represented as automata) or cascaded by finite-state composition, and the best results can be selected by shortest-path algorithms.

## Overview

For a basic example see the section below.

Some simple and commonly encountered types of FSTs can be easily
created with the macro [

] or the functions
`fst`

and
`acceptor`

.`transducer`

For more complex cases you will likely start with the

type, which will be imported
in the `VectorFst`

along with most everything else you need.
`prelude`

corresponds
directly to the OpenFST `VectorFst <TropicalWeight>`

`StdVectorFst`

, and can be used to load
its files using `read`

or
`read_text`

.Because "iteration" over an FST can mean many different things,
there are a variety of different iterators. To iterate over state
IDs you may use

, while to
iterate over transitions out of a state, you may use
`states_iter`

. Since it is common to
iterate over both, this can be done using
`get_trs`

or
`fst_iter`

. It
is also very common to iterate over paths accepted by an FST,
which can be done with
`fst_into_iter`

, and as a convenience
for generating text,
`paths_iter`

.
Alternately, in the case of a linear FST, you may retrieve the
only possible path with
`string_paths_iter`

.`decode_linear_fst`

Note that iterating over paths is not the same thing as finding
the *shortest* path or paths, which is done with

(for a single path)
or
`shortest_path`

(for N-shortest paths).`shortest_path_with_config`

For the complete list of algorithms, see the

module.`algorithms`

You may now be wondering, especially if you have previously used
such linguist-friendly tools as
pyfoma, "what if I just want
to *transduce some text*???" The unfriendly answer is that
rustfst is a somewhat lower-level library, designed for
implementing things like speech recognizers. The somewhat more
helpful answer is that you would do this by constructing an

for your input, which you will
`acceptor`

with a
`compose`

, then
`transducer`

the result to itsoutput, and finally
iterate over the paths in
the resulting FST.`project`

## References

Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work :

- Weighted automata algorithms
- The design principles of a weighted finite-state transducer library
- OpenFst: A general and efficient weighted finite-state transducer library
- Weighted finite-state transducers in speech recognition

The API closely resembles that of OpenFST, with some
simplifications and changes to make it more idiomatic in Rust, notably
the use of

instead of `Tr`

. See Differences fromOpenFST for more information.`Arc`

## Example

`use` `anyhow``::`Result`;`
`use` `rustfst``::``prelude``::``*``;`
`use` `rustfst``::``algorithms``::``determinize``::``{`DeterminizeType`,` determinize`}``;`
`use` `rustfst``::``algorithms``::``rm_epsilon``::`rm_epsilon`;`
`fn` `main``(``)`` ``->` `Result``<``(``)``>` `{`
`//` Creates a empty wFST
`let` `mut` fst `=` `VectorFst``::``<`TropicalWeight`>``::`new`(``)``;`
`//` Add some states
`let` s0 `=` fst`.``add_state``(``)``;`
`let` s1 `=` fst`.``add_state``(``)``;`
`let` s2 `=` fst`.``add_state``(``)``;`
`//` Set s0 as the start state
fst`.``set_start``(`s0`)``?``;`
`//` Add a transition from s0 to s1
fst`.``add_tr``(`s0`,` `Tr``::`new`(``3``,` `5``,` `10.``0``,` s1`)``)``?``;`
`//` Add a transition from s0 to s2
fst`.``add_tr``(`s0`,` `Tr``::`new`(``5``,` `7``,` `18.``0``,` s2`)``)``?``;`
`//` Set s1 and s2 as final states
fst`.``set_final``(`s1`,` `31.``0``)``?``;`
fst`.``set_final``(`s2`,` `45.``0``)``?``;`
`//` Iter over all the paths in the wFST
`for` p `in` fst`.``paths_iter``(``)` `{`
`println!``(``"``{:?}``"``,` p`)``;`
`}`
`//` A lot of operations are available to modify/optimize the FST.
`//` Here are a few examples :
`//` - Remove useless states.
`connect``(``&``mut` fst`)``?``;`
`//` - Optimize the FST by merging states with the same behaviour.
`minimize``(``&``mut` fst`)``?``;`
`//` - Copy all the input labels in the output.
`project``(``&``mut` fst`,` `ProjectType``::`ProjectInput`)``;`
`//` - Remove epsilon transitions.
`rm_epsilon``(``&``mut` fst`)``?``;`
`//` - Compute an equivalent FST but deterministic.
fst `=` `determinize``(``&`fst`)``?``;`
`Ok``(``(``)``)`
`}`

## Differences from OpenFST

Here is a non-exhaustive list of ways in which Rustfst's API differs from OpenFST:

- The default epsilon symbol is

and not`<``eps``>`

.`<``epsilon``>` - Functions and methods follow Rust naming conventions,
e.g.

rather than`add_state`

, but are otherwise mostly equivalent, except that:`AddState` - Transitions are called

and not`Tr`

, because`Arc`

has a rather different and well-established meaning in Rust, and rustfst uses it (`Arc`

, that is) to reference-count symbol tables. All associated functions also use`std``::``sync`Arc`::`

.`tr` - Final states are not indicated by a final weight of

. You can test for finality using`zero`

, and`is_final`

returns an`final_weight`

. This requires some care when converting OpenFST code.`Option` - Transitions can be accessed directly as a slice rather than requiring an iterator.
- Semiring operations are expressed as plain old methods rather
than strange C++ things. So write

rather than`w1``.``plus``(`w2`)`

, for instance.`Plus``(`w1`,`w2`)` - Weights have in-place operations for ⊕
(

) and ⊗ (`plus_assign`

).`times_assign` - Most of the type aliases (which would be trait aliases in Rust) such
as

,`StdArc`

, and so forth, are missing, but type inference allows us to avoid explicit type arguments in most cases, such as when calling`StdFst`

, for instance.`Tr`new`::` - State IDs are unsigned, with

used for a missing value. They are also 32 bits by default (presumably, 4 billion states is enough for most applications). This means you must take care to cast them to`NO_STATE_ID`

when using them as indices, and vice-versa, preferably checking for overflows`usize` - Symbol IDs are also unsigned and 32-bits, with

used for a missing value.`NO_LABEL` - Floating-point weights are not generic, so are always single-precision.

## Benchmark with OpenFST

I did a benchmark some time ago on almost every linear fst algorithm and compared the results with

. You can find the results here :`OpenFst`

Spoiler alert:

is faster on all those algorithms 😅`Rustfst`

## Documentation

The documentation of the last released version is available here : https://docs.rs/rustfst

## Release process

- Use the script

to update the version of every package.`update_version``.`sh - Push
- Push a new tag with the prefix
`rustfst-v`

Example :

`./update_version.sh`` 0.9.1-alpha.6`
`git`` commit`` -`am `"`Release 0.9.1-alpha.6`"`
`git`` push`
`git`` tag`` -`a rustfst-v0.9.1-alpha.6` -`m `"`Release rustfst 0.9.1-alpha.6`"`
`git`` push`` --`tags

Optionally, if this is a major release, create a GitHub release in the UI.

## Projects contained in this repository

This repository contains two main projects:

is the Rust re-implementation.`rustfst`

is the python binding of`rustfst-python`

.`rustfst`

## License

Licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

#### Dependencies

~3.5–4.5MB

~89K SLoC