#markov-chain #markov #text #chat-bot #light-weight

bin+lib wordmarkov

A simple but flexible Markov chain library for text sentences that preserves punctuation and whitespace.

6 releases

0.1.4 Oct 21, 2022
0.1.3 Oct 21, 2022
0.1.2 Oct 21, 2022
0.1.1 Oct 21, 2022
0.1.0 Oct 21, 2022


Custom license


wordmarkov

Author: Gustavo Ramos Rehermann

A Markov chain library tailored to sentences.

This library is a part of the Neurs Project.

Specifics

Unlike a general-purpose Markov chain, a Markov chain in WordMarkov retains the punctuation and whitespace that separate words, not just the words themselves.
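To keep separators, a sentence has to be split into words *and* the text between them. The sketch below is an illustration only, not WordMarkov's tokenizer: the `Piece` type, the `tokenize` function and the rule that a word is a run of alphanumeric characters or apostrophes are all assumptions.

```rust
/// Hypothetical token type; WordMarkov's real types may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Piece {
    Word(String),
    Separator(String),
}

/// Split a sentence into alternating words and separators.
/// Assumption: a word is a maximal run of alphanumeric characters or
/// apostrophes; everything else (spaces, hyphens, commas, full stops)
/// is kept verbatim as a separator instead of being discarded.
fn tokenize(sentence: &str) -> Vec<Piece> {
    let mut pieces = Vec::new();
    let mut current = String::new();
    // Whether the run being accumulated is a word (None before the first char).
    let mut in_word: Option<bool> = None;

    for c in sentence.chars() {
        let is_word_char = c.is_alphanumeric() || c == '\'';
        if let Some(prev) = in_word {
            if prev != is_word_char {
                // The character class changed: flush the finished run.
                pieces.push(make_piece(prev, std::mem::take(&mut current)));
            }
        }
        current.push(c);
        in_word = Some(is_word_char);
    }
    // Flush the last run, if the sentence was not empty.
    if let Some(prev) = in_word {
        pieces.push(make_piece(prev, current));
    }
    pieces
}

fn make_piece(is_word: bool, text: String) -> Piece {
    if is_word {
        Piece::Word(text)
    } else {
        Piece::Separator(text)
    }
}

fn main() {
    for piece in tokenize("The high-priest spoke.") {
        println!("{:?}", piece);
    }
}
```

With this rule, "high-priest" yields the words "high" and "priest" with a `-` separator between them, and the trailing full stop survives as a separator of its own.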

The same pair of words can be connected by multiple edges when they appear with different separators. For example, "high priest" and "high-priest" both link the tokens "high" and "priest", but through two distinct edges: one whose separator is a space and one whose separator is a hyphen.
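One way to get this multi-edge behaviour is to key edge counts on the triple (from, separator, to) rather than on the word pair alone. This is a minimal sketch under that assumption, not WordMarkov's actual storage: `Node`, `Chain`, `add_sentence` and `edges_between` are hypothetical names, and recording a separator before the first word and after the last one is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical node type, with placeholder nodes for the START and END tokens.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Node {
    Start,
    Word(String),
    End,
}

/// Edge counts keyed by (from, separator, to): the same two words joined by
/// different separators produce different keys, hence different edges.
#[derive(Default)]
struct Chain {
    edges: HashMap<(Node, String, Node), u32>,
}

impl Chain {
    /// Record one sentence. `separators` holds one entry per gap, including the
    /// gap after START and the gap before END, so it has `words.len() + 1` items.
    fn add_sentence(&mut self, words: &[&str], separators: &[&str]) {
        assert_eq!(separators.len(), words.len() + 1);
        let mut nodes = vec![Node::Start];
        nodes.extend(words.iter().map(|w| Node::Word(w.to_string())));
        nodes.push(Node::End);
        // Each consecutive pair of nodes is joined by the matching separator.
        for (pair, sep) in nodes.windows(2).zip(separators) {
            let key = (pair[0].clone(), sep.to_string(), pair[1].clone());
            *self.edges.entry(key).or_insert(0) += 1;
        }
    }

    /// Every distinct edge from `a` to `b` as (separator, count), sorted.
    fn edges_between(&self, a: &Node, b: &Node) -> Vec<(&str, u32)> {
        let mut out: Vec<(&str, u32)> = self
            .edges
            .iter()
            .filter(|((from, _, to), _)| from == a && to == b)
            .map(|((_, sep, _), n)| (sep.as_str(), *n))
            .collect();
        out.sort();
        out
    }
}

fn main() {
    let mut chain = Chain::default();
    chain.add_sentence(&["the", "high", "priest"], &["", " ", " ", "."]);
    chain.add_sentence(&["the", "high", "priest"], &["", " ", "-", "."]);
    let high = Node::Word("high".into());
    let priest = Node::Word("priest".into());
    // Two separate edges between "high" and "priest": one for " ", one for "-".
    println!("{:?}", chain.edges_between(&high, &priest));
}
```

Keeping counts per separator also lets a generator reproduce the original spelling ("high-priest" versus "high priest") in proportion to how often each form appeared.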

Two special tokens, START and END, mark where sentences begin and end. The Markov chain can be walked both forwards and backwards. In either direction, a walk should ideally reach one of these special tokens within a finite number of steps (words walked).
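A bidirectional walk can be built from two adjacency tables, one of successors and one of predecessors: walk backwards from a seed word until START, forwards until END, and join the halves. The sketch below is a hypothetical illustration, not WordMarkov's API. `Chain`, `sentence_around`, the `<START>`/`<END>` sentinel strings and the xorshift generator (used only to avoid external crates) are assumptions, and separators are omitted for brevity. Because reaching START or END is only the *ideal* outcome, each direction is capped at `max_steps` so a walk through a cycle still terminates.

```rust
use std::collections::HashMap;

// Hypothetical sentinel strings standing in for the START and END tokens.
const START: &str = "<START>";
const END: &str = "<END>";

#[derive(Default)]
struct Chain {
    // Repeated entries act as weights: frequent transitions are picked more often.
    forward: HashMap<String, Vec<String>>,  // word -> successors
    backward: HashMap<String, Vec<String>>, // word -> predecessors
}

impl Chain {
    fn add_sentence(&mut self, words: &[&str]) {
        let mut seq = vec![START];
        seq.extend_from_slice(words);
        seq.push(END);
        for pair in seq.windows(2) {
            self.forward.entry(pair[0].to_string()).or_default().push(pair[1].to_string());
            self.backward.entry(pair[1].to_string()).or_default().push(pair[0].to_string());
        }
    }

    /// Follow `table` from `from` until `stop` is drawn or `max_steps` words are emitted.
    fn walk(table: &HashMap<String, Vec<String>>, from: &str, stop: &str,
            max_steps: usize, rng: &mut u64) -> Vec<String> {
        let mut out = Vec::new();
        let mut current = from.to_string();
        for _ in 0..max_steps {
            let Some(options) = table.get(&current) else { break };
            let next = &options[next_rand(rng) as usize % options.len()];
            if next == stop {
                break;
            }
            out.push(next.clone());
            current = next.clone();
        }
        out
    }

    /// Grow a sentence in both directions around `seed`.
    fn sentence_around(&self, seed: &str, max_steps: usize, rng: &mut u64) -> Vec<String> {
        let mut words = Self::walk(&self.backward, seed, START, max_steps, rng);
        words.reverse(); // the backward walk produces words in reverse order
        words.push(seed.to_string());
        words.extend(Self::walk(&self.forward, seed, END, max_steps, rng));
        words
    }
}

/// xorshift64 pseudo-random generator; the state must be non-zero.
fn next_rand(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

fn main() {
    let mut chain = Chain::default();
    chain.add_sentence(&["the", "high", "priest", "spoke"]);
    chain.add_sentence(&["a", "priest", "listened"]);
    let mut rng = 0x2545_F491_4F6C_DD1D_u64; // any non-zero seed
    println!("{}", chain.sentence_around("priest", 50, &mut rng).join(" "));
}
```

Walking outwards from a seed word is what makes backward traversal useful: a generated sentence can be made to contain a chosen word rather than always starting from START.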

License

For licensing information, see the Neurs Project main repository.


lib.rs:

  • The Markov chain code.
  • Primarily used by cnmc; can be reused by other projects.
