#graph #markov-chain #markov #probability #cli-applications #generation #procedural

bin+lib markovgen

A library for building Markov chain graphs from text datasets and efficiently generating text sequences by traversing them. Includes an accompanying CLI application.

2 unstable releases

0.2.0 May 15, 2024
0.1.0 May 15, 2024

#249 in Math

Download history 182/week @ 2024-05-09 107/week @ 2024-05-16

289 downloads per month

MIT license

32KB
519 lines

markovgen

A library for building Markov chain graphs from text datasets and efficiently generating text sequences by traversing them.

Features

  • Simple API for building and traversing graphs
  • Configurable minimum sequence length
  • An example CLI application (markovcli) that can build graphs from datasets and write them to disk, and sample such graphs with a customizable sequence length.
    • Generates more than 2.5 million names per second on my machine from the first_names benchmark dataset (see benches/), using default settings with the cli_no_print feature enabled to avoid I/O overhead
    • Try it using cargo run -r -F serde --bin markovcli

Example

src/bin/example.rs:

use std::sync::Arc;

use markovgen::*;

const NAME_DATASET: &str = "Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone";

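// Control characters used as sentinels marking the start and end of each name.
const SEQUENCE_START: char = '\x01';
const SEQUENCE_END: char = '\x02';

fn main() {
    let mut constructor = GraphConstructor::new();
    NAME_DATASET.lines().for_each(|l| {
        // Register every adjacent pair of characters as a transition,
        // beginning with SEQUENCE_START -> first character.
        l.chars().fold(SEQUENCE_START, |acc, x| {
            constructor.register_sequence(acc, x);
            x
        });

        // Close the name with a transition from its last character to SEQUENCE_END.
        constructor.register_sequence(l.chars().last().unwrap(), SEQUENCE_END);
    });
    // The stepper takes the finished graph wrapped in an Arc.
    let graph = Arc::new(constructor.construct());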

    let mut stepper = GraphStepper::new(
        graph,
        GraphStepperConfiguration {
            start_char: Some(SEQUENCE_START),
            min_length: Some(3),
        },
    )
    .unwrap();

    // Step until reaching a "dead end" vertex, with a timeout of 16 steps.
    println!("{}", stepper.step_until_end_state(16).unwrap());
}

Running this should yield something like:

$ cargo run --bin example
Ninathom
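For intuition about what building and traversing such a graph involves, here is a minimal std-only sketch. It is not markovgen's implementation and does not use its API: `build_graph`, `generate`, and `XorShift` are hypothetical names made up for illustration, and the way `min_length` is handled (avoiding the end sentinel while the output is still too short) is only an assumption about how a minimum length could be enforced.

```rust
use std::collections::HashMap;

const START: char = '\x01';
const END: char = '\x02';

/// Adjacency list: each vertex maps to every successor seen in the data.
/// Duplicates are kept, so frequent transitions are drawn more often.
type Graph = HashMap<char, Vec<char>>;

/// Record every adjacent character pair of every non-empty line.
fn build_graph(dataset: &str) -> Graph {
    let mut graph = Graph::new();
    for line in dataset.lines().filter(|l| !l.is_empty()) {
        let mut prev = START;
        for c in line.chars() {
            graph.entry(prev).or_default().push(c);
            prev = c;
        }
        graph.entry(prev).or_default().push(END);
    }
    graph
}

/// Tiny xorshift PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Walk from START until END is drawn or `max_steps` characters are emitted.
/// While the output is shorter than `min_length`, END is skipped whenever
/// the current vertex has any other successor.
fn generate(graph: &Graph, rng: &mut XorShift, min_length: usize, max_steps: usize) -> String {
    let mut out = String::new();
    let mut current = START;
    let mut len = 0;
    while len < max_steps {
        let Some(successors) = graph.get(&current) else { break };
        let mut candidates: Vec<char> = successors.clone();
        if len < min_length {
            let non_end: Vec<char> = successors.iter().copied().filter(|&c| c != END).collect();
            if !non_end.is_empty() {
                candidates = non_end;
            }
        }
        let next = candidates[(rng.next_u64() % candidates.len() as u64) as usize];
        if next == END {
            break;
        }
        out.push(next);
        len += 1;
        current = next;
    }
    out
}

fn main() {
    let graph = build_graph("Tim\nTom\nThomas\nNathan\nNina\nTiara\nTyra\nTyrone");
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    for _ in 0..5 {
        println!("{}", generate(&graph, &mut rng, 3, 16));
    }
}
```

Storing repeated successors in a `Vec` keeps sampling trivial at the cost of memory; a real implementation might store weights instead.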

Plans for 1.0

  • This was one of my first Rust projects, which I have only lightly cleaned up. I'll probably change the API to make it more ergonomic before the 1.0.0 release.
  • Proper multi-threading support (I think just cloning GraphSteppers and using them in different tasks should already work, but I haven't actually tried it).
  • Generic implementation to allow for String and char vertices (currently only chars are supported, since this fit my original use case of generating names).
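As a rough illustration of the last two plans, the sketch below shows one way a graph could be made generic over its vertex type and shared read-only between threads through an `Arc`. None of this is markovgen's current or planned API: `Graph<T>`, `register`, `walk`, and `word_graph` are hypothetical names, and the successor choice is delegated to a caller-supplied closure purely to keep the example std-only.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Arc;
use std::thread;

/// Hypothetical generic graph: `T` can be `char`, `String`, or any hashable token.
struct Graph<T> {
    edges: HashMap<T, Vec<T>>,
}

impl<T: Eq + Hash + Clone> Graph<T> {
    fn new() -> Self {
        Graph { edges: HashMap::new() }
    }

    /// Record one observed transition `from -> to`.
    fn register(&mut self, from: T, to: T) {
        self.edges.entry(from).or_default().push(to);
    }

    /// Follow transitions from `start` for at most `max_steps` steps.
    /// `pick` receives the number of successors and returns an index
    /// (reduced modulo that number), so callers can plug in any RNG.
    fn walk(&self, start: &T, max_steps: usize, mut pick: impl FnMut(usize) -> usize) -> Vec<T> {
        let mut path = Vec::new();
        let mut current = start.clone();
        for _ in 0..max_steps {
            // A vertex without outgoing edges is a dead end.
            let Some(next) = self.edges.get(&current) else { break };
            current = next[pick(next.len()) % next.len()].clone();
            path.push(current.clone());
        }
        path
    }
}

/// Build a word-level graph, using `String` vertices instead of `char`.
fn word_graph(text: &str) -> Graph<String> {
    let mut graph = Graph::new();
    let words: Vec<&str> = text.split_whitespace().collect();
    for pair in words.windows(2) {
        graph.register(pair[0].to_string(), pair[1].to_string());
    }
    graph
}

fn main() {
    // The graph is immutable after construction, so threads only need shared access.
    let graph = Arc::new(word_graph("the cat sat on the mat and the cat ran"));
    let handles: Vec<_> = (0..4usize)
        .map(|seed| {
            let graph = Arc::clone(&graph);
            thread::spawn(move || {
                // Each thread keeps its own toy generator state.
                let mut state = seed;
                let path = graph.walk(&"the".to_string(), 5, |_| {
                    state = state.wrapping_mul(31).wrapping_add(7);
                    state
                });
                path.join(" ")
            })
        })
        .collect();
    for handle in handles {
        println!("{}", handle.join().unwrap());
    }
}
```

Keeping all mutable state (here, the generator state) inside each thread is what makes plain `Arc` sharing sufficient in this sketch; no locking is involved.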

Dependencies

~2–2.8MB
~55K SLoC