#ngrams #simple #lm

n_gram

Simple library for training n-gram language models

11 releases

0.1.12 Jun 10, 2024
0.1.11 May 25, 2024
0.1.9 Feb 26, 2024

#171 in Machine learning

MIT license

19KB
262 lines

Info

A simple tool for training n-gram language models. Inspired by this course.
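For readers new to the idea, here is a minimal sketch of what training an n-gram model amounts to: counting which token follows each context of n-1 tokens, then turning the counts into relative frequencies. The helpers `count_ngrams` and `probability` are hypothetical names used only for illustration. They are not part of this crate's API, and the crate's internals may work differently.

```rust
use std::collections::HashMap;

/// Maps a context of n-1 tokens to the counts of tokens seen right after it.
type Counts = HashMap<Vec<String>, HashMap<String, usize>>;

/// Count how often each token follows each (n-1)-token context.
/// Hypothetical helper for illustration; not the n_gram crate's API.
fn count_ngrams(tokens: &[&str], n: usize) -> Counts {
    let mut counts = Counts::new();
    if n == 0 {
        // `windows(0)` would panic, so an order of zero yields an empty table.
        return counts;
    }
    for window in tokens.windows(n) {
        let (context, next) = window.split_at(n - 1);
        let context: Vec<String> = context.iter().map(|t| t.to_string()).collect();
        *counts
            .entry(context)
            .or_default()
            .entry(next[0].to_string())
            .or_insert(0) += 1;
    }
    counts
}

/// Maximum-likelihood estimate: count(context, next) / count(context, anything).
/// Returns 0.0 for a context that never occurred in training.
fn probability(counts: &Counts, context: &[&str], next: &str) -> f64 {
    let key: Vec<String> = context.iter().map(|t| t.to_string()).collect();
    match counts.get(&key) {
        Some(followers) => {
            let total: usize = followers.values().sum();
            let hits = followers.get(next).copied().unwrap_or(0);
            hits as f64 / total as f64
        }
        None => 0.0,
    }
}

fn main() {
    let tokens = ["the", "cat", "sat", "the", "cat", "ran"];
    let counts = count_ngrams(&tokens, 3); // n = 3 gives a trigram model
    println!("{}", probability(&counts, &["the", "cat"], "sat"));
}
```

In this toy corpus the context "the cat" is followed once by "sat" and once by "ran", so each continuation gets half of the probability mass. The sketch applies no smoothing, so unseen continuations get probability zero.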

Usage

use n_gram::*;

fn main() {
    // Initializing model
    let config = Config::default();
    let mut model = Model::new(config);

    // Load and tokenize the corpus (tiny_corpus needs the `corpus` feature)
    let corpus = tiny_corpus()
        .iter()
        .map(|t| sos(eos(tokenize(t.to_owned()))))
        .collect::<Vec<_>>();

    model.train(corpus);

    // Now you are ready to generate something
    let mut tokens = sos(tokenize("The quick".to_owned()));
    let max = 10; // max number of generated tokens
    model.generate(&mut tokens, max);

    // Save model (save and load need the `saveload` feature)
    model.save("model.json").unwrap();

    // Reset model
    model.reset();

    // Load model back
    model.load("model.json").unwrap();
}
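To make the flow above more concrete, here is a dependency-free sketch of the same steps for a trigram model: wrap each sentence in start and end markers, count followers for each pair of tokens, then extend a prompt until the end marker, an unseen context, or the token limit is reached. Everything in it is an assumption made for illustration: `wrap`, `train` and `generate` are hypothetical functions, tokenization is a plain whitespace split, and the next token is chosen greedily (most frequent follower) rather than sampled, purely to keep the output deterministic. The crate's own tokenizer and selection strategy may differ.

```rust
use std::collections::HashMap;

// Marker strings as they appear in the generated samples in this README.
const SOS: &str = "__sos__";
const EOS: &str = "__eos__";

/// Trigram table: (w1, w2) -> counts of each w3 seen after that pair.
type Table = HashMap<(String, String), HashMap<String, usize>>;

/// Hypothetical stand-in for tokenize + sos + eos: whitespace split plus markers.
fn wrap(text: &str) -> Vec<String> {
    let mut tokens = vec![SOS.to_string()];
    tokens.extend(text.split_whitespace().map(str::to_string));
    tokens.push(EOS.to_string());
    tokens
}

/// Count every trigram in every sentence of the corpus.
fn train(corpus: &[Vec<String>]) -> Table {
    let mut table = Table::new();
    for sentence in corpus {
        for w in sentence.windows(3) {
            *table
                .entry((w[0].clone(), w[1].clone()))
                .or_default()
                .entry(w[2].clone())
                .or_insert(0) += 1;
        }
    }
    table
}

/// Greedy generation (an assumption, not the crate's strategy): append the most
/// frequent follower of the last two tokens, at most `max` times.
fn generate(table: &Table, tokens: &mut Vec<String>, max: usize) {
    for _ in 0..max {
        if tokens.len() < 2 {
            break; // a trigram model needs two tokens of context
        }
        let key = (
            tokens[tokens.len() - 2].clone(),
            tokens[tokens.len() - 1].clone(),
        );
        let followers = match table.get(&key) {
            Some(f) => f,
            None => break, // context never seen in training
        };
        // Highest count wins; ties go to the lexicographically smaller token.
        let next = followers
            .iter()
            .max_by(|a, b| a.1.cmp(b.1).then(b.0.cmp(a.0)))
            .map(|(token, _)| token.clone());
        match next {
            Some(token) => {
                let done = token == EOS;
                tokens.push(token);
                if done {
                    break;
                }
            }
            None => break,
        }
    }
}

fn main() {
    let corpus = vec![
        wrap("the quick brown fox"),
        wrap("the quick brown dog barks"),
    ];
    let table = train(&corpus);

    let mut tokens = wrap("the quick");
    tokens.pop(); // drop the end marker so the prompt stays open-ended
    generate(&table, &mut tokens, 10);
    println!("{}", tokens.join(" "));
}
```

Swapping the greedy choice for weighted random sampling over the follower counts would give varied output from the same prompt, at the cost of needing a random number source.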

Examples

I've trained a trigram model on 20,000 samples from the TinyStories dataset. Here are some examples of generated text:

  • "__sos__ Once upon a time a mom, a dad, a big sister, and a little girl below shouted, "Look Mama! A talking cloud!" The little girl opened her hand, and the monkey happily ate it all in one day. She was so kind he said yes and showed him the pin. "I poked you with this. It is a storm. The waves were so tall and wide, it seemed like something was calling her to come to an end eventually. They all had an incredible songbird inside. Billy was happy and excited. __eos__"
  • "__sos__ Once upon a time there was a light girl with a basket. She then sent the basket to the washing machine. While the laundry was all hung up, Daisy and her family were getting ready to fly it, it suddenly flew away! The lion felt bad for being rude. He said, "It's my pleasure. It's important to remember to forgive. __eos__"
  • "__sos__ Once upon a time a family lived in a stream with many stones on the ground, it glistened in the sunshine. From that day forth they were always with her and learn with her and waved goodbye to Mommy. The bus driver was happy and flew away happily. Timmy felt proud of their pictures. __eos__"

Installation

cargo add n_gram

If you want to save & load your models:

cargo add n_gram --features=saveload

If you want to load the tiny corpus for training:

cargo add n_gram --features=corpus

Dependencies

~0.6–1.4MB
~30K SLoC