#ngrams #simple #save-load #lm

n_gram

Simple library for training n-gram language models

10 releases

0.1.9 Feb 26, 2024
0.1.8 Feb 26, 2024


MIT license

16KB
261 lines

Info

Simple tool for training n-gram language models. Inspired by this course
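To make the idea concrete, here is a minimal, self-contained sketch of what an n-gram model does in the bigram case: count how often each token follows another, then predict the most frequent successor. This is an illustration of the general technique only; it does not use this crate's API.

```rust
use std::collections::HashMap;

// Bigram sketch: count successor frequencies, then pick the most
// frequent token that follows `word` in the corpus.
fn most_likely_after(corpus: &str, word: &str) -> Option<String> {
    let tokens: Vec<&str> = corpus.split_whitespace().collect();

    // Count bigram occurrences: previous token -> (next token -> count)
    let mut counts: HashMap<&str, HashMap<&str, usize>> = HashMap::new();
    for pair in tokens.windows(2) {
        *counts
            .entry(pair[0])
            .or_default()
            .entry(pair[1])
            .or_insert(0) += 1;
    }

    counts
        .get(word)?
        .iter()
        .max_by_key(|&(_, c)| *c)
        .map(|(tok, _)| tok.to_string())
}

fn main() {
    let corpus = "the cat sat on the mat the cat ran";
    // "the" is followed by "cat" twice and "mat" once, so "cat" wins.
    println!("{:?}", most_likely_after(corpus, "the"));
}
```

A real model generalizes this to longer contexts (trigrams and beyond) and samples from the successor distribution instead of always taking the maximum.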

Usage

use n_gram::*;

// Initializing model
let config = Config::default();
let mut model = Model::new(config);

// Loading and tokenizing corpus
// (tiny_corpus() requires the `corpus` feature)
let corpus = tiny_corpus()
    .iter()
    .map(|t| sos(eos(tokenize(t.to_owned()))))
    .collect::<Vec<_>>();

model.train(corpus);

// Now you are ready to generate something
let mut tokens = sos(tokenize("The quick".to_string()));
let max = 10; // max number of generated tokens
model.generate(&mut tokens, max);

// Save model (requires the `saveload` feature)
model.save("model.json").unwrap();

// Reset model
model.reset();

// Load model back (requires the `saveload` feature)
model.load("model.json").unwrap();

Installation

cargo add n_gram

If you want to save & load your models:

cargo add n_gram --features=saveload

If you want to load tiny corpus for training:

cargo add n_gram --features=corpus
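Equivalently, you can declare the dependency directly in Cargo.toml; the snippet below assumes the 0.1.x release line shown above and enables both optional features:

```toml
[dependencies]
n_gram = { version = "0.1", features = ["saveload", "corpus"] }
```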

Links

github
crates.io

Dependencies

~245–580KB
~11K SLoC