A simple tool for training n-gram language models. Inspired by this course.
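An n-gram model counts how often each token follows a given context in the training corpus and uses those counts to predict the next token. As a minimal illustration of the idea (not the crate's actual implementation; `most_likely_next` is a hypothetical helper), a bigram version looks like this:

```rust
use std::collections::HashMap;

// Count bigram frequencies in `corpus` and return the token that most
// often follows `prev`. Hypothetical helper for illustration only;
// this is not part of n_gram's API.
fn most_likely_next(corpus: &[&str], prev: &str) -> Option<String> {
    let mut counts: HashMap<(String, String), usize> = HashMap::new();
    for sentence in corpus {
        let tokens: Vec<&str> = sentence.split_whitespace().collect();
        // Slide a window of size 2 over the tokens to collect bigrams.
        for pair in tokens.windows(2) {
            *counts
                .entry((pair[0].to_string(), pair[1].to_string()))
                .or_insert(0) += 1;
        }
    }
    counts
        .into_iter()
        .filter(|((p, _), _)| p == prev)
        .max_by_key(|&(_, c)| c)
        .map(|((_, next), _)| next)
}

fn main() {
    let corpus = [
        "the quick brown fox",
        "a quick brown dog",
        "the quick red fox",
    ];
    // "quick" is followed by "brown" twice and "red" once.
    println!("{:?}", most_likely_next(&corpus, "quick")); // Some("brown")
}
```

The crate generalizes this to configurable context lengths and sampling instead of always taking the most frequent continuation.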
Usage
use n_gram::*;
// Initializing model
let config = Config::default();
let mut model = Model::new(config);
// Loading and tokenizing corpus
let corpus = tiny_corpus()
.iter()
.map(|t| sos(eos(tokenize(t.to_owned()))))
.collect::<Vec<_>>();
model.train(corpus);
// Now you are ready to generate something
let mut tokens = sos(tokenize("The quick".to_string()));
let max = 10; // max number of generated tokens
model.generate(&mut tokens, max);
// Save model
model.save("model.json").unwrap();
// Reset model
model.reset();
// Load model back
model.load("model.json").unwrap();
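The `tokenize`, `sos`, and `eos` helpers come from the crate; their exact behavior isn't documented here, but a plausible sketch, assuming whitespace tokenization and `"<s>"`/`"</s>"` sentinel tokens (both the tokenization rule and the marker strings are assumptions), is:

```rust
// Hypothetical stand-ins for n_gram's tokenize/sos/eos helpers,
// assuming whitespace tokenization and "<s>"/"</s>" sentinels.
fn tokenize(text: String) -> Vec<String> {
    text.split_whitespace().map(str::to_string).collect()
}

// Prepend a start-of-sentence marker.
fn sos(mut tokens: Vec<String>) -> Vec<String> {
    tokens.insert(0, "<s>".to_string());
    tokens
}

// Append an end-of-sentence marker.
fn eos(mut tokens: Vec<String>) -> Vec<String> {
    tokens.push("</s>".to_string());
    tokens
}

fn main() {
    let toks = sos(eos(tokenize("The quick".to_string())));
    println!("{:?}", toks); // ["<s>", "The", "quick", "</s>"]
}
```

Wrapping each training sentence in these markers lets the model learn which tokens start and end sentences, which is why the corpus-loading step above applies `sos(eos(...))` to every document.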
Installation
cargo add n_gram
If you want to save & load your models:
cargo add n_gram --features=saveload
If you want to load tiny corpus for training:
cargo add n_gram --features=corpus