Info
A simple tool for training an n-gram language model. Inspired by this course.
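The core idea behind an n-gram model is simply counting how often a token follows the previous n − 1 tokens, then predicting or sampling the next token from those counts. The snippet below is only an illustration of that idea with a hand-rolled bigram counter built on the standard library; it is not the crate's internal implementation:
// Illustration only: a tiny bigram counter, independent of the n_gram crate.
use std::collections::HashMap;

fn main() {
    let corpus = ["the quick brown fox", "the quick brown dog"];

    // Bigram counts: how often `next` follows `prev` across the corpus.
    let mut counts: HashMap<(&str, &str), u32> = HashMap::new();
    for sentence in corpus {
        let words: Vec<&str> = sentence.split_whitespace().collect();
        for pair in words.windows(2) {
            *counts.entry((pair[0], pair[1])).or_insert(0) += 1;
        }
    }

    // "Generate" the next word after "quick" by picking its most frequent follower.
    let mut best: Option<(&str, u32)> = None;
    for ((prev, next), count) in &counts {
        if *prev == "quick" && best.map_or(true, |(_, c)| *count > c) {
            best = Some((*next, *count));
        }
    }
    println!("{:?}", best); // Some(("brown", 2))
}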
Usage
use n_gram::*;

fn main() {
    // Initializing model
    let config = Config::default();
    let mut model = Model::new(config);

    // Loading and tokenizing corpus
    let corpus = tiny_corpus()
        .iter()
        .map(|t| sos(eos(tokenize(t.to_owned()))))
        .collect::<Vec<_>>();

    model.train(corpus);

    // Now you are ready to generate something
    let mut tokens = sos(tokenize("The quick".to_owned()));
    let max = 10; // max number of generated tokens
    model.generate(&mut tokens, max);

    // Save model
    model.save("model.json").unwrap();

    // Reset model
    model.reset();

    // Load model back
    model.load("model.json").unwrap();
}
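A corpus of your own sentences can be used in place of tiny_corpus(), so the corpus feature is optional. The sketch below is a minimal variation of the example above that reuses only the calls already shown; the sentences are placeholder data:
use n_gram::*;

fn main() {
    // Placeholder sentences standing in for tiny_corpus()
    let sentences = vec![
        "The quick brown fox jumps over the lazy dog.".to_owned(),
        "The lazy dog sleeps in the sun.".to_owned(),
    ];

    // Same preprocessing as above: tokenize, then wrap with start/end markers
    let corpus = sentences
        .iter()
        .map(|s| sos(eos(tokenize(s.to_owned()))))
        .collect::<Vec<_>>();

    let mut model = Model::new(Config::default());
    model.train(corpus);

    // Generate a short continuation for a prompt, exactly as above
    let mut tokens = sos(tokenize("The quick".to_owned()));
    model.generate(&mut tokens, 10);
}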
Examples
I've trained a trigram model on 20,000 samples from the Tiny Stories dataset. Here are some examples of generated text:
- "__sos__ Once upon a time a mom, a dad, a big sister, and a little girl below shouted, "Look Mama! A talking cloud!" The little girl opened her hand, and the monkey happily ate it all in one day. She was so kind he said yes and showed him the pin. "I poked you with this. It is a storm. The waves were so tall and wide, it seemed like something was calling her to come to an end eventually. They all had an incredible songbird inside. Billy was happy and excited. __eos__"
- "__sos__ Once upon a time there was a light girl with a basket. She then sent the basket to the washing machine. While the laundry was all hung up, Daisy and her family were getting ready to fly it, it suddenly flew away! The lion felt bad for being rude. He said, "It's my p leasure. It's important to remember to forgive. __eos__"
- "__sos__ Once upon a time a family lived in a stream with many stones on the ground, it glistened in the sunshine. From that day forth they were always with her and learn with her and waved goodbye to Mommy. The bus driver was happy and flew away happily. Timmy felt proud of their pictures. __eos__"
Installation
cargo add n_gram
If you want to save & load your models:
cargo add n_gram --features=saveload
If you want to load a tiny corpus for training:
cargo add n_gram --features=corpus
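Both features can also be enabled in a single command:
cargo add n_gram --features=saveload,corpus
Equivalently, the dependency can be declared directly in Cargo.toml (0.1.12 was the latest release at the time of writing; adjust as needed):
[dependencies]
n_gram = { version = "0.1.12", features = ["saveload", "corpus"] }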