8 unstable releases (3 breaking)

✓ Uses Rust 2018 edition

new 0.8.2 Aug 16, 2019
0.8.1 Aug 12, 2019
0.7.1 Jun 21, 2019
0.6.0 Jun 6, 2019
0.5.1 Mar 24, 2019

#16 in Machine learning


762 downloads per month
Used in 5 crates (4 directly)

Custom license

195KB
4K SLoC

Introduction


This is a crate for reading, writing, and using finalfusion embeddings in Rust. The word2vec and GloVe file formats are also supported. Please consult the API documentation for usage information.

Note: This package is still new, and its API will change.


lib.rs:

A library for reading, writing, and using word embeddings.

finalfusion allows you to read, write, and use word2vec/GloVe embeddings and to read fastText embeddings. Its native data format, also named finalfusion, has several benefits over the word2vec, GloVe, and fastText formats.

Reading finalfusion embeddings

finalfusion embeddings can be read with the read_embeddings method, which expects a reader that implements the BufRead trait.
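
As a minimal sketch of what "a reader that implements the BufRead trait" allows, the following example uses only the standard library. It shows a function that is generic over BufRead and accepts both an in-memory Cursor and a BufReader wrapped around another reader. The helper first_line is hypothetical and not part of finalfusion; it only illustrates the trait bound.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Hypothetical helper (not part of finalfusion): returns the first line
/// of any buffered reader, without the trailing newline.
fn first_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

fn main() -> io::Result<()> {
    // A Cursor over an in-memory buffer implements BufRead directly.
    let mut in_memory = Cursor::new(b"2 3\nrest of the data".to_vec());
    println!("{}", first_line(&mut in_memory)?);

    // Any other reader can be wrapped in a BufReader to obtain BufRead,
    // in the same way a File would be wrapped.
    let mut wrapped = BufReader::new("header\nbody".as_bytes());
    println!("{}", first_line(&mut wrapped)?);

    Ok(())
}
```

Because the bound is on the trait rather than on a concrete type, the same reading code works for files, network streams, and in-memory test data.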

Since finalfusion supports several types of vocabularies and embedding matrix (storage) formats, these are specified as type parameters of the Embeddings type. Typically, however, one wants to read finalfusion embeddings regardless of which vocabulary or embedding matrix type they were stored with. For this purpose, the VocabWrap and StorageWrap types are provided, which wrap any type of vocabulary and embedding matrix.

We can thus load embeddings in finalfusion format and retrieve an embedding as follows:

use std::fs::File;
use std::io::BufReader;

use finalfusion::prelude::*;

let mut reader = BufReader::new(File::open("testdata/similarity.fifu").unwrap());

// Read the embeddings.
let embeddings: Embeddings<VocabWrap, StorageWrap> =
    Embeddings::read_embeddings(&mut reader)
    .unwrap();

// Look up an embedding.
let embedding = embeddings.embedding("Berlin");
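
The idea behind wrapper types such as VocabWrap and StorageWrap, one type that can hold any of several concrete variants, can be sketched with a plain enum. This is a toy illustration under stated assumptions, not finalfusion's implementation: the Storage trait, the Dense and Scaled structs, and the AnyStorage enum are all hypothetical names, and the scalar scaling used here is a simplified stand-in for real quantization.

```rust
/// Hypothetical trait: anything that can produce an embedding by row index.
trait Storage {
    fn embedding(&self, idx: usize) -> Vec<f32>;
}

/// Hypothetical dense storage: one f32 row per word.
struct Dense {
    rows: Vec<Vec<f32>>,
}

/// Hypothetical compressed storage: u8 codes that are rescaled on lookup.
struct Scaled {
    codes: Vec<Vec<u8>>,
    scale: f32,
}

impl Storage for Dense {
    fn embedding(&self, idx: usize) -> Vec<f32> {
        self.rows[idx].clone()
    }
}

impl Storage for Scaled {
    fn embedding(&self, idx: usize) -> Vec<f32> {
        // Rows are reconstructed on every lookup.
        self.codes[idx]
            .iter()
            .map(|&c| f32::from(c) * self.scale)
            .collect()
    }
}

/// Wrapper that can hold either storage type behind one concrete type.
enum AnyStorage {
    Dense(Dense),
    Scaled(Scaled),
}

impl Storage for AnyStorage {
    fn embedding(&self, idx: usize) -> Vec<f32> {
        // Dispatch to whichever variant is stored.
        match self {
            AnyStorage::Dense(s) => s.embedding(idx),
            AnyStorage::Scaled(s) => s.embedding(idx),
        }
    }
}

fn main() {
    let storages = vec![
        AnyStorage::Dense(Dense { rows: vec![vec![0.5, 1.0]] }),
        AnyStorage::Scaled(Scaled { codes: vec![vec![1, 2]], scale: 0.5 }),
    ];
    // Callers use the wrapper without knowing which variant is inside.
    for storage in &storages {
        println!("{:?}", storage.embedding(0));
    }
}
```

With such a wrapper, code that loads embeddings does not need to know in advance which storage type a file contains; the variant is chosen at run time.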

Analogy and similarity queries need an embedding matrix that can act as a view. For such queries, use StorageViewWrap in place of StorageWrap. StorageViewWrap is only supported for a subset of embedding matrix types; in particular, quantized matrices cannot be used as a view.
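
To illustrate why similarity queries want a matrix that can be viewed directly, here is a minimal sketch using only the standard library. It assumes a hypothetical row-major DenseMatrix whose rows can be borrowed as slices and ranks rows by cosine similarity. The names DenseMatrix, cosine, and most_similar are illustrative and are not finalfusion's API.

```rust
use std::cmp::Ordering;

/// Hypothetical dense matrix in row-major order; rows can be borrowed
/// as slices without copying, which is what makes it usable as a view.
struct DenseMatrix {
    data: Vec<f32>,
    dims: usize,
}

impl DenseMatrix {
    fn n_rows(&self) -> usize {
        self.data.len() / self.dims
    }

    fn row(&self, idx: usize) -> &[f32] {
        &self.data[idx * self.dims..(idx + 1) * self.dims]
    }
}

/// Cosine similarity between two vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Index of the row most similar to `query_idx`, excluding the query itself.
fn most_similar(matrix: &DenseMatrix, query_idx: usize) -> Option<usize> {
    let query = matrix.row(query_idx);
    (0..matrix.n_rows())
        .filter(|&i| i != query_idx)
        .map(|i| (i, cosine(query, matrix.row(i))))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(Ordering::Equal))
        .map(|(i, _)| i)
}

fn main() {
    // Three 2-dimensional rows: rows 0 and 1 point in similar directions.
    let matrix = DenseMatrix {
        data: vec![1.0, 0.0, 0.9, 0.1, 0.0, 1.0],
        dims: 2,
    };
    println!("{:?}", most_similar(&matrix, 0));
}
```

A query like this scans every row of the matrix. With a compressed representation, each row would first have to be reconstructed, so there is no borrowed view over all rows to scan.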

Reading other embedding formats

Consult the documentation of the fasttext, text and word2vec modules for information on how to read fastText, GloVe, and word2vec embeddings.

Dependencies

~4.5MB
~91K SLoC