
bin+lib finalfrontier-utils

Train and use word embeddings with subword representations

9 releases (5 breaking)

0.6.2 Nov 8, 2019
0.6.1 Jun 21, 2019
0.5.0 Apr 25, 2019
0.4.1 Apr 12, 2019
0.1.0 Sep 10, 2018

39 downloads per month

Apache-2.0

210KB
5.5K SLoC


finalfrontier

Introduction

finalfrontier is a Rust program for training word embeddings. It currently supports the following features:

  • Models:
    • skip-gram (Mikolov et al., 2013)
    • structured skip-gram (Ling et al., 2015)
    • directional skip-gram (Song et al., 2018)
    • dependency (Levy and Goldberg, 2014)
  • Output formats:
    • finalfusion
    • fastText
    • word2vec binary
    • word2vec text
    • GloVe text
  • Noise contrastive estimation (Gutmann and Hyvärinen, 2012)
  • Subword representations (Bojanowski et al., 2016)
  • Hogwild SGD (Recht et al., 2011)
  • Quantized embeddings through the finalfusion quantize command
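
To make the subword feature more concrete, here is a minimal sketch of character n-gram extraction in the style of Bojanowski et al. (2016): the word is bracketed with `<` and `>`, every n-gram within a length range is collected, and each n-gram is hashed into a fixed number of buckets. The function names, the parameters, and the choice of 64-bit FNV-1a as the hash are assumptions made for illustration. This is not finalfrontier's actual code or API.

```rust
/// Collect the character n-grams of `word` with lengths `min_n..=max_n`.
/// The word is first bracketed with '<' and '>' so that prefixes and
/// suffixes produce n-grams distinct from word-internal ones.
/// (Hypothetical helper, not part of finalfrontier.)
fn subword_ngrams(word: &str, min_n: usize, max_n: usize) -> Vec<String> {
    let chars: Vec<char> = format!("<{}>", word).chars().collect();
    let mut ngrams = Vec::new();
    for n in min_n..=max_n {
        // `windows(0)` panics, and windows longer than the input are empty.
        if n == 0 || n > chars.len() {
            continue;
        }
        for window in chars.windows(n) {
            ngrams.push(window.iter().collect::<String>());
        }
    }
    ngrams
}

/// Map an n-gram to a bucket index using 64-bit FNV-1a.
/// The hash function is an assumption for this sketch; `n_buckets`
/// must be non-zero.
fn bucket(ngram: &str, n_buckets: u64) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV-1a 64-bit offset basis
    for byte in ngram.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV-1a 64-bit prime
    }
    hash % n_buckets
}
```

For example, `subword_ngrams("the", 3, 4)` yields `<th`, `the`, `he>`, `<the`, and `the>`. In a subword model, a word's representation is typically built from the embeddings stored at the bucket indices of its n-grams, so that words outside the vocabulary can still receive a vector.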

Trained embeddings can be stored in the finalfusion format, which can be read and used with the finalfusion crate for Rust and the finalfusion Python module.

The minimum required Rust version is currently 1.40.

Where to go from here

Dependencies

~9MB
~193K SLoC