SyntaxDot

Introduction

SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch or fine-tuned from pretrained models such as BERT or XLM-RoBERTa.

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatible with sticker2 models).

Features

  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees (see the sketch after this list)
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding (see the scoring sketch after this list)
  • Multi-task training and classification using scalar weighting (see the scalar-mix sketch after this list)
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license
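
As a concrete illustration of the edit-tree bullet above, the sketch below applies a single learned affix rule to an inflected form. This is a minimal sketch with hypothetical names, not SyntaxDot's actual types or API; it only shows why a predicted edit tree generalizes to unseen words, which is what makes lemmatization tractable as a classification task.

```rust
// Minimal sketch of edit-tree-style lemmatization. Illustrative only:
// these are not SyntaxDot's internal types. An edit tree records how to
// rewrite the affixes of an inflected form while leaving the stem
// untouched, so the same tree applies to many different words.

struct EditRule {
    old_suffix: String,
    new_suffix: String,
}

impl EditRule {
    /// Apply the rule to a form; `None` if the suffix does not match.
    fn apply(&self, form: &str) -> Option<String> {
        let stem = form.strip_suffix(self.old_suffix.as_str())?;
        Some(format!("{}{}", stem, self.new_suffix))
    }
}

fn main() {
    // A rule learned from pairs such as "walking" -> "walk":
    // strip the "-ing" suffix.
    let rule = EditRule {
        old_suffix: "ing".to_string(),
        new_suffix: String::new(),
    };
    assert_eq!(rule.apply("walking").as_deref(), Some("walk"));
    // The same rule generalizes to forms unseen during training.
    assert_eq!(rule.apply("talking").as_deref(), Some("talk"));
}
```

A classifier predicts one such transformation per token; applying the predicted tree to the token's form yields its lemma.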

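For the dependency parsing bullet, here is a hedged sketch of deep biaffine arc scoring using the standard formulation from the literature; the function and parameter names are illustrative, not SyntaxDot's API. Every pair (dependent i, candidate head j) receives a score, and MST decoding then extracts the highest-scoring tree from the resulting matrix.

```rust
// Sketch of deep biaffine arc scoring; names and shapes are
// illustrative. Score of token j as the head of token i:
//   s[i][j] = dep[i]^T U head[j] + u^T head[j]
// A maximum spanning tree over `scores` yields the dependency tree.

fn biaffine_arc_scores(
    dep: &[Vec<f32>],   // dependent representations, n x d
    head: &[Vec<f32>],  // head representations, n x d
    u_mat: &[Vec<f32>], // bilinear weights, d x d
    u_vec: &[f32],      // linear head-bias weights, d
) -> Vec<Vec<f32>> {
    let n = dep.len();
    let mut scores = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            // Bilinear term: dep[i]^T U head[j].
            let mut s = 0.0;
            for (a, d_a) in dep[i].iter().enumerate() {
                for (b, h_b) in head[j].iter().enumerate() {
                    s += d_a * u_mat[a][b] * h_b;
                }
            }
            // Linear term: u^T head[j].
            s += u_vec.iter().zip(&head[j]).map(|(u, h)| u * h).sum::<f32>();
            scores[i][j] = s;
        }
    }
    scores
}
```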
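"Scalar weighting" plausibly refers to an ELMo-style scalar mix, in which each task learns softmax-normalized weights over the encoder's layer outputs. The sketch below assumes that reading; all names are hypothetical and do not reflect SyntaxDot's code.

```rust
// Minimal sketch of an ELMo-style scalar mix, assuming "scalar
// weighting" means a per-task softmax-weighted sum over encoder
// layers; names are illustrative, not SyntaxDot's API.

fn scalar_mix(layers: &[Vec<f32>], raw_weights: &[f32], gamma: f32) -> Vec<f32> {
    assert_eq!(layers.len(), raw_weights.len());

    // Softmax-normalize the learned per-layer weights.
    let max = raw_weights.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = raw_weights.iter().map(|w| (w - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Weighted sum of the layer representations, scaled by gamma.
    let dim = layers[0].len();
    let mut mixed = vec![0.0f32; dim];
    for (layer, e) in layers.iter().zip(&exps) {
        for (m, x) in mixed.iter_mut().zip(layer) {
            *m += gamma * (e / total) * x;
        }
    }
    mixed
}
```
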
Issues

You can report bugs and feature requests in the SyntaxDot issue tracker.

License

For licensing information, see COPYRIGHT.md.
