5 releases (3 breaking)

0.4.0 Jul 24, 2021
0.3.0 Mar 22, 2021
0.2.1 Feb 5, 2021
0.2.0 Nov 19, 2020
0.1.0 Oct 23, 2020

50 downloads per month
Used in 2 crates


3.5K SLoC



SyntaxDot is a sequence labeler and dependency parser using Transformer networks. SyntaxDot models can be trained from scratch or using pretrained models, such as BERT or XLM-RoBERTa.

In principle, SyntaxDot can be used to perform any sequence labeling task, but so far the focus has been on:

  • Part-of-speech tagging
  • Morphological tagging
  • Topological field tagging
  • Lemmatization
  • Named entity recognition

The easiest way to get started with SyntaxDot is to use a pretrained sticker2 model (SyntaxDot is currently compatbile with sticker2 models).


  • Input representations:
    • Word pieces
    • Sentence pieces
  • Flexible sequence encoder/decoder architecture, which supports:
    • Simple sequence labels (e.g. POS, morphology, named entities)
    • Lemmatization, based on edit trees
    • Simple API to extend to other tasks
    • Dependency parsing as sequence labeling
  • Dependency parsing using deep biaffine attention and MST decoding.
  • Multi-task training and classification using scalar weighting.
  • Encoder models:
    • Transformers
    • Finetuning of BERT, XLM-RoBERTa, ALBERT, and SqueezeBERT models
  • Model distillation
  • Deployment:
    • Standalone binary that links against PyTorch's libtorch
    • Very liberal license



SyntaxDot uses techniques from or was inspired by the following papers:


You can report bugs and feature requests in the SyntaxDot issue tracker.


For licensing information, see COPYRIGHT.md.


~173K SLoC