3 unstable releases
Version | Released |
---|---|
0.2.0 | Aug 26, 2023 |
0.1.1 | Aug 26, 2023 |
0.1.0 | Aug 26, 2023 |
#227 in #nlp
13KB
300 lines
conll
conll is a Rust crate for efficiently parsing treebanks in the CoNLL(-U) format.
Usage
You can use the parse program bundled with the crate, or use the library programmatically as follows:
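// "treebank.conllu" is a placeholder path; read the file into a vector of lines.
let text = std::fs::read_to_string("treebank.conllu").unwrap();
let lines: Vec<String> = text.lines().map(String::from).collect();
let treebank = conll::conllu::parser::parse(lines).unwrap();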
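For readers unfamiliar with the input, the following is a minimal std-only sketch of what CoNLL-U lines look like and how they group into sentences. It is not the crate's implementation or API: the Token struct and the parse_token_line and parse_sentences functions are hypothetical names introduced here for illustration. It assumes the standard CoNLL-U layout of ten tab-separated columns per token, comment lines starting with #, and blank lines between sentences.

```rust
/// One token line of a CoNLL-U sentence, holding the ten standard columns.
/// Illustrative type only; it is not the crate's own data model.
#[derive(Debug)]
struct Token {
    id: String,
    form: String,
    lemma: String,
    upos: String,
    xpos: String,
    feats: String,
    head: String,
    deprel: String,
    deps: String,
    misc: String,
}

/// Parse a single tab-separated token line. Returns None for comments,
/// empty lines, and lines that do not have exactly ten columns.
fn parse_token_line(line: &str) -> Option<Token> {
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() != 10 {
        return None;
    }
    Some(Token {
        id: cols[0].to_string(),
        form: cols[1].to_string(),
        lemma: cols[2].to_string(),
        upos: cols[3].to_string(),
        xpos: cols[4].to_string(),
        feats: cols[5].to_string(),
        head: cols[6].to_string(),
        deprel: cols[7].to_string(),
        deps: cols[8].to_string(),
        misc: cols[9].to_string(),
    })
}

/// Group lines into sentences; a blank line closes the current sentence.
fn parse_sentences(lines: &[String]) -> Vec<Vec<Token>> {
    let mut sentences: Vec<Vec<Token>> = Vec::new();
    let mut current: Vec<Token> = Vec::new();
    for line in lines {
        if line.trim().is_empty() {
            if !current.is_empty() {
                sentences.push(std::mem::take(&mut current));
            }
        } else if let Some(token) = parse_token_line(line) {
            current.push(token);
        }
    }
    // Flush the last sentence if the input did not end with a blank line.
    if !current.is_empty() {
        sentences.push(current);
    }
    sentences
}

fn main() {
    let lines: Vec<String> = [
        "# text = Dogs bark.",
        "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_",
        "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No",
        "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
        "",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    for sentence in parse_sentences(&lines) {
        for token in &sentence {
            println!("{:?}", token);
        }
    }
}
```

The sketch skips multiword-token ranges, empty nodes, and field validation, all of which a real parser has to handle.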
Performance
The CoNLL-U parser is quite fast. Here is the output of running the binary under time on a 14MB file.
$ time ./target/release/parse nl_alpino-ud-dev.conllu -s
real 0m0.074s
user 0m0.054s
sys 0m0.019s
For comparison, here it is on a 195MB file.
$ time ./target/release/parse de_hdt-ud-train.conllu -s
real 0m5.006s
user 0m3.866s
sys 0m1.116s
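To put the two runs on a common scale, the wall-clock times can be converted to throughput. The sketch below only does the arithmetic on the figures quoted above: the file sizes are the approximate ones given in the text, and throughput_mb_per_s is a helper name introduced here. By this rough measure the smaller file runs at about 190 MB/s and the larger one at about 39 MB/s.

```rust
/// Throughput in megabytes per second for a file of `size_mb` megabytes
/// processed in `seconds` of wall-clock time.
fn throughput_mb_per_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

fn main() {
    // (file, approximate size in MB, "real" seconds) taken from the runs above.
    let runs = [
        ("nl_alpino-ud-dev.conllu", 14.0, 0.074),
        ("de_hdt-ud-train.conllu", 195.0, 5.006),
    ];
    for &(name, size_mb, seconds) in runs.iter() {
        println!("{}: {:.0} MB/s", name, throughput_mb_per_s(size_mb, seconds));
    }
}
```

These are single runs on unspecified hardware, so the numbers are indicative only.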
Dependencies
~2.7–4.5MB
~78K SLoC