#conllu #nlp #treebank

bin+lib conll

Parser for CoNLL(-U) Treebanks

3 unstable releases

0.2.0 Aug 26, 2023
0.1.1 Aug 26, 2023
0.1.0 Aug 26, 2023

MIT license

13KB
300 lines

conll

conll is a Rust crate for efficiently parsing treebanks in the CoNLL(-U) format.
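For readers unfamiliar with the format: a CoNLL-U file holds one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and blank lines between sentences. The sketch below only illustrates that layout. `TokenLine` and `split_token_line` are hypothetical names made up for this example and are not part of the crate's API or its implementation.

```rust
/// One token line of a CoNLL-U file, split into its ten columns.
/// Hypothetical illustration only; not a type exported by the crate.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
struct TokenLine<'a> {
    id: &'a str,
    form: &'a str,
    lemma: &'a str,
    upos: &'a str,
    xpos: &'a str,
    feats: &'a str,
    head: &'a str,
    deprel: &'a str,
    deps: &'a str,
    misc: &'a str,
}

/// Splits a token line on tabs. Returns None for comment lines, blank
/// lines, and lines that do not have exactly ten fields.
fn split_token_line(line: &str) -> Option<TokenLine<'_>> {
    if line.trim().is_empty() || line.starts_with('#') {
        return None;
    }
    let f: Vec<&str> = line.split('\t').collect();
    if f.len() != 10 {
        return None;
    }
    Some(TokenLine {
        id: f[0],
        form: f[1],
        lemma: f[2],
        upos: f[3],
        xpos: f[4],
        feats: f[5],
        head: f[6],
        deprel: f[7],
        deps: f[8],
        misc: f[9],
    })
}
```

An underscore in a column means the value is unspecified, so a real parser would typically map `_` to an empty or optional value rather than keep it as a string.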

Usage

You can use the parse program bundled with the crate, or call the library directly:

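// Read the file into lines ("treebank.conllu" is a placeholder path).
let text = std::fs::read_to_string("treebank.conllu").unwrap();
let lines: Vec<String> = text.lines().map(String::from).collect();

// Parse the lines into a treebank.
let treebank = conll::conllu::parser::parse(lines).unwrap();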
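The usage snippet above passes a `Vec<String>` to the parser. One possible way to build that vector from any buffered reader is sketched here using only the standard library. `read_lines` is a hypothetical helper, not something the crate provides, and whether the parser expects blank separator lines to be kept (as this helper does) is an assumption.

```rust
use std::io::{self, BufRead};

/// Collects every line of `reader` into an owned Vec<String>.
/// Line terminators are stripped; blank lines are kept as empty strings.
/// The first I/O error, if any, is returned instead of the vector.
fn read_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

For a file on disk, the reader could be `std::io::BufReader::new(std::fs::File::open(path)?)`, which avoids loading the whole file as a single string before splitting it.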

Performance

The CoNLL-U parser is quite fast. Here is the output of running the release binary under time on a 14MB file.

$ time ./target/release/parse nl_alpino-ud-dev.conllu -s

real    0m0.074s
user    0m0.054s
sys     0m0.019s

For comparison, here is the same command on a 195MB file.

$ time ./target/release/parse de_hdt-ud-train.conllu -s

real    0m5.006s
user    0m3.866s
sys     0m1.116s
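To compare the two runs on equal footing, the wall-clock ("real") times can be converted to throughput. This is a small arithmetic sketch; `throughput_mb_per_s` is a hypothetical helper, and the file sizes are the rounded figures quoted above, so the results are approximate.

```rust
/// Throughput in megabytes per second of wall-clock ("real") time.
fn throughput_mb_per_s(size_mb: f64, real_seconds: f64) -> f64 {
    size_mb / real_seconds
}
```

With the numbers above, the 14MB file works out to roughly 189 MB/s (14 / 0.074) and the 195MB file to roughly 39 MB/s (195 / 5.006), so the larger file was processed at a lower rate in these two runs.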

Dependencies

~2.7–4.5MB
~78K SLoC