1 stable release

1.0.0 Nov 18, 2024

#2515 in Parser implementations


Used in wiki_corpus

MIT license

76KB
1.5K SLoC


Extract text from Wikipedia dumps (.bz2) and convert it to the JSON Lines format
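JSON Lines puts one complete JSON object on each line, so a large dump can be streamed and processed record by record. The sketch below shows what such output could look like using only the Rust standard library. It is an illustration under stated assumptions, not this crate's API: the `Article` type, its `title`/`text` fields, and the `json_escape`/`write_jsonl` helpers are hypothetical, and bz2 decompression and wikitext parsing are left out.

```rust
use std::io::{self, Write};

/// Hypothetical record for one extracted article.
/// The field names are assumptions, not the crate's actual output schema.
struct Article<'a> {
    title: &'a str,
    text: &'a str,
}

/// Escape a string so it can sit inside a JSON string literal (RFC 8259).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Write each article as a single JSON object followed by a newline.
/// Escaping the embedded newlines keeps every record on exactly one line.
fn write_jsonl<W: Write>(mut w: W, articles: &[Article<'_>]) -> io::Result<()> {
    for a in articles {
        writeln!(
            w,
            "{{\"title\":\"{}\",\"text\":\"{}\"}}",
            json_escape(a.title),
            json_escape(a.text)
        )?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical sample data standing in for articles parsed from a dump.
    let articles = [
        Article { title: "Rust", text: "Rust is a \"systems\" language.\nIt is fast." },
        Article { title: "JSON", text: "A data format." },
    ];
    write_jsonl(io::stdout().lock(), &articles)
}
```

Because `write_jsonl` is generic over `std::io::Write`, the same function can target stdout, a `File`, or an in-memory `Vec<u8>`. A real pipeline would more likely use a JSON library such as `serde_json` instead of hand-written escaping.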

Dependencies

~11–21MB
~283K SLoC