1 stable release

1.0.0 Nov 18, 2024

#758 in Text processing

Download history (downloads per week, by week starting): 143 @ 2024-11-18, 7 @ 2024-11-25, 21 @ 2024-12-02, 14 @ 2024-12-09

185 downloads per month
Used in 2 crates (via wiki_corpus_parser)

MIT license

65KB
1.5K SLoC


Extract text from Wikipedia dumps (.bz2) and convert it to JSONLines format
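
To make the output format concrete, here is a minimal sketch of what "JSONLines" means: one JSON object per line, with newlines inside values escaped so each record stays on a single line. This is not the crate's actual API or schema. The `Article` struct, its `title` and `text` fields, and the functions `escape_json`, `to_jsonl_line`, and `to_jsonl` are hypothetical names chosen for illustration, and the sketch uses only the standard library where a real tool would more likely rely on a JSON crate. Reading the .bz2 dump itself is not shown.

```rust
use std::fmt::Write;

/// Escape a string so it can sit inside a JSON string literal.
/// Hand-rolled with std only; an assumption made to keep the sketch dependency-free.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any remaining control character must be written as \uXXXX.
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical record shape; the fields the crate really emits may differ.
struct Article {
    title: String,
    text: String,
}

/// Serialize one article as a single JSON object with no trailing newline.
fn to_jsonl_line(article: &Article) -> String {
    format!(
        "{{\"title\":\"{}\",\"text\":\"{}\"}}",
        escape_json(&article.title),
        escape_json(&article.text)
    )
}

/// Serialize many articles as JSON Lines: one object per line, each newline-terminated.
fn to_jsonl(articles: &[Article]) -> String {
    let mut out = String::new();
    for article in articles {
        out.push_str(&to_jsonl_line(article));
        out.push('\n');
    }
    out
}
```

Because every record occupies exactly one line, a consumer can stream the output line by line and parse each line independently, which suits dumps too large to hold in memory.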

Dependencies

~7–16MB
~193K SLoC