12 unstable releases (3 breaking)
Version | Released |
---|---|
0.4.0 | Aug 8, 2023 |
0.3.1 | Jul 25, 2023 |
0.3.0 | May 24, 2023 |
0.2.4 | May 15, 2023 |
0.1.3 | May 13, 2022 |
23 downloads per month
Used in 2 crates
1MB
2K SLoC
oscar-io
Types and IO (Reader/Writer) for OSCAR Corpus processing and generation.
The crate provides basic abstractions around Corpus items, plus generic readers and writers usable with OSCAR Corpus files. Eventually, it should replace the reader implementations in both Ungoliant and oscar-tools.
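To illustrate what "generic readers/writers" can mean in practice, here is a minimal, std-only sketch. It is not the crate's API: `Record`, `RecordReader` and `RecordWriter` are hypothetical names, and the sketch assumes one record per line. The point is only that a reader generic over `BufRead` and a writer generic over `Write` work unchanged over files, in-memory buffers, or a decompressing wrapper.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical stand-in for a corpus item (assumption: one record per line).
#[derive(Debug, Clone, PartialEq)]
pub struct Record(pub String);

/// Reader generic over any `BufRead` source.
pub struct RecordReader<R: BufRead> {
    lines: io::Lines<R>,
}

impl<R: BufRead> RecordReader<R> {
    pub fn new(src: R) -> Self {
        Self { lines: src.lines() }
    }
}

impl<R: BufRead> Iterator for RecordReader<R> {
    type Item = io::Result<Record>;

    /// Yields one record per line, propagating I/O errors to the caller.
    fn next(&mut self) -> Option<Self::Item> {
        self.lines.next().map(|line| line.map(Record))
    }
}

/// Writer generic over any `Write` sink.
pub struct RecordWriter<W: Write> {
    dst: W,
}

impl<W: Write> RecordWriter<W> {
    pub fn new(dst: W) -> Self {
        Self { dst }
    }

    /// Writes the record followed by a newline.
    pub fn write(&mut self, rec: &Record) -> io::Result<()> {
        self.dst.write_all(rec.0.as_bytes())?;
        self.dst.write_all(b"\n")
    }

    /// Returns the underlying sink, e.g. to inspect an in-memory buffer.
    pub fn into_inner(self) -> W {
        self.dst
    }
}
```

Because both types only depend on the standard I/O traits, a gzip-compressed source or sink can be supported by wrapping the underlying stream before handing it to `new`, without changing the reader or writer itself.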
Features
oscar-io aims to provide readers/writers for numerous types of OSCAR Corpora.
OSCAR v2
- Reader
  - Uncompressed: oscar_doc::Reader::new
  - GZipped: oscar_doc::Reader::from_gzip
  - Parquet
- Writer
  - Uncompressed: oscar_doc::Writer::new
  - GZipped: oscar_doc::Writer::new (using a GzEncoder; from_gzip not yet implemented)
  - Parquet
- SplitReader (should be unified with Reader via split_size: Option<u64>)
  - Uncompressed
  - GZipped
- SplitWriter (same)
  - Uncompressed
  - GZipped
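The `split_size: Option<u64>` note suggests that a single type could cover both the split and non-split cases. The sketch below shows one way that could look for the writer side, using only the standard library. `SizeSplitWriter`, `open_part` and `write_record` are hypothetical names, not the crate's API, and the sketch assumes newline-delimited records that must never be cut across parts. For simplicity it keeps every opened sink in memory.

```rust
use std::io::{self, Write};

/// Hypothetical writer: `split_size = None` writes everything to one sink,
/// `Some(n)` starts a new sink before a part would exceed `n` bytes.
pub struct SizeSplitWriter<W: Write, F: FnMut(usize) -> io::Result<W>> {
    open_part: F,            // creates the sink for part number `n`
    split_size: Option<u64>, // `None` disables splitting
    parts: Vec<W>,           // all sinks opened so far; the last one is active
    written: u64,            // bytes written to the active part
}

impl<W: Write, F: FnMut(usize) -> io::Result<W>> SizeSplitWriter<W, F> {
    pub fn new(open_part: F, split_size: Option<u64>) -> Self {
        Self { open_part, split_size, parts: Vec::new(), written: 0 }
    }

    /// Writes one record plus a newline, rotating to a new part if needed.
    pub fn write_record(&mut self, record: &str) -> io::Result<()> {
        let len = record.len() as u64 + 1; // +1 for the newline
        // Only rotate a non-empty part, so an oversized record still gets written.
        let over_limit = match self.split_size {
            Some(limit) => self.written > 0 && self.written + len > limit,
            None => false,
        };
        if self.parts.is_empty() || over_limit {
            let next = (self.open_part)(self.parts.len())?;
            self.parts.push(next);
            self.written = 0;
        }
        let sink = self.parts.last_mut().expect("at least one part is open");
        sink.write_all(record.as_bytes())?;
        sink.write_all(b"\n")?;
        self.written += len;
        Ok(())
    }

    /// Consumes the writer and returns every part in creation order.
    pub fn into_parts(self) -> Vec<W> {
        self.parts
    }
}
```

With file-backed sinks, `open_part` would typically create a numbered file per part; a production version would likely flush and close finished parts instead of retaining them.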
OSCAR v1.1
- Reader
- Writer
- SplitReader (should be unified with Reader via split_size: Option<u64>)
- SplitWriter (same)
OSCAR v1
- Reader
- Writer
- SplitReader
- SplitWriter
Dependencies
~12MB
~236K SLoC