oscar-io

Readers/Writers for OSCAR Corpora

12 unstable releases (3 breaking)

0.4.0 Aug 8, 2023
0.3.1 Jul 25, 2023
0.3.0 May 24, 2023
0.2.4 May 15, 2023
0.1.3 May 13, 2022

23 downloads per month
Used in 2 crates

Apache-2.0

1MB
2K SLoC

Types and IO (Reader/Writer) for OSCAR Corpus processing and generation.

The crate provides basic abstractions around Corpus items, along with generic readers/writers usable with OSCAR Corpus files. Eventually, it should replace the reader implementations in both Ungoliant and oscar-tools.
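To make the idea of "generic readers over Corpus items" concrete, here is a minimal sketch using only the standard library. It is not the oscar-io API: `Document`, `ItemReader`, and the one-record-per-line layout are assumptions made for illustration.

```rust
use std::io::{self, BufRead};
use std::marker::PhantomData;
use std::str::FromStr;

/// Hypothetical corpus item (not an oscar-io type): one record per line.
#[derive(Debug, PartialEq)]
pub struct Document {
    pub content: String,
}

impl FromStr for Document {
    type Err = io::Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Reject blank lines so malformed input surfaces as an error.
        if s.trim().is_empty() {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "empty record"));
        }
        Ok(Document { content: s.to_string() })
    }
}

/// Hypothetical generic reader: wraps any `BufRead` source and yields
/// one parsed `T` per line.
pub struct ItemReader<R: BufRead, T> {
    lines: io::Lines<R>,
    _item: PhantomData<T>,
}

impl<R: BufRead, T> ItemReader<R, T> {
    pub fn new(source: R) -> Self {
        Self { lines: source.lines(), _item: PhantomData }
    }
}

impl<R, T> Iterator for ItemReader<R, T>
where
    R: BufRead,
    T: FromStr<Err = io::Error>,
{
    type Item = io::Result<T>;

    fn next(&mut self) -> Option<Self::Item> {
        // Propagate I/O errors; otherwise parse the line into `T`.
        self.lines.next().map(|line| line.and_then(|l| l.parse::<T>()))
    }
}
```

Because the reader is generic over both the source and the item type, the same struct could in principle serve in-memory buffers, files wrapped in `BufReader`, or decompression streams, with a different `FromStr` implementation per corpus version.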

Features

oscar-io aims to provide readers/writers for numerous types of OSCAR Corpora.

OSCAR v2

OSCAR v1.1

  • Reader
  • Writer
  • SplitReader (should be unified with Reader via split_size: Option<u64>)
  • SplitWriter (same, with Writer)
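The unification hinted at by `split_size: Option<u64>` could look roughly like the sketch below: a single writer that never rotates when `split_size` is `None` and opens a new part once the limit would be exceeded otherwise. `UnifiedWriter`, `open_part`, `write_record`, and the newline-terminated record layout are hypothetical names and assumptions, not the crate's actual API.

```rust
use std::io::{self, Write};

/// Hypothetical unified writer (not an oscar-io type).
/// `split_size: None` behaves like a plain writer; `Some(n)` starts a
/// new part before a record would push the current part past `n` bytes.
pub struct UnifiedWriter<W: Write, F: FnMut(usize) -> io::Result<W>> {
    open_part: F,            // opens the sink for part number `n`
    split_size: Option<u64>, // byte limit per part, if any
    current: Option<W>,      // sink currently being written to
    part: usize,             // number of parts opened so far
    written: u64,            // bytes written to the current part
}

impl<W: Write, F: FnMut(usize) -> io::Result<W>> UnifiedWriter<W, F> {
    pub fn new(open_part: F, split_size: Option<u64>) -> Self {
        Self { open_part, split_size, current: None, part: 0, written: 0 }
    }

    /// Writes one record followed by a newline, rotating first if needed.
    pub fn write_record(&mut self, record: &str) -> io::Result<()> {
        let len = record.len() as u64 + 1;
        let over = match self.split_size {
            // Never rotate on an empty part, so oversized records still land somewhere.
            Some(limit) => self.written > 0 && self.written + len > limit,
            None => false,
        };
        if self.current.is_none() || over {
            if let Some(mut old) = self.current.take() {
                old.flush()?;
            }
            self.current = Some((self.open_part)(self.part)?);
            self.part += 1;
            self.written = 0;
        }
        let sink = self.current.as_mut().expect("a part is open");
        sink.write_all(record.as_bytes())?;
        sink.write_all(b"\n")?;
        self.written += len;
        Ok(())
    }

    /// Flushes the last open part, if any.
    pub fn finish(mut self) -> io::Result<()> {
        match self.current.take() {
            Some(mut sink) => sink.flush(),
            None => Ok(()),
        }
    }

    pub fn parts_opened(&self) -> usize {
        self.part
    }
}
```

In practice `open_part` might be a closure such as `|n| std::fs::File::create(format!("part_{n}.txt"))` (the file naming is an assumption). Keeping the limit optional means a single type covers both the split and non-split cases, which is one plausible reading of the note above.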

OSCAR v1

  • Reader
  • Writer
  • SplitReader
  • SplitWriter

Dependencies

~12MB
~236K SLoC