22 releases (stable)

1.4.1 Dec 19, 2023
1.4.0 Nov 17, 2023
1.3.3 Oct 27, 2023
1.2.5 Apr 25, 2023
0.4.1 Nov 10, 2021

#207 in Internationalization (i18n)

1,912 downloads per month

Custom license

12MB
92K SLoC

icu_datagen

icu_datagen is a library to generate data files that can be used in ICU4X data providers.

Data files can be generated either programmatically (e.g. in build.rs) or through a command-line utility.

Also see our datagen tutorial.

Examples

Rust API

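use icu_datagen::blob_exporter::*;
use icu_datagen::prelude::*;
use std::fs::File;

fn main() {
    DatagenDriver::new()
        // Generate data only for the "and" list formatter key.
        .with_keys([icu::list::provider::AndListV1Marker::KEY])
        // Include every locale available in the source data.
        .with_all_locales()
        .export(
            // Source data: the latest versions tested with this release.
            &DatagenProvider::new_latest_tested(),
            // Write the result as a blob to data.postcard.
            BlobExporter::new_v2_with_sink(Box::new(
                File::create("data.postcard").unwrap(),
            )),
        )
        .unwrap();
}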

Command line

The command line interface can be installed through Cargo.

$ cargo install icu_datagen

Once the tool is installed, you can invoke it like this:

$ icu4x-datagen --keys all --locales de en-AU --format blob --out data.postcard
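
If you would rather drive the installed CLI from Rust (for instance from a build script), the same invocation can be assembled with std::process::Command. This is a minimal sketch, assuming icu4x-datagen is on the PATH; the helper names datagen_args and datagen_command are hypothetical and not part of icu_datagen.

```rust
use std::process::Command;

/// Builds the argument list for the blob invocation shown above.
/// Hypothetical helper: only the flags themselves come from the CLI.
fn datagen_args(locales: &[&str], out: &str) -> Vec<String> {
    let mut args: Vec<String> = vec!["--keys".into(), "all".into(), "--locales".into()];
    // Each locale is passed as its own argument after --locales.
    args.extend(locales.iter().map(|l| l.to_string()));
    args.extend(["--format", "blob", "--out", out].iter().map(|s| s.to_string()));
    args
}

/// Prepares the command without running it.
/// Assumption: the `icu4x-datagen` binary is installed and on the PATH.
fn datagen_command(locales: &[&str], out: &str) -> Command {
    let mut cmd = Command::new("icu4x-datagen");
    cmd.args(datagen_args(locales, out));
    cmd
}
```

Calling datagen_command(&["de", "en-AU"], "data.postcard").status() would then run the tool and return its exit status, which the caller should check before using the output file.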

For complex invocations, the CLI also supports configuration files:

$ icu4x-datagen config.json
config.json
{
  "keys": {
    "explicit": [
      "core/helloworld@1",
      "fallback/likelysubtags@1",
      "fallback/parents@1",
      "fallback/supplement/co@1"
    ]
  },
  "fallback": "runtimeManual",
  "locales": "all",
  "segmenterModels": ["burmesedict"],
  "additionalCollations": ["big5han"],
  "cldr": "latest",
  "icuExport": "73.1",
  "segmenterLstm": "none",
  "export": {
    "blob": {
      "path": "blob.postcard"
    }
  },
  "overwrite": true
}

More details can be found by running icu4x-datagen --help.

Cargo features

This crate has a lot of dependencies, some of which are not required for all operating modes. These default Cargo features can be disabled to reduce dependencies:

  • baked_exporter
    • enables the baked_exporter module
    • enables the --format mod CLI argument
  • blob_exporter
  • fs_exporter
  • networking
    • enables methods on DatagenProvider that fetch source data from the network
    • enables the --cldr-tag, --icu-export-tag, and --segmenter-lstm-tag CLI arguments that download data
  • rayon
    • enables parallelism during export
  • use_wasm / use_icu4c
  • bin
    • required by the CLI and enabled by default to make cargo install work
  • legacy_api
    • enables the deprecated pre-1.3 API
    • enabled by default for semver stability
    • will be removed in 2.0.
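
As an illustration of trimming the defaults, a Cargo.toml entry along the following lines opts out of the default features and re-enables only a few of them. This is a sketch under stated assumptions: it assumes a build script that only needs blob export with network-fetched source data, and the version requirement is taken from the 1.4 releases. Whether use_wasm can also be dropped depends on the keys being generated, so it is kept here as in the defaults.

```toml
[build-dependencies]
# Assumption: only blob export from network-fetched source data is needed.
icu_datagen = { version = "1.4", default-features = false, features = [
    "blob_exporter", # the blob exporter used in the Rust API example
    "networking",    # DatagenProvider methods that fetch source data
    "rayon",         # optional: parallelism during export
    "use_wasm",      # kept from the defaults; see note above
] }
```

Features left out of the list (baked_exporter, fs_exporter, bin, legacy_api) are then not compiled, so code relying on the deprecated pre-1.3 API or on the CLI binary will not build with this configuration.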

Experimental unstable ICU4X components are behind Cargo features which are not enabled by default. Note that these Cargo features affect the behaviour of all_keys:

  • icu_compactdecimal
  • icu_displaynames
  • icu_relativetime
  • icu_transliterate
  • ...

The meta-feature experimental_components is available to activate all experimental components.

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.

Dependencies

~9–22MB
~268K SLoC