lace_codebook

Contains the Lace codebook specification as well as utilities for generating defaults

11 releases (4 breaking)

0.6.0 Feb 7, 2024
0.3.0 Nov 21, 2023
0.1.4 Jul 26, 2023

#781 in Encoding

71 downloads per month
Used in 4 crates (3 directly)

BUSL-1.1

305KB
8K SLoC

Lace codebook

Contains the Lace codebook specification as well as utilities for generating defaults.

If you design a new type, implement FromStr for it in lace_utils, then decide its precedence for the codebook in this crate.
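As a rough illustration of the note above, the sketch below implements FromStr for a made-up type and then tries parsers in a fixed order. It uses only the standard library. The names `Percent`, `Guess`, and `guess` are hypothetical and do not come from lace_utils or lace_codebook, and reading "precedence" as "the order in which parsers are tried" is an assumption.

```rust
use std::str::FromStr;

/// Hypothetical new type: a percentage written as "42%".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent(pub f64);

impl FromStr for Percent {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Require a trailing '%' so plain numbers are not mistaken for percentages.
        let digits = s
            .trim()
            .strip_suffix('%')
            .ok_or_else(|| format!("missing '%' in {s:?}"))?;
        digits
            .trim()
            .parse::<f64>()
            .map(Percent)
            .map_err(|e| format!("invalid number in {s:?}: {e}"))
    }
}

/// Hypothetical result of guessing the type of a single cell.
#[derive(Debug, PartialEq)]
pub enum Guess {
    Count(u32),
    Percent(Percent),
    Real(f64),
    Text(String),
}

/// Try parsers from most to least specific.
/// The order of the branches is the assumed "precedence".
pub fn guess(cell: &str) -> Guess {
    let cell = cell.trim();
    if let Ok(n) = cell.parse::<u32>() {
        Guess::Count(n)
    } else if let Ok(p) = cell.parse::<Percent>() {
        Guess::Percent(p)
    } else if let Ok(x) = cell.parse::<f64>() {
        Guess::Real(x)
    } else {
        Guess::Text(cell.to_string())
    }
}
```

With this ordering, a cell such as "7" is read as a count before it is ever tried as a real number, which is why the position of a new type in the chain matters.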


lib.rs:

The Codebook is a YAML file used to associate metadata with the dataset. In it, the user can set the priors on the structure of each state, identify the model for each column, and set hyperpriors.

Datasets often have too many columns to write a codebook by hand, so this crate provides functions that guess a default codebook from a dataset. The user can then edit the generated file.
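To give a feel for what guessing a default might involve, here is a minimal standard-library sketch that picks a model for one column from its raw cells. It is not the crate's implementation: `ColTypeGuess`, `guess_coltype`, and the `max_categories` cutoff are hypothetical names and heuristics chosen for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical column models, loosely mirroring the codebook's `coltype` field.
#[derive(Debug, PartialEq)]
pub enum ColTypeGuess {
    /// Few distinct values; `k` is the number of categories seen.
    Categorical { k: usize },
    /// Every non-missing cell parses as a number.
    Continuous,
    /// No usable values, or too many distinct non-numeric values.
    Unknown,
}

/// Guess a model for one column of raw cells; empty cells count as missing.
/// `max_categories` is an assumed cutoff, not a value taken from lace.
pub fn guess_coltype(cells: &[&str], max_categories: usize) -> ColTypeGuess {
    let present: Vec<&str> = cells
        .iter()
        .map(|c| c.trim())
        .filter(|c| !c.is_empty())
        .collect();
    if present.is_empty() {
        return ColTypeGuess::Unknown;
    }

    let distinct: BTreeSet<&str> = present.iter().copied().collect();
    let all_numeric = present.iter().all(|c| c.parse::<f64>().is_ok());
    let all_integer = present.iter().all(|c| c.parse::<i64>().is_ok());

    if all_numeric && !all_integer {
        // Fractional values are treated as continuous.
        ColTypeGuess::Continuous
    } else if distinct.len() <= max_categories {
        // Integers or strings with few distinct values look categorical.
        ColTypeGuess::Categorical { k: distinct.len() }
    } else if all_numeric {
        // Many distinct integers: fall back to continuous.
        ColTypeGuess::Continuous
    } else {
        ColTypeGuess::Unknown
    }
}
```

Under these assumptions, `guess_coltype(&["red", "green", "blue"], 10)` yields a three-category guess, and a user would still review the result, since a heuristic like this cannot tell an integer-coded category from a genuine count.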

Example

An example codebook for a two-column dataset.

use indoc::indoc;
use lace_codebook::Codebook;

let codebook_str = indoc!("
    ---
    table_name: two column dataset
    state_alpha_prior:
      !Gamma
        shape: 1.0
        rate: 1.0
    view_alpha_prior:
      !Gamma
        shape: 1.0
        rate: 1.0
    col_metadata:
      - name: col_1
        notes: first column with all fields filled in
        coltype:
          !Categorical
            k: 3
            hyper:
              pr_alpha:
                shape: 1.0
                scale: 1.0
            prior:
                k: 3
                alpha: 0.5
            value_map: !string
              0: red
              1: green
              2: blue
      - name: col_2
        notes: A binary column with optional fields left out
        coltype:
          !Categorical
            k: 2
            value_map: !u8 2
    comments: An example codebook
    row_names:
      - A
      - B
      - C
      - D
      - E");

let codebook: Codebook = serde_yaml::from_str(&codebook_str).unwrap();

assert_eq!(codebook.col_metadata.len(), 2);

Dependencies

~30–65MB
~1M SLoC