12 releases (5 breaking)

0.6.4 Dec 2, 2021
0.6.3 Dec 2, 2021
0.5.0 Nov 30, 2021
0.4.1 Nov 30, 2021
0.1.0 Nov 27, 2021

MIT/Apache

38KB
817 lines

GenEx

GenEx is a text template expansion library.


lib.rs:

Rust library implementing a custom text generation/templating system. Genex is similar to Tracery, but with some extra functionality around using external data.

Usage

First create a grammar, then generate an expansion or multiple expansions from it.

use std::collections::HashSet;
use std::str::FromStr;
use maplit::hashmap;
use genex::Grammar;

let grammar = Grammar::from_str(
    r#"
      RULES:
      top = The <adj> <noun> #action|ed# #object|a#?:[ with gusto] in <place>.
      adj = [glistening|#adj#]
      noun = key
      place = [the #room#|#city#]

      WEIGHTS:
      room = 2
      city = 1
    "#,
)
.unwrap();

let data = hashmap! {
    "action".to_string() => "pick".to_string(),
    "object".to_string() => "lizard".to_string(),
    "room".to_string() => "kitchen".to_string(),
    "city".to_string() => "New York".to_string(),
};

// Now we find the top-scoring expansion. The score is the sum of the
// weights of all variables used in an expansion. We know that the top
// scoring expansion is going to end with "the kitchen" because we gave
// `room` a higher weight than `city`.

let best_expansion = grammar.generate("top", &data).unwrap().unwrap();

assert_eq!(
    best_expansion,
    "The glistening key picked a lizard in the kitchen.".to_string()
);

// Now get all possible expansions:

let all_expansions = grammar.generate_all("top", &data).unwrap();

assert_eq!(
    HashSet::<_>::from_iter(all_expansions),
    HashSet::<_>::from_iter(vec![
        "The glistening key picked a lizard in New York.".to_string(),
        "The glistening key picked a lizard with gusto in New York.".to_string(),
        "The glistening key picked a lizard with gusto in the kitchen.".to_string(),
        "The glistening key picked a lizard in the kitchen.".to_string(),
    ])
);

Features

Genex tries to make it easy to generate text from varying amounts of external data. For example, you can write a single grammar that works when all you know is the name of an object, but that uses additional information when you also know the object's size, location, color, or other qualities.

By default, Genex tries to find the expansion that uses the most external data. By changing the weights assigned to variables, you can control which variables are preferred, and can even prioritize a single important variable over several less important ones.
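The weighting idea can be illustrated without the library. The following is a minimal sketch, not GenEx's implementation: `score`, `best_candidate`, and `DEFAULT_WEIGHT` are hypothetical names, and the default weight of 1 for unlisted variables is an assumption made for this example. Each candidate expansion is represented only by the list of variables it uses.

```rust
use std::collections::HashMap;

/// Weight assumed for a variable that has no explicit entry.
/// This value is an assumption of the sketch, not a documented GenEx default.
const DEFAULT_WEIGHT: i64 = 1;

/// Score of one candidate expansion: the sum of the weights of the
/// variables it uses.
fn score(used_vars: &[&str], weights: &HashMap<&str, i64>) -> i64 {
    used_vars
        .iter()
        .map(|v| weights.get(v).copied().unwrap_or(DEFAULT_WEIGHT))
        .sum()
}

/// Index of the highest-scoring candidate, or `None` if there are none.
fn best_candidate(candidates: &[Vec<&str>], weights: &HashMap<&str, i64>) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .max_by_key(|(_, vars)| score(vars, weights))
        .map(|(i, _)| i)
}
```

With no explicit weights, a candidate using `size` and `color` (score 2) beats one using only `name` (score 1). Giving `name` a weight of 5 reverses the outcome, which is the "single important variable" case described above.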

Grammar syntax

Rules

"RULES:" indicates the rules section of the grammar. Each rule has a left-hand side (LHS) and a right-hand side (RHS), separated by "=". The LHS is the name of the rule. The RHS is a sequence of terms.

Terms:

  • Sequence: [term1 term2 ...]
  • Choice: [term1|term2|...] (You can put a newline after a | character.)
  • Optional: ?:[term1 term2 ...]
  • Variable: #variable# or #variable|modifier#
  • Non-terminal: <rule-name>
  • Plain text: I am some plain text. I hope I get expanded.
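To show how sequences, choices, and optionals combine into multiple expansions, here is a small sketch. The `Term` enum and `expand_all` function are hypothetical and are not GenEx's actual types; variables and non-terminals are left out to keep the example short.

```rust
/// Hypothetical model of some of the term kinds listed above.
/// These are illustration-only types, not GenEx's real data structures.
enum Term {
    Text(String),
    Sequence(Vec<Term>),
    Choice(Vec<Term>),
    Optional(Box<Term>),
}

/// Enumerate every string a term can expand to.
fn expand_all(term: &Term) -> Vec<String> {
    match term {
        // Plain text expands to itself.
        Term::Text(s) => vec![s.clone()],
        // A choice contributes the expansions of each alternative.
        Term::Choice(alts) => alts.iter().flat_map(expand_all).collect(),
        // An optional term is either absent (empty string) or present.
        Term::Optional(inner) => {
            let mut out = vec![String::new()];
            out.extend(expand_all(inner));
            out
        }
        // A sequence is the cartesian product of its parts, concatenated in order.
        Term::Sequence(parts) => parts.iter().fold(vec![String::new()], |acc, part| {
            let tails = expand_all(part);
            acc.iter()
                .flat_map(|head| tails.iter().map(move |tail| format!("{}{}", head, tail)))
                .collect()
        }),
    }
}
```

A sequence containing one optional term and one two-way choice yields 2 × 2 = 4 expansions, matching the four results of `generate_all` in the usage example.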

Weights

"WEIGHTS:" indicates the weights section of the grammar. Weights are of the form variable-name = number. The name refers to a variable (such as room or city in the usage example), not to a rule.

Modifiers

Modifiers are used to transform variable values during expansion.

Modifiers:

  • capitalize: Capitalizes the first letter of the value.
  • capitalizeAll: Capitalizes the first letter of each word in the value.
  • inQuotes: Surrounds the value with double quotes.
  • comma: Adds a comma after the value, if it doesn't already end with punctuation.
  • s: Pluralizes the value.
  • a: Prefixes the value with an "a"/"an" article as appropriate.
  • ed: Changes the first word of the value to be past tense.
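As a rough illustration of what some of these modifiers do, the sketch below re-implements a few of them as plain functions. The function names and the logic are assumptions for this example and are not taken from GenEx's source. In particular, the article choice uses a naive first-letter vowel check. The `s` and `ed` modifiers are omitted because they need real English morphology rules.

```rust
/// capitalize: uppercase the first letter of the value.
fn capitalize(value: &str) -> String {
    let mut chars = value.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// capitalizeAll: uppercase the first letter of each space-separated word.
fn capitalize_all(value: &str) -> String {
    value.split(' ').map(capitalize).collect::<Vec<_>>().join(" ")
}

/// inQuotes: surround the value with double quotes.
fn in_quotes(value: &str) -> String {
    format!("\"{}\"", value)
}

/// comma: append a comma unless the value already ends with punctuation.
fn comma(value: &str) -> String {
    match value.chars().last() {
        Some(c) if c.is_ascii_punctuation() => value.to_string(),
        Some(_) => format!("{},", value),
        None => String::new(),
    }
}

/// a: prefix with "a" or "an" (naive check on the first letter only).
fn with_article(value: &str) -> String {
    let starts_with_vowel = value
        .chars()
        .next()
        .map_or(false, |c| "aeiouAEIOU".contains(c));
    if starts_with_vowel {
        format!("an {}", value)
    } else {
        format!("a {}", value)
    }
}
```

Under these assumptions, `with_article("lizard")` gives "a lizard", which corresponds to the `#object|a#` term in the usage example.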
