#context-free-grammar #learning #bayesian #genetic #inference #language

programinduction

A library for program induction and learning representations

37 releases

0.9.0 Dec 18, 2023
0.8.0 May 31, 2019
0.7.4 Aug 22, 2018
0.6.12 Jun 18, 2018
0.3.3 Mar 29, 2018

#375 in Algorithms

Download history 50/week @ 2024-09-23 136/week @ 2024-12-09

136 downloads per month

MIT license

355KB
7K SLoC

program-induction

crates.io docs.rs CI

A library for program induction and learning representations.

Implements Bayesian program learning and genetic programming. See the docs for more information.

Installation

Install rust and ensure you're up to date (rustup update). In a new or existing project, add the following to your Cargo.toml:

[dependencies]
programinduction = "0.9"
# many examples also depend on polytype for the tp! and ptp! macros:
polytype = "7.0"

The documentation requires a custom HTML header to include KaTeX for math support. This isn't supported by cargo doc, so to build the documentation you may use:

cargo rustdoc -- --html-in-header rustdoc-include-katex-header.html

Usage

Specify a probabilistic context-free grammar (PCFG; see pcfg::Grammar) and induce a sentence that matches an example:

use polytype::tp;
use programinduction::pcfg::{task_by_evaluation, Grammar, Rule};
use programinduction::{ECParams, EC};

fn evaluate(name: &str, inps: &[i32]) -> Result<i32, ()> {
    match name {
        "0" => Ok(0),
        "1" => Ok(1),
        "plus" => Ok(inps[0] + inps[1]),
        _ => unreachable!(),
    }
}

fn main() {
    let g = Grammar::new(
        tp!(EXPR),
        vec![
            Rule::new("0", tp!(EXPR), 1.0),
            Rule::new("1", tp!(EXPR), 1.0),
            Rule::new("plus", tp!(@arrow[tp!(EXPR), tp!(EXPR), tp!(EXPR)]), 1.0),
        ],
    );
    let ec_params = ECParams {
        frontier_limit: 1,
        search_limit_timeout: None,
        search_limit_description_length: Some(8.0),
    };
    // task: the number 4
    let task = task_by_evaluation(&evaluate, &4, tp!(EXPR));

    let frontiers = g.explore(&ec_params, &[task]);
    let sol = &frontiers[0].best_solution().unwrap().0;
    println!("{}", g.display(sol));
}

The Exploration-Compression (EC) algorithm iteratively learns a better representation by finding common structure in induced programs. We can run the EC algorithm with a polymorphically-typed lambda calculus representation lambda::Language in a Boolean circuit domain:

use polytype::{ptp, tp};
use programinduction::{domains, lambda, ECParams, EC};

fn main() {
    // circuit DSL
    let dsl = lambda::Language::uniform(vec![
        // NAND takes two bools and returns a bool
        ("nand", ptp!(@arrow[tp!(bool), tp!(bool), tp!(bool)])),
    ]);
    // parameters
    let lambda_params = lambda::CompressionParams::default();
    let ec_params = ECParams {
        frontier_limit: 1,
        search_limit_timeout: Some(std::time::Duration::new(1, 0)),
        search_limit_description_length: None,
    };
    // randomly sample 250 circuit tasks
    let rng = &mut rand::thread_rng();
    let tasks = domains::circuits::make_tasks(rng, 250);

    // one iteration of EC:
    let (new_dsl, _solutions) = dsl.ec(&ec_params, &lambda_params, &tasks);
    // print the new concepts it invented, based on common structure:
    for (expr, _, _) in &new_dsl.invented {
        println!("invented {}", new_dsl.display(expr))
        // one of the inventions was "(λ (nand $0 $0))",
        // which is the common and useful NOT operation!
    }
}

You may have noted the above use of domains::circuits. Some domains are already implemented for you. Currently, this only consists of circuits and strings.

TODO

(you could be the one who does one of these!)

  • First-class function evaluation within Rust (and remove lisp interpreters).
  • Add task generation function in domains::strings
  • Fallible evaluation (e.g. see how domains::strings handles slice).
  • Lazy evaluation.
  • impl GP for pcfg::Grammar is not yet complete.
  • Eta-long sidestepping (so f gets enumerated instead of (λ (f $0)))
  • Consolidate lazy/non-lazy evaluation (for ergonomics).
  • Permit non-&'static str-named Type/TypeScheme.
  • Ability to include recursive primitives in lambda representation.
  • Faster lambda calculus evaluation (less cloning; bubble up whether beta reduction happened rather than ultimate equality comparison).
  • PCFG compression is currently only estimating parameters, not actually learning pieces of programs. An adaptor grammar approach seems like a good direction to go, perhaps minus the Bayesian non-parametrics.
  • Add more learning traits (like EC or GP)
  • Add more representations
  • Add more domains

Dependencies

~5–11MB
~126K SLoC