#regex #logic #polars #temporal-logic #formal-methods #offline-monitor

temex

Regex-like temporal expressions for evaluating systems that change over time

1 unstable release

0.10.0 Jun 27, 2023

#1237 in Algorithms

29 downloads per month

MIT/Apache

60KB
859 lines

temex

temex - Regex-like temporal expressions for evaluating systems that change over time

This library implements temexes, which are structures that combine formulas of propositional logic with regular expression operators to provide a temporal logic designed for simplicity and ease of use.

Table of contents

  1. Introduction
  2. Overview
  3. Temex syntax
  4. Supported trace formats
  5. Usage
  6. Examples
  7. License

Introduction

Temporal logic has useful applications in a wide range of contexts, in and outside of computer science. A detailed examination is beyond the scope of this document, but a good introduction is available here. Although temporal logic has many use cases, most tools that implement some form of temporal logic are designed for formal methods specialists and logicians, not for a general audience. As a result, they can seem arcane and unapproachable to a layman.

The purpose of temex is to provide a simple tool that allows generalist software engineers and data analysts to harness the power of temporal logic. It is designed to be easy to understand and use, requiring only a basic knowledge of propositional logic and regular expressions.

Overview

A truth assignment is a mapping of boolean variables to truth values (i.e., true or false). When all the boolean variables that occur in a formula of propositional logic are assigned a value, then that formula evaluates to either true or false. We can think of a formula as defining a set of truth assignments: namely, the truth assignments that make that formula evaluate to true.

A regular expression is either a character, or it is a composite expression in which subexpressions are joined by regex operators. A regular expression defines a set of sequences of characters: those sequences that match the regular expression.

We can think of truth assignments as characters, and we can think of a propositional formula as a regular expression in which the truth assignments it defines are joined by alternation (i.e., the regex operator |). So, for instance, the formula not (p and q) can be thought of as {p = true, q = false} | {p = false, q = true} | {p = false, q = false}.

If we consider truth assignments as characters and propositional formulas as alternations over these characters, then if we join propositional formulas with regex operators and interpret the result according to the ordinary semantics of regular expressions we have something that defines a sequence of truth assignments. We call this a temex, for temporal expression.

We can use a temex to search sequences of truth assignments (henceforth called traces) in much the same way we can use regular expressions to search strings. temex implements this functionality.

Temex syntax

Temex atoms are propositional formulas enclosed in square brackets ([ and ]). They may be joined with regex operators in the same ways characters can be joined with regex operators in a regular expression.

Propositional formula syntax

A propositional formula may be a boolean variable (which begins with an alphabetic character and may contain alphabetic or numeric characters as well as underscores), or a composite formula, in which subformulas are joined with propositional connectives. temex uses the English words not, and, and or for negation, conjunction, and disjunction; not has the highest operator precedence, while and and or have equal precedence and are right associative. Examples:

  • f123
  • is_sunny and not is_raining
  • cpu_GT_80 and not (mem_LT_40 or io_LT_20)

Regex operator syntax

temex's regex operators inherit the syntax of the regex crate, which temex uses to implement its regex operators.

In the following forms, phi, phi_1, and phi_2 are propositional formulas, and n and m are natural numbers:

[phi_1][phi_2]      concatenation (phi_1 followed by phi_2)
[phi_1]|[phi_2]     alternation (phi_1 or phi_2, prefer phi_1)
[phi]*              zero or more of phi (greedy)
[phi]+              one or more of phi (greedy)
[phi]?              zero or one of phi (greedy)
[phi]*?             zero or more of phi (ungreedy/lazy)
[phi]+?             one or more of phi (ungreedy/lazy)
[phi]??             zero or one of phi (ungreedy/lazy)
[phi]{n,m}          at least n phi and at most m phi (greedy)
[phi]{n,}           at least n phi (greedy)
[phi]{n}            exactly n phi
[phi]{n,m}?         at least n phi and at most m phi (ungreedy/lazy)
[phi]{n,}?          at least n phi (ungreedy/lazy)
[phi]{n}?           exactly n phi

Supported trace formats

The following trace formats are supported:

  • polars dataframes in which all of the columns are boolean. Column names are interpreted as the names of boolean variables and the rows of the dataframe are interpreted as truth assignments.

  • CSVs encoded as UTF8 strings or files. The first line should contain the names of boolean variables and the subsequent lines should contain values of either 1 or 0, for true or false, respectively. The lines after the header are interpreted as truth assignments.

When matching against a trace, the variable names in the temex pattern and the trace must be the same; if there are extra variables in the trace then searching will result in an error.

Usage

Prior to performing any search operation, the temex pattern and trace must be compiled. Compiling the temex pattern is done with Temex::new, while compiling the trace is done with TemexTrace::try_from. Searches may be performed using methods on the temex; a complete listing is available here.

Examples

Example 1: Testing for a match in a polars DataFrame using is_match

This example shows how to perform a temex search in a polars DataFrame. The DataFrame must be converted to a TemexTrace before searching. The example also shows how to convert a TemexTrace back to a DataFrame. Note the use of ^ and $, which are anchors that match to the beginning and end of the trace, respectively:

use polars::df;
use polars::prelude::*;
use temex::{Temex, TemexTrace};
let df = df! [
    "p1" => [true, true, true],
    "p2" => [false, false, false],
    "p3" => [true, false, true]
]
.unwrap();
let trace = TemexTrace::try_from(df.clone()).unwrap();
let te = Temex::new("^[p1 and (p2 or p3)][p1][p1 or p2 and not p3]$").unwrap();

assert!(te.is_match(&trace).unwrap());

// We can convert the TemexTrace back to a DataFrame
let df_from_trace = polars::frame::DataFrame::try_from(trace).unwrap();
assert_eq!(df, df_from_trace);
Example 2: Finding the location of a match using find

Here we use the find method to find the range of the left-most first match in the trace:

let trace_str = "CPU_core1_GT_80, CPU_core2_GT_80, mem_usage_GT_40\n\
                       0,               0,               0\n\
                       1,               0,               0\n\
                       0,               1,               0\n\
                       0,               0,               1\n\
                       1,               0,               1\n\
                       0,               1,               1\n\
                       1,               1,               0\n\
                       1,               1,               0\n\
                       1,               1,               0\n";

let trace = trace_str.parse::<TemexTrace>().unwrap();
let te = Temex::new("[CPU_core1_GT_80 and CPU_core2_GT_80 and not mem_usage_GT_40]+").unwrap();

assert_eq!(te.find(&trace).unwrap().unwrap().range(), 6..9);
Example 3: Find all nonoverlapping matches using find_iter

In this example we see how the find_iter method can be used to find multiple nonoverlapping matches within a trace:

let trace_str = "is_sunny, is_windy\n\
                        1,        0\n\
                        1,        0\n\
                        1,        0\n\
                        0,        0\n\
                        1,        0\n\
                        1,        0\n\
                        1,        0\n\
                        1,        1\n\
                        1,        0\n\
                        1,        0\n\
                        1,        0\n";

let trace = TemexTrace::try_from(trace_str).unwrap();
let te = Temex::new("[is_sunny and not is_windy]{3}").unwrap();

for te_match in te.find_iter(&trace) {
    println!("{:?}", te_match.range());
}

// Output:
// 0..3
// 4..7
// 8..11

License

Copyright (c) 2023 Kevin Wayne Smith

This project is licensed under either of

at your option.

Dependencies

~20MB
~381K SLoC