
filecheck

Library for writing tests for utilities that read text files and produce text output

7 releases (breaking)

0.5.0 Mar 17, 2020
0.4.0 Sep 28, 2018
0.3.0 Mar 20, 2018
0.2.1 Mar 15, 2018
0.0.1 Oct 18, 2016



58,122 downloads per month
Used in 21 crates (4 directly)

Apache-2.0 WITH LLVM-exception

58KB
1K SLoC

This is a library for writing tests for utilities that read text files and produce text output.


It is inspired by and similar to LLVM FileCheck, but it is not directly compatible.


lib.rs:

This crate provides a text pattern matching library with functionality similar to the LLVM project's FileCheck command.

A list of directives is typically extracted from a file containing a test case. The test case is then run through the program under test, and its output matched against the directives.

See the CheckerBuilder and Checker types for the main library API.

Directives

These are the directives recognized by filecheck:

 check: <pattern>
 sameln: <pattern>
 nextln: <pattern>
 unordered: <pattern>
 not: <pattern>
 regex: <variable>=<regex>
 

Each directive is described in more detail below.
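As a rough illustration of the extraction step, the sketch below scans a single line for one of the six directive keywords followed by a colon. The helper name find_directive and its exact rules are assumptions made for this example; it is not the crate's API, and the crate's own parser may differ in details such as whitespace handling.

```rust
/// The directive keywords listed above.
const KEYWORDS: [&str; 6] = ["check", "sameln", "nextln", "unordered", "not", "regex"];

/// Hypothetical helper (not part of the filecheck API): look for
/// `<keyword>:` anywhere in `line` and return the keyword together with
/// the trimmed text that follows the colon.
fn find_directive(line: &str) -> Option<(&'static str, &str)> {
    for (colon, _) in line.match_indices(':') {
        // Take the identifier-like word that ends right before this colon.
        let word = line[..colon]
            .rsplit(|c: char| !c.is_ascii_alphanumeric() && c != '_')
            .next()
            .unwrap_or("");
        if let Some(kw) = KEYWORDS.iter().copied().find(|k| *k == word) {
            return Some((kw, line[colon + 1..].trim()));
        }
    }
    None
}
```

With this sketch, a comment line such as "//   regex: NUM=\d+" yields the pair ("regex", "NUM=\d+"), while ordinary code and comment lines yield None.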

Example

The Rust program below prints the primes less than 100. It has filecheck directives embedded in comments:

fn is_prime(x: u32) -> bool {
    (2..x).all(|d| x % d != 0)
}

// Check that we get the primes and nothing else:
//   regex: NUM=\d+
//   not: $NUM
//   check: 2
//   nextln: 3
//   check: 89
//   nextln: 97
//   not: $NUM
fn main() {
    for p in (2..100).filter(|&x| is_prime(x)) {
        println!("{}", p);
    }
}

A test driver compiles and runs the program, then pipes the output through filecheck:

$ rustc primes.rs
$ ./primes | clif-util filecheck -v
#0 regex: NUM=\d+
#1 not: $NUM
#2 check: 2
#3 nextln: 3
#4 check: 89
#5 nextln: 97
#6 not: $NUM
no match #1: \d+
> 2
  ~
match #2: \b2\b
> 3
  ~
match #3: \b3\b
> 5
> 7
...
> 79
> 83
> 89
  ~~
match #4: \b89\b
> 97
  ~~
match #5: \b97\b
no match #6: \d+
OK

The check: directive

Match patterns non-overlapping and in order:

#0 check: one
#1 check: two

These directives will match the string "one two", but not "two one". The second directive must match after the first one, and it can't overlap.
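The ordering rule can be sketched with plain substring search. This is a simplified model, assuming literal patterns and ignoring word boundaries and variables; check_in_order is a hypothetical name, not a function of the crate.

```rust
/// Simplified model of a sequence of `check:` directives: every pattern
/// must be found, in order, starting after the end of the previous match.
fn check_in_order(text: &str, patterns: &[&str]) -> bool {
    let mut pos = 0; // byte offset where the next search starts
    for pat in patterns {
        match text[pos..].find(*pat) {
            // Resume after this match so that matches cannot overlap.
            Some(off) => pos += off + pat.len(),
            None => return false,
        }
    }
    true
}
```

Because each search resumes after the previous match, the patterns "ab" and "bc" do not both match the text "abc" in this model.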

The sameln: directive

Match a pattern in the same line as the previous match.

#0 check: one
#1 sameln: two

These directives will match the string "one two", but not "one\ntwo". The second match must be in the same line as the first. Like the check: directive, the match must also follow the first match, so "two one" would not be matched.

If there is no previous match, sameln: matches on the first line of the input.
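A minimal sketch of the same-line rule for one check: followed by one sameln:, assuming literal patterns. The function name is hypothetical, and the case without a previous match is not modelled here.

```rust
/// Simplified model of `check: first` followed by `sameln: second`.
fn check_then_sameln(text: &str, first: &str, second: &str) -> bool {
    let after = match text.find(first) {
        Some(start) => start + first.len(),
        None => return false,
    };
    // The current line ends at the next newline, or at the end of the text.
    let line_end = text[after..].find('\n').map_or(text.len(), |n| after + n);
    // Only the rest of that line, after the first match, is searched.
    text[after..line_end].contains(second)
}
```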

The nextln: directive

Match a pattern in the next line after the previous match.

#0 check: one
#1 nextln: two

These directives will match the string "one\ntwo", but not "one two" or "one\n\ntwo".

If there is no previous match, nextln: matches on the second line of the input as if there were a previous match on the first line.
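The next-line rule can be sketched in the same style, again assuming literal patterns and a hypothetical function name. Only the line immediately following the previous match is searched.

```rust
/// Simplified model of `check: first` followed by `nextln: second`.
fn check_then_nextln(text: &str, first: &str, second: &str) -> bool {
    let after = match text.find(first) {
        Some(start) => start + first.len(),
        None => return false,
    };
    // Skip to the start of the following line; fail if there is none.
    let next_start = match text[after..].find('\n') {
        Some(n) => after + n + 1,
        None => return false,
    };
    let next_end = text[next_start..]
        .find('\n')
        .map_or(text.len(), |n| next_start + n);
    text[next_start..next_end].contains(second)
}
```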

The unordered: directive

Match patterns in any order, and possibly overlapping each other.

#0 unordered: one
#1 unordered: two

These directives will match the string "one two" and the string "two one".

When a normal ordered match is inserted into a sequence of unordered: directives, it acts as a barrier:

#0 unordered: one
#1 unordered: two
#2 check: three
#3 unordered: four
#4 unordered: five

These directives will match "two one three four five", but not "two three one four five". The unordered: matches are not allowed to cross the ordered check: directive.
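One way to model the barrier behaviour is to track two positions: the end of the last ordered match, and the furthest end reached by any match. The sketch below does this for literal patterns. It is an assumption about how such a matcher could work, not a description of the crate's internals, and the names Directive and matches_all are made up for the example.

```rust
/// Hypothetical directive type for this sketch only.
enum Directive<'a> {
    Check(&'a str),
    Unordered(&'a str),
}

/// Simplified model: unordered matches search from the last ordered match,
/// ordered matches search from the furthest match seen so far.
fn matches_all(text: &str, directives: &[Directive<'_>]) -> bool {
    let mut last_ordered = 0; // end of the most recent ordered match
    let mut max_match = 0; // furthest end of any match so far
    for d in directives {
        match d {
            Directive::Unordered(pat) => match text[last_ordered..].find(*pat) {
                Some(off) => max_match = max_match.max(last_ordered + off + pat.len()),
                None => return false,
            },
            Directive::Check(pat) => match text[max_match..].find(*pat) {
                Some(off) => {
                    // An ordered match becomes the new barrier.
                    max_match += off + pat.len();
                    last_ordered = max_match;
                }
                None => return false,
            },
        }
    }
    true
}
```

In this model the ordered check: three has to match after every earlier unordered match, which is why "two three one four five" is rejected.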

When unordered: matches define and use variables, a topological order is enforced. This means that a match referencing a variable must follow the match where the variable was defined:

#0 regex: V=\bv\d+\b
#1 unordered: $(va=$V) = load
#2 unordered: $(vb=$V) = iadd $va
#3 unordered: $(vc=$V) = load
#4 unordered: iadd $va, $vc

In the above directives, #2 must match after #1, and #4 must match after both #1 and #3, but otherwise they can match in any order.

The not: directive

Check that a pattern does not appear between matches.

#0 check: one
#1 not: two
#2 check: three

The directives above will match "one five three", but not "one two three".

The pattern in a not: directive can't define any variables. Since it never matches anything, the variables would not get a value.
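A small sketch of the idea for the three directives above, assuming literal patterns: the forbidden text is only searched for in the gap between the two surrounding matches. The function name is hypothetical.

```rust
/// Simplified model of `check: before`, `not: forbidden`, `check: after`.
fn check_not_check(text: &str, before: &str, forbidden: &str, after: &str) -> bool {
    // End of the first ordered match.
    let gap_start = match text.find(before) {
        Some(start) => start + before.len(),
        None => return false,
    };
    // Start of the second ordered match, searched after the first one.
    let gap_end = match text[gap_start..].find(after) {
        Some(off) => gap_start + off,
        None => return false,
    };
    // The forbidden pattern must not occur between the two matches.
    !text[gap_start..gap_end].contains(forbidden)
}
```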

The regex: directive

Define a shorthand name for a regular expression.

#0 regex: ID=\b[_a-zA-Z][_0-9a-zA-Z]*\b
#1 check: $ID + $ID

The regex: directive gives a name to a regular expression which can then be used as part of a pattern to match. Patterns are otherwise just plain text strings to match, so this is not simple macro expansion.

See the Rust regex crate for the regular expression syntax.

Patterns and variables

Patterns are plain text strings to be matched in the input file. The dollar sign is used as an escape character to expand variables. The following escape sequences are recognized:

 $$                Match single dollar sign.
 $()               Match the empty string.
 $(=<regex>)       Match regular expression <regex>.
 $<var>            Match contents of variable <var>.
 $(<var>)          Match contents of variable <var>.
 $(<var>=<regex>)  Match <regex>, then
                   define <var> as the matched text.
 $(<var>=$<rxvar>) Match regex in <rxvar>, then
                   define <var> as the matched text.
 

Variables can contain either plain text or regular expressions. Plain text variables are defined with the $(var=...) syntax in a previous directive. They match the same text again. Backreferences within the same pattern are not allowed: once a variable is defined in a pattern, it can't be referenced again in that pattern.

Regular expression variables are defined with the regex: directive. They match the regular expression each time they are used, so the matches don't need to be identical.
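To make the escape sequences concrete, here is a sketch of a tokenizer that splits a pattern into its parts. The Part enum and parse_pattern are hypothetical names for this example, not the crate's types. The sketch assumes that parentheses inside a regex are balanced and unescaped, and it keeps the right-hand side of a definition as raw text without deciding whether it is a regex or a $<rxvar> reference.

```rust
/// Hypothetical pattern parts for this sketch only.
#[derive(Debug, PartialEq)]
enum Part {
    Text(String),           // plain text; `$$` contributes a literal `$`
    Empty,                  // $()
    Regex(String),          // $(=<regex>)
    Var(String),            // $<var> or $(<var>)
    DefVar(String, String), // $(<var>=<regex>) or $(<var>=$<rxvar>)
}

/// Offset of the `)` that closes a group whose `(` was already consumed.
fn closing_paren(s: &str) -> Option<usize> {
    let mut depth = 1;
    for (i, c) in s.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            _ => {}
        }
    }
    None
}

fn is_ident(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

fn parse_pattern(pat: &str) -> Result<Vec<Part>, String> {
    let mut parts = Vec::new();
    let mut text = String::new();
    let mut rest = pat;
    while let Some(dollar) = rest.find('$') {
        text.push_str(&rest[..dollar]);
        rest = &rest[dollar + 1..];
        if let Some(r) = rest.strip_prefix('$') {
            text.push('$'); // `$$` is a literal dollar sign
            rest = r;
            continue;
        }
        // Flush the plain text collected before this escape.
        if !text.is_empty() {
            parts.push(Part::Text(std::mem::take(&mut text)));
        }
        if let Some(r) = rest.strip_prefix('(') {
            let close = closing_paren(r).ok_or_else(|| "unterminated $(".to_string())?;
            let body = &r[..close];
            rest = &r[close + 1..];
            parts.push(match body.split_once('=') {
                None if body.is_empty() => Part::Empty,
                None => Part::Var(body.to_string()),
                Some(("", rx)) => Part::Regex(rx.to_string()),
                Some((var, rx)) => Part::DefVar(var.to_string(), rx.to_string()),
            });
        } else {
            // Bare `$name`: the name runs to the first non-identifier character.
            let len = rest.find(|c: char| !is_ident(c)).unwrap_or(rest.len());
            if len == 0 {
                return Err("expected a variable name after $".to_string());
            }
            parts.push(Part::Var(rest[..len].to_string()));
            rest = &rest[len..];
        }
    }
    text.push_str(rest);
    if !text.is_empty() {
        parts.push(Part::Text(text));
    }
    Ok(parts)
}
```

For example, the pattern "$(vb=$V) = iadd $va" splits into a definition of vb, the plain text " = iadd ", and a reference to va.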

Word boundaries

If a pattern begins or ends with a (plain text) letter or number, it will only match on a word boundary. Use the $() empty string match to prevent this:

check: one$()

This will match "one" and "onetwo", but not "zeroone".
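The effect of the implicit word boundaries can be sketched without a regex engine. In this simplified model the two flags say whether the pattern starts or ends with a plain text letter or number, so "check: one$()" corresponds to a required boundary at the start only. The function find_bounded is a hypothetical helper, and it assumes that the needle itself begins and ends with a word character.

```rust
fn is_word(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Does `needle` occur in `text` with the requested word boundaries?
fn find_bounded(text: &str, needle: &str, bound_start: bool, bound_end: bool) -> bool {
    text.match_indices(needle).any(|(i, m)| {
        // Characters directly before and after this occurrence, if any.
        let before = text[..i].chars().next_back();
        let after = text[i + m.len()..].chars().next();
        let start_ok = !bound_start || !before.map_or(false, is_word);
        let end_ok = !bound_end || !after.map_or(false, is_word);
        start_ok && end_ok
    })
}
```

Calling find_bounded(text, "one", true, false) models "check: one$()", while find_bounded(text, "one", true, true) models a plain "check: one".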

The empty match syntax can also be used to require leading or trailing whitespace:

check: one, $()

This will match "one, two", but not "one,two". Without the $(), trailing whitespace would be trimmed from the pattern.
