#parser #fuzzing #writer #arbitrary #testing #fuzz #serialization

spindle-lib

Simple and efficient expression and byte sequence generator for fuzz testing

1 unstable release

new 0.1.0 Apr 28, 2025

#1188 in Encoding

Apache-2.0

77KB
1.5K SLoC

Spindle is a simple and efficient expression and byte sequence generator to aid fuzz testing parsers and de-serializers.

Usage

Spindle offers a syntax similar to Extended Backus–Naur form which compiles to a state machine --Grammar--that produces structured matching arbitrary sentences from an unstructured feed of bytes.

Spindle integrates with libfuzzer and cargo-fuzz: Unstructured bytes, from the arbitrary crate, are manipulated by the fuzzer based on code coverage.

Spindle can be used to generate database expressions, big decimal strings, JSON, and other syntaxes, as well as slightly malformed variants of correct expressions to test interesting edge cases of parser or de-serializer.

Example

use spindle_lib::Grammar;
use arbitrary::Unstructured;

let math: Grammar = r#"
    expr   : u16 | paren | expr symbol expr ;
    paren  : "(" expr symbol expr ")" ;
    symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();

let mut u = Unstructured::new(b"poiuyt5r4321sdnlknasdbvcxeygrey");
let sentence: String = math.expression(&mut u, None).unwrap(); // (30057+(12594+((((25976+(0*0))*0*0)*0)*0)*0*0))

The state machine traversal always starts at the first rule. In the example,

  • expr is the first rule and evaluates to either u16, paren, or the concatenation of expr and symbol and expr.
  • ; delimits different rules.
  • u16 is a pre-defined data types that directly evaluates to u16::arbitrary(u)
  • paren evaluates to the concatenation of the literal "(", expr, symbol, expr and, ")".
  • symbol evaluates to the an arbitrary string matching the regex -|\+|\*|÷.

Usage in Fuzzer

use spindle_lib::Grammar;
use libfuzzer_sys::fuzz_target;
use arbitrary::{Arbitrary, Result, Unstructured};
use std::sync::LazyLock;

static GRAMMAR: LazyLock<Grammar> = LazyLock::new(|| {
    r#"
        expr   : u16 | paren | expr symbol expr ;
        paren  : "(" expr symbol expr ")" ;
        symbol : r"-|\+|\*|÷" ;
    "#.parse().unwrap()
});

struct MathExpression(String);

impl<'a> Arbitrary<'a> for MathExpression {
    fn arbitrary(u: &mut Unstructured<'a>) -> Result<Self> {
        Ok(Self(GRAMMAR.expression(u, None)?.0))
    }
}

fuzz_target!(|expr: MathExpression| {
    // ... my_parser(expr);
});

For more examples, see the examples folder.

Pre-defined Rules

  • String evaluates to str::arbitrary(u)
  • u16 evaluates to u16::arbitrary(u)

Visitor

A Visitor is some state that is initialized before traversal and mutated as different rules are visited during the traversal, e.g. visit_or. Vistors that are already implemented are String and Vec<u8> for output buffers, and u64 for classification.

Users can use their own implementation of Visitor, for example if they want to

  • use a different output buffer
  • use a different classification
  • gather data
  • build an abstract syntax tree

Example

use spindle_lib::{Grammar, Visitor};

let math: Grammar = r#"
    expr   : u16 | paren | expr symbol expr ;
    paren  : "(" expr symbol expr ")" ;
    symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();

/// Detects if any prime numbers were generated.
#[derive(Debug, Default)]
struct PrimeDetector(bool);

impl Visitor for PrimeDetector {
    fn new() -> Self {
        Self::default()
    }

    fn visit_u16(&mut self, num: u16) {
        let num_is_prime = (2..num).all(|a| num % a != 0);
        self.0 |= num_is_prime;
    }
}

let mut u = arbitrary::Unstructured::new(b"qwerty");
let (expr, any_primes): (String, PrimeDetector) = math.expression(&mut u, None).unwrap();
let expr: String = math.expression(&mut u, None).unwrap();
assert!(any_primes.0);

Example

In this example, a math specific abstract syntax tree (AST) is built during the arbitrary traversal.

use spindle_lib::{Grammar, Visitor};

let math: Grammar = r#"
    expr   : u16 | paren | expr symbol expr ;
    paren  : "(" expr symbol expr ")" ;
    symbol : r"-|\+|\*|÷" ;
"#.parse().unwrap();

#[derive(Debug, Default)]
struct MathAst {
    cur_op: Option<char>,
    stack: Vec<MathAstNode>,
}

#[derive(Debug)]
enum MathAstNode {
    Num(u16),
    Expr(Box<MathAstNode>, char, Box<MathAstNode>)
}

impl Visitor for MathAst {
    fn new() -> Self {
        Self::default()
    }

    fn visit_regex(&mut self, regex_output: &[u8]) {
        assert_eq!(regex_output.len(), 1);
        let c = regex_output[0] as char;
        self.cur_op = Some(c);
    }

    fn visit_u16(&mut self, num: u16) {
        match self.cur_op {
            None => self.stack.push(MathAstNode::Num(num)),
            Some(symbol) => {
                let left = Box::new(self.stack.pop().unwrap());
                let right = Box::new(MathAstNode::Num(num));
                let new = MathAstNode::Expr(left, symbol, right);
                self.stack.push(new);
                self.cur_op = None;
            }
        }
    }
}

let mut u = arbitrary::Unstructured::new(b"494392");
// MathAst { cur_op: None, stack: [Expr(Num(13108), '*', Num(0))] }
let tree: MathAst = math.expression(&mut u, None).unwrap();

Dependencies

~1.2–2.1MB
~54K SLoC