20 releases

0.7.2	Oct 27, 2023
0.7.1	Oct 21, 2021
0.7.0	Aug 26, 2021
0.6.4	Apr 23, 2021
0.1.1	Nov 26, 2019

#30 in Emulators

81 downloads per month
Used in haybale-pitchfork

MIT license

705KB
12K SLoC

`haybale`: Symbolic execution of LLVM IR, written in Rust

haybale is a general-purpose symbolic execution engine written in Rust. It operates on LLVM IR, which allows it to analyze programs written in C/C++, Rust, Swift, or any other language which compiles to LLVM IR. In this way, it may be compared to KLEE, as it has similar goals, except that haybale is written in Rust and makes some different design decisions. That said, haybale makes no claim of being at feature parity with KLEE.

Okay, but what is a symbolic execution engine?

A symbolic execution engine is a way of reasoning - rigorously and mathematically - about the behavior of a function or program. It can reason about all possible inputs to a function without literally brute-forcing every single one. For instance, a symbolic execution engine like haybale can answer questions like:

Are there any inputs to (some function) that cause it to return 0? What are they?
Is it possible for this loop to execute exactly 17 times?
Can this pointer ever be NULL?

Symbolic execution engines answer these questions by converting each variable in the program or function into a mathematical expression which depends on the function or program inputs. Then they use an SMT solver to answer questions about these expressions, such as the questions listed above.

Getting started

1. Install

haybale is on crates.io, so you can simply add it as a dependency in your Cargo.toml, selecting the feature corresponding to the LLVM version you want:

[dependencies]
haybale = { version = "0.7.2", features = ["llvm-14"] }

Currently, the supported LLVM versions are llvm-9, llvm-10, llvm-11, llvm-12, llvm-13, and llvm-14.

haybale depends (indirectly) on the LLVM and Boolector libraries.

LLVM must be available on your system, in the version which matches the selected feature. (For instance, if you select the llvm-14 feature, LLVM 14 must be available on your system.) For more details and instructions on installing LLVM and making sure Cargo can find it, see the llvm-sys README.
For Boolector you have two options:
- You can compile and install Boolector 3.2.1 on your system as a shared library. (Make sure you configure it as a shared library, e.g., using ./configure.sh --shared, and install it, using make install.)
- Or, you can enable the haybale feature vendor-boolector. With this option, Cargo will automatically download and build Boolector and statically link to it. E.g.,
```
[dependencies]
haybale = { version = "0.7.2", features = ["llvm-14", "vendor-boolector"] }
```
  This option probably only works on Linux and macOS, and requires standard build tools to be available on your system -- e.g., for Debian-based distributions, build-essential, cmake, curl, and git.

2. Acquire bitcode to analyze

Since haybale operates on LLVM bitcode, you'll need some bitcode to get started. If the program or function you want to analyze is written in C, you can generate LLVM bitcode (*.bc files) with clang's -c and -emit-llvm flags:

clang -c -emit-llvm source.c -o source.bc

For debugging purposes, you may also want LLVM text-format (*.ll) files, which you can generate with clang's -S and -emit-llvm flags:

clang -S -emit-llvm source.c -o source.ll

If the program or function you want to analyze is written in Rust, you can likewise use rustc's --emit=llvm-bc and --emit=llvm-ir flags.

Note that in order for haybale to print source-location information (e.g., source filename and line number) in error messages and backtraces, the LLVM bitcode will need to include debuginfo. You can ensure debuginfo is included by passing the -g flag to clang, clang++, or rustc when generating bitcode.

3. Create a Project

A haybale Project contains all of the code currently being analyzed, which may be one or more LLVM modules. To get started, simply create a Project from a single bitcode file:

let project = Project::from_bc_path("/path/to/file.bc")?;

For more ways to create Projects, including analyzing entire libraries, see the Project documentation.

4. Use built-in analyses

haybale currently includes two simple built-in analyses: get_possible_return_values_of_func(), which describes all the possible values a function could return for any input, and find_zero_of_func(), which finds a set of inputs to a function such that it returns 0. These analyses are provided both because they may be of some use themselves, but also because they illustrate how to use haybale.

For an introductory example, let's suppose foo is the following C function:

int foo(int a, int b) {
    if (a > b) {
        return (a-1) * (b-1);
    } else {
        return (a + b) % 3 + 10;
    }
}

We can use find_zero_of_func() to find inputs such that foo will return 0:

match find_zero_of_func("foo", &project, Config::default(), None) {
    Ok(None) => println!("foo can never return 0"),
    Ok(Some(inputs)) => println!("Inputs for which foo returns 0: {:?}", inputs),
    Err(e) => panic!("{}", e),  // use the pretty Display impl for errors
}

Writing custom analyses

haybale can do much more than just describe possible function return values and find function zeroes. In this section, we'll walk through how we could find a zero of the function foo above without using the built-in find_zero_of_func(). This will illustrate how to write a custom analysis using haybale.

ExecutionManager

All analyses will use an ExecutionManager to control the progress of the symbolic execution. In the code snippet below, we call symex_function() to create an ExecutionManager which will analyze the function foo - it will start at the top of the function, and end when the function returns. In between, it will also analyze any functions called by foo, as necessary and depending on the Config settings.

let mut em = symex_function("foo", &project, Config::<DefaultBackend>::default(), None);

Here it was necessary to not only specify the default haybale configuration, as we did when calling find_zero_of_func(), but also what "backend" we want to use. The DefaultBackend should be fine for most purposes.

Paths

The ExecutionManager acts like an Iterator over paths through the function foo. Each path is one possible sequence of control-flow decisions (e.g., which direction do we take at each if statement) leading to the function returning some value. The function foo in this example has two paths, one following the "true" branch and one following the "false" branch of the if.

Let's examine the first path through the function:

let result = em.next().expect("Expected at least one path");

In the common case, result contains the function return value on this path, as a Boolector BV (bitvector) wrapped in the ReturnValue enum. Since we know that foo isn't a void-typed function (and won't throw an exception or abort), we can simply unwrap the ReturnValue to get the BV:

let retval = match result {
    Ok(ReturnValue::Return(r)) => r,
    Ok(ReturnValue::ReturnVoid) => panic!("Function shouldn't return void"),
    Ok(ReturnValue::Throw(_)) => panic!("Function shouldn't throw an exception"),
    Ok(ReturnValue::Abort) => panic!("Function shouldn't panic or exit()"),
    ...

result could also be an Err describing an Error which was encountered while processing the path. In this case, we could just ignore the error and keep calling next() to try to find paths which didn't have errors. Or we could get information about the error like this:

    ...
    Err(e) => panic!("{}", em.state().full_error_message_with_context(e)),
};

This gets information about the error from the program State, which we'll discuss next. But for the rest of this tutorial, we'll assume that we got the Ok result, and at this point retval is a BV representing the function return value on the first path.

States

For each path, the ExecutionManager provides not only the final result of the path (either aReturnValue or an Error), but also the final program State at the end of that path. We can get immutable access to the State with state(), or mutable access with mut_state().

let state = em.mut_state();  // the final program state along this path

To test whether retval can be equal to 0 in this State, we can use state.bvs_can_be_equal():

let zero = state.zero(32);  // The 32-bit constant 0
if state.bvs_can_be_equal(&retval, &zero)? {
    println!("retval can be 0!");
}

Getting solutions for variables

If retval can be 0, let's find what values of the function parameters would cause that. First, we'll add a constraint to the State requiring that the return value must be 0:

retval._eq(&zero).assert();

and then we'll ask for solutions for each of the parameters, given this constraint:

// Get a possible solution for the first parameter.
// In this case, from looking at the text-format LLVM IR, we know the variable
// we're looking for is variable #0 in the function "foo".
let a = state.get_a_solution_for_irname(&String::from("foo"), Name::from(0))?
    .expect("Expected there to be a solution")
    .as_u64()
    .expect("Expected solution to fit in 64 bits");

// Likewise the second parameter, which is variable #1 in "foo"
let b = state.get_a_solution_for_irname(&String::from("foo"), Name::from(1))?
    .expect("Expected there to be a solution")
    .as_u64()
    .expect("Expected solution to fit in 64 bits");

println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b);

Alternately, we could also have gotten the parameter BVs from the ExecutionManager like this:

let a_bv = em.param_bvs()[0].clone();
let b_bv = em.param_bvs()[1].clone();

let a = em.state().get_a_solution_for_bv(&a_bv)?
    .expect("Expected there to be a solution")
    .as_u64()
    .expect("Expected solution to fit in 64 bits");

let b = em.state().get_a_solution_for_bv(&b_bv)?
    .expect("Expected there to be a solution")
    .as_u64()
    .expect("Expected solution to fit in 64 bits");

println!("Parameter values for which foo returns 0: a = {}, b = {}", a, b);

Documentation

Full documentation for haybale can be found on docs.rs, or of course you can generate local documentation with cargo doc --open.

Compatibility

Currently, the official crates.io releases of haybale (0.7.0 and later) depend on Boolector 3.2.1 and LLVM 9, 10, 11, 12, 13, or 14, selected via feature flags llvm-9 through llvm-14. As of this writing, choosing an LLVM version has essentially no effect on haybale's features or interface; the only difference is the ability to analyze bitcode generated with newer LLVMs. (And the LLVM 10+ versions can process AtomicRMW instructions; see #12.)

For LLVM 8, you can try the llvm-8 branch of this repo. This branch is unmaintained, and is approximately at feature parity with haybale 0.2.1. It may work for your purposes; or you can update to LLVM 9 or later and the latest haybale.

LLVM 7 and earlier are not supported.

haybale works on stable Rust, and requires Rust 1.45 or later.

Under the hood

haybale is built using the Rust llvm-ir crate and the Boolector SMT solver (via the Rust boolector crate).

Changelog

Version 0.7.2 (Oct 26, 2023)

Support for LLVM 14 via the llvm-14 feature
Fixed/improved support for a couple of intrinsics

Version 0.7.1 (Oct 21, 2021)

Support for LLVM 13 via the llvm-13 feature
haybale now requires Rust 1.45+ (previously 1.43 or 1.44)

Version 0.7.0 (Aug 26, 2021)

Support for LLVM 12 via the llvm-12 feature
New Cargo feature to vendor Boolector: automatically download, build, and statically link Boolector as part of the haybale build process. See the "Install" section of the README above.
symex_function() now takes an additional argument params. You can use this argument to specify constraints for the function parameters, or even specify specific hardcoded values. Or, you can just pass None and get the previous haybale behavior, treating all parameters as completely unconstrained.
find_zero_of_func() and get_possible_return_values_of_func() likewise now take a params argument to specify constraints on function parameters.
State has a new public field proj providing access to the Project.
Function hooks no longer take a Project parameter explicitly. Instead, you can access the Project through the proj field of the State object.
ExecutionManager has a new public method .func() which provides access to the toplevel Function.
State has a new public method get_path_length(), also available as the toplevel function get_path_length().
Updated llvm-ir dependency to 0.8.0, which results in minor breaking changes to parts of haybale's API, where llvm-ir types are exposed.

Version 0.6.4 (Apr 22, 2021)

Fix the build with Rust 1.51+ (#16). (Minimum Rust version for haybale remains unchanged: 1.43+ for LLVM 9 or 10 users, or 1.44+ for LLVM 11 users.)

Version 0.6.3 (Oct 26, 2020)

Fix the documentation build on docs.rs (#13)

Version 0.6.2 (Oct 20, 2020)

Support for LLVM 11 via the llvm-11 feature
get_possible_return_values_of_func() now handles void functions properly (#10)
Support LLVM atomicrmw instructions (only for LLVM 10+) (#12)
Support LLVM freeze instructions (which only exist in LLVM 10+)
Built-in support for a few more Rust standard-library functions related to panic handling
State has a new public method get_bv_by_irname()
LLVM 11 users need Rust 1.44+, due to requirements of llvm-ir. LLVM 9 or 10 users still need only Rust 1.43+.

Version 0.6.1 (Sep 17, 2020)

Both State and Project now have a method size_in_bits() which gets the size of any Type in bits, accounting for the Project's pointer size and struct definitions. This is intended to replace state.size() and state.size_opaque_aware(), both of which are now deprecated and will be removed in haybale 0.7.0. Likewise, state.fp_size() was deprecated and renamed to state.fp_size_in_bits().
- Note: these deprecated methods were actually removed in 0.7.1.

Version 0.6.0 (Sep 1, 2020)

haybale now supports both LLVM 9 and LLVM 10, using the same branch and same crates.io releases. When using haybale, you must choose either the llvm-9 or the llvm-10 feature.
Updated llvm-ir dependency to 0.7.1 (from 0.6.0), which includes runtime and memory-usage performance improvements, particularly for large bitcode files. This also involves a few breaking changes to parts of haybale's API.
haybale now requires Rust 1.43+ (previously 1.40+) due to requirements of llvm-ir 0.7.1.

Version 0.5.1 (Aug 31, 2020)

Fix for issue #9 regarding zero-element arrays (which particularly may appear when analyzing Rust code)
Built-in support for the llvm.ctlz and llvm.cttz intrinsics

Version 0.5.0 (Jul 29, 2020)

Compatibility:

haybale now depends on LLVM 10 by default (up from LLVM 9). LLVM 9 is still supported on a separate branch; see "Compatibility" above.
Updated boolector dependency to crate version 0.4.0, which requires Boolector version 3.2.1 (up from 3.1.0).

Renames which affect the public API:

Rename SimpleMemoryBackend to DefaultBackend and make it default. Rename BtorBackend to CellMemoryBackend, and the memory module to cell_memory.
Remove the layout module. Its functions are now available as methods on State. Also, many of these functions now return u32 instead of usize.

32-bit targets and related changes:

With DefaultBackend, haybale now supports LLVM bitcode which was compiled for 32-bit targets (previously only supported 64-bit targets).
The new_uninitialized() and new_zero_initialized() methods on the backend::Memory trait, simple_memory::Memory, and cell_memory::Memory now take an additional parameter indicating the pointer size.
Project has a new public method pointer_size_bits().

Other:

Built-in support for the llvm.expect intrinsic, and built-in support for the llvm.bswap intrinsic with vector operands (previously only supported scalar operands)
solver_utils::PossibleSolutions has new constructors empty(), exactly_one(), and exactly_two() (useful for testing), and also implements FromIterator, allowing you to .collect() an iterator into it
Bugfix for the {min,max}_possible_solution_for_bv_as_binary_str() functions in the solver_utils module

Version 0.4.0 (Mar 31, 2020)

New features:

Support LLVM cmpxchg instructions
Support for instruction callbacks - see Config.callbacks. This allows you to take arbitrary actions based on the instruction about to be processed.

Config:

Config.null_detection has been renamed to Config.null_pointer_checking, and its type has been changed to allow for additional options.
Config::new() now takes no parameters. It is now the same as Config::default() except that it comes with no function hooks.

Other utility functions/methods:

The hook_utils module now includes two new functions memset_bv and memcpy_bv.
layout::size_opaque_aware now returns an Option rather than panicking.
The to_string_* methods on Location are now public, rather than internal to the crate, allowing users more control over the String representation of a Location.

Error handling:

Error has three new variants UnreachableInstruction, FailedToResolveFunctionPointer, and HookReturnValueMismatch. All of these were previously reported as Error::OtherError, but now have dedicated variants.
Error::LoopBoundExceeded now also includes the value of the loop bound which was exceeded.

Other notes:

haybale no longer selects features of the log crate. This allows downstream users to select these features or not, and in particular, allows users to enable debug logging in release builds.

Version 0.3.2 (Feb 28, 2020)

New option Config.max_callstack_depth allows you to limit the callstack depth for an analysis - automatically ignoring calls of LLVM functions which would exceed that callstack depth. The default for this setting is no limit, matching the previous behavior of haybale.
New option Config.max_memcpy_length allows you to limit the maximum size of memcpy, memset, and memmove operations. The default for this setting is no limit, matching the previous behavior of haybale.
New method FunctionHooks::add_default_hook() allows you to supply a "default hook" which will be used when no other definition or hook is found for a function call. If no default hook is provided, this will result in a FunctionNotFound error, just as it did previously.
Performance improvements for analyzing calls of function pointers.
Improved a few error messages.

Version 0.3.1 (Feb 5, 2020)

Fix some broken links in the README and docs. No functional changes.

Version 0.3.0 (Feb 5, 2020)

Solver timeouts:

New setting Config.solver_query_timeout controls the maximum amount of time haybale will spend on a single solver query before returning Error::SolverError. This setting defaults to 300 seconds (5 minutes). The setting can also be disabled entirely, which results in the same behavior as previous versions of haybale (no time limit on solver queries).