1 unstable release

0.1.5 Dec 20, 2023
0.1.4 Dec 12, 2023

#209 in Debugging

Download history 4/week @ 2024-09-21 2/week @ 2024-09-28

60 downloads per month

Apache-2.0

86KB
2K SLoC

bitgrep

It's grep for data types. Ever found yourself looking for a specific numerical value/range in a heap of binary files?

Now you can!

Useful for DFIR, security research and general debugging work, especially when you know what you're looking for but don't know where.

Install

Use cargo install to install the binary from crates.io:

$ cargo install bitgrep
$ bitgrep --data-type u32 --file data.raw -m 55 -M 144

Alternatively you can build a binary using the code from github:

$ git clone https://github.com/jmpfar/bitgrep.git
$ cd bitgrep
$ cargo build --release
$ target/release/bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0

Usage

To find all all the doubles (f64) with values 29.15 <= x <= 36.0:

$ bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0

./data.raw: [0x16B6] f64: 34.415624980210914 [9b483333354140]
./data.raw: [0xFDBB] f64: 30.215716721498428 [3d983639373e40]

The output format is:

file_path: [offset] data_type: value [value_in_hex]

Options

In order to find a single literal value you can use the --literal or -l flag. Float comparison is approximate with a ULPS of 4 (will be configurable in the future):

$ bitgrep --data-type f64 --file data4.raw --literal 29.15385732 \
    --endian big

You can also filter by entropy to remove values that have a high chance of being noise.

Entropy ranges between 0 and 8 where 8 represents random data. Entropy greater than 7.5 is usually encrypted, compressed or random. English text has a value of between 3.5 and 5.

$ bitgrep --data-type i128 --file data.raw --literal 123 \
    --max-entropy 7.5

You can use a pipe with the special - file path:

$ cat data.raw | bitgrep --data-type u8 --file - --literal 3

To reduce noise in binary files that contain zero bytes, you can use --exclude-zero. This excludes all absolute zero values (0x0)

$ bitgrep --data-type i32 --file data.raw --min -30 --max 30 \
    --exclude-zero

The above command does not filter values that are approximately close to zero (e.g. 0.00000000000000001). This might be useful when reducing noise in floating point searches. Alternatively use:

$ bitgrep --data-type f64 --file data.raw --min -30.0 --max 30.0 \
    --exclude-literal 0.0

Currently there is no native support for directory globbing or recursion, if you need to search multiple files you can use the find command:

$ find . -type f -exec bitgrep \
		--data-type i32 --file {} --max -78 --min -83 \
		--endian little \;

Supported Types

Currently bitgrep supports all rust numeric data types (use with --data-type):

Rust C
i16 short
i32 int
i64 long long
i128 __int128 (GCC)
u16 unsigned short
u32 unsigned int
u64 unsigned long long
u128 unsigned __int128
f32 float
f64 double

TODO

[!WARNING]
Everything below this point does not exist yet!

Feel free to send pull requests, hopefully I'll get to these before 2026

  1. Filter files by entropy
  2. Add pipe support and other unix semantics
  3. Use stderr
  4. Color output
  5. Hex dump output
  6. Literals search
  7. Hex search (e.g. 0AAD[33-4A]DF)
  8. Exclude zeros
  9. Exclude approximate literal values
  10. Sane error messages
  11. Exclude extreme exponent values
  12. Binary releases
  13. Recursive file search / glob
  14. Date types
    1. 32-bit/64-bit Unix epoch (milliseconds, microseconds, seconds)
    2. Windows
      1. FILETIME
      2. SYSTEMTIME
      3. OLE automation
      4. CLR Time
    3. Apple timestamps
  15. String Search
    1. UTF-8
    2. UTF-16
    3. ASCII code pages
    4. Search string representations of number range: e.g. "10.2" .. "10.722"
    5. Regex
  16. Performance improvements
    1. Convert to static dispatch
    2. Search without converting bytes to number
    3. Lock and buffer stdout
  17. Rule engine, see below
  18. Misc
    1. GUIDs
    2. IP addresses
    3. Custom structs
  19. Debt
    1. Refactor printing to different object/trait
    2. Add integration tests
    3. Create configuration => scanner builder
    4. Filters to enums
    5. Add golden tests

Rule engine

TODO: An imagined JSON of a rules file that can be used as a search configuration.

{
  "filters": {
    "file": {
      "magic": "0xABDEF",
      "types": [
        {
          "double": { "min": 80.3432, "max": 82.221112, "exclude-zero": true }
        },
        { "double": { "min": -32.865, "max": 31.53221, "exclude-zero": true } },
        { "string": { "literal": "AMAzING" } },
        { "string": { "regex": "12334+" } },
        { "bytes": { "literal": "0xDEADBEEF" } },
        { "integer": { "min": -10, "max": 12, "as_string": true } }
      ],
      "entropy": {
        "max": 6
      }
    }
  }
}

The idea is to have predefined rules for specific scenarios and some level of boolean operators for better filtering.

For example, get me all IPs in binary or string form in the ranges 192.168.1.0 - 192.168.3.255 or 10.0.0.1 - 10.0.30.255

Dependencies

~1.6–2.2MB
~42K SLoC