Latest release: 0.1.5 (Dec 20, 2023)
bitgrep
It's grep for data types. Ever found yourself looking for a specific numerical value/range in a heap of binary files?
Now you can!
Useful for DFIR, security research and general debugging work, especially when you know what you're looking for but don't know where.
Install
Use cargo install to install the binary from crates.io:
$ cargo install bitgrep
$ bitgrep --data-type u32 --file data.raw -m 55 -M 144
Alternatively, you can build the binary from the source on GitHub:
$ git clone https://github.com/jmpfar/bitgrep.git
$ cd bitgrep
$ cargo build --release
$ target/release/bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0
Usage
To find all the doubles (f64) with values 29.15 <= x <= 36.0:
$ bitgrep --data-type f64 --file data.raw -m 29.15 -M 36.0
./data.raw: [0x16B6] f64: 34.415624980210914 [9b483333354140]
./data.raw: [0xFDBB] f64: 30.215716721498428 [3d983639373e40]
The output format is:
file_path: [offset] data_type: value [value_in_hex]
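The search and the output layout above can be sketched roughly as follows. This is an illustration only, not bitgrep's actual implementation: the names `Hit`, `scan_f64_le` and `format_hit` are hypothetical, and the sketch assumes little-endian values that may start at any byte offset (the odd offsets in the sample output suggest values are not required to be aligned).

```rust
/// One match: byte offset, decoded value, and the raw bytes it came from.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Hit {
    pub offset: usize,
    pub value: f64,
    pub raw: [u8; 8],
}

/// Slide an 8-byte window over `data` one byte at a time and keep every
/// little-endian f64 inside `min..=max`. NaN never matches, because every
/// comparison with NaN is false.
pub fn scan_f64_le(data: &[u8], min: f64, max: f64) -> Vec<Hit> {
    data.windows(8)
        .enumerate()
        .filter_map(|(offset, window)| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(window);
            let value = f64::from_le_bytes(raw);
            (min <= value && value <= max).then(|| Hit { offset, value, raw })
        })
        .collect()
}

/// Render a hit as `file_path: [offset] data_type: value [value_in_hex]`,
/// with the hex part showing the raw bytes in file order.
pub fn format_hit(path: &str, hit: &Hit) -> String {
    let hex: String = hit.raw.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}: [0x{:X}] f64: {} [{}]", path, hit.offset, hit.value, hex)
}
```

Calling `scan_f64_le(&bytes, 29.15, 36.0)` and passing each hit to `format_hit` would produce lines in the same shape as the sample output.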
Options
To find a single literal value, use the --literal or -l flag.
Float comparison is approximate, with a tolerance of 4 ULPs (this will be configurable in the future):
$ bitgrep --data-type f64 --file data4.raw --literal 29.15385732 \
--endian big
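For readers unfamiliar with ULPs (units in the last place), here is a minimal sketch of what "equal within 4 ULPs" can mean for f64. It is not taken from bitgrep's source, which may use a library with slightly different rules (for example an additional epsilon check); `ordered_bits` and `approx_eq_ulps` are hypothetical names.

```rust
/// Map an f64 onto an integer line where adjacent floats differ by exactly 1
/// and both +0.0 and -0.0 land on 0.
fn ordered_bits(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // Negative floats have the sign bit set, so `bits` is negative here;
    // mirror them so that larger magnitudes map to more negative integers.
    if bits < 0 { i64::MIN - bits } else { bits }
}

/// True when `a` and `b` are at most `max_ulps` representable values apart.
/// NaN is never considered equal to anything in this sketch.
pub fn approx_eq_ulps(a: f64, b: f64, max_ulps: u64) -> bool {
    if a.is_nan() || b.is_nan() {
        return false;
    }
    ordered_bits(a).abs_diff(ordered_bits(b)) <= max_ulps
}
```

With this definition, `approx_eq_ulps(1.0, 1.0 + 4.0 * f64::EPSILON, 4)` holds, while a fifth step away from 1.0 no longer matches.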
You can also filter by entropy to remove values that have a high chance of being noise.
Entropy is measured in bits per byte and ranges from 0 to 8, where 8 represents random data. Data with entropy above 7.5 is usually encrypted, compressed or random; English text typically falls between 3.5 and 5.
$ bitgrep --data-type i128 --file data.raw --literal 123 \
--max-entropy 7.5
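The 0 to 8 scale is the Shannon entropy of the byte distribution, in bits per byte. The sketch below shows only that measure; which bytes bitgrep feeds into it (the whole file or a window) is not stated here, and `shannon_entropy` is a hypothetical name.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 ..= 8.0).
/// An empty slice is treated as having zero entropy.
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &byte in data {
        counts[byte as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum -p * log2(p) over every byte value that actually occurs.
    counts
        .iter()
        .filter(|&&count| count > 0)
        .map(|&count| {
            let p = count as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A buffer of identical bytes scores 0, and a buffer in which every byte value appears equally often scores 8.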
You can use a pipe with the special - file path:
$ cat data.raw | bitgrep --data-type u8 --file - --literal 3
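Treating - as "read from standard input" is a common Unix convention. A minimal sketch of how a tool can support it, assuming hypothetical helper names `open_input` and `read_input` (this is not bitgrep's actual code):

```rust
use std::fs::File;
use std::io::{self, Read};

/// Open `path` for reading, treating the special path "-" as standard input.
pub fn open_input(path: &str) -> io::Result<Box<dyn Read>> {
    if path == "-" {
        Ok(Box::new(io::stdin()))
    } else {
        Ok(Box::new(File::open(path)?))
    }
}

/// Read the whole input into memory so it can be scanned like a regular file.
pub fn read_input(path: &str) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    open_input(path)?.read_to_end(&mut buffer)?;
    Ok(buffer)
}
```

Returning a boxed reader keeps the rest of the scanner unaware of where the bytes come from.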
To reduce noise in binary files that contain zero bytes, you can use --exclude-zero. This excludes all values that are exactly zero (0x0):
$ bitgrep --data-type i32 --file data.raw --min -30 --max 30 \
--exclude-zero
The above command does not filter values that are merely close to zero (e.g. 0.00000000000000001). Filtering those out as well can help reduce noise in floating point searches; for that, use --exclude-literal instead:
$ bitgrep --data-type f64 --file data.raw --min -30.0 --max 30.0 \
--exclude-literal 0.0
Currently there is no native support for directory globbing or recursion. If you need to search multiple files, you can use the find command:
$ find . -type f -exec bitgrep \
--data-type i32 --file {} --max -78 --min -83 \
--endian little \;
Supported Types
Currently bitgrep supports all Rust numeric data types (use with --data-type):
Rust | C |
---|---|
i16 | short |
i32 | int |
i64 | long long |
i128 | __int128 (GCC) |
u16 | unsigned short |
u32 | unsigned int |
u64 | unsigned long long |
u128 | unsigned __int128 |
f32 | float |
f64 | double |
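To make the table concrete, the sketch below lists how many bytes each type listed above occupies and shows how the same bytes decode differently depending on the byte order. The helper names `width_of` and `decode_u32` are hypothetical and only illustrate the standard library's `from_le_bytes` / `from_be_bytes` conversions.

```rust
/// Width in bytes of each type name listed in the table above.
pub fn width_of(data_type: &str) -> Option<usize> {
    match data_type {
        "i16" | "u16" => Some(2),
        "i32" | "u32" | "f32" => Some(4),
        "i64" | "u64" | "f64" => Some(8),
        "i128" | "u128" => Some(16),
        _ => None,
    }
}

/// Decode four bytes as a u32 using the requested byte order.
pub fn decode_u32(bytes: [u8; 4], big_endian: bool) -> u32 {
    if big_endian {
        u32::from_be_bytes(bytes)
    } else {
        u32::from_le_bytes(bytes)
    }
}
```

For example, the bytes 37 00 00 00 decode to 55 as a little-endian u32 but to 0x37000000 as a big-endian one, which is why picking the right --endian value matters.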
TODO
[!WARNING]
Everything below this point does not exist yet!
Feel free to send pull requests, hopefully I'll get to these before 2026
- Filter files by entropy
- Add pipe support and other unix semantics
- Use stderr
- Color output
- Hex dump output
- Literals search
- Hex search (e.g. 0AAD[33-4A]DF)
- Exclude zeros
- Exclude approximate literal values
- Sane error messages
- Exclude extreme exponent values
- Binary releases
- Recursive file search / glob
- Date types
  - 32-bit/64-bit Unix epoch (milliseconds, microseconds, seconds)
  - Windows
    - FILETIME
    - SYSTEMTIME
    - OLE automation
    - CLR Time
  - Apple timestamps
- String Search
  - UTF-8
  - UTF-16
  - ASCII code pages
  - Search string representations of number range: e.g. "10.2" .. "10.722"
  - Regex
- Performance improvements
  - Convert to static dispatch
  - Search without converting bytes to number
  - Lock and buffer stdout
- Rule engine, see below
- Misc
  - GUIDs
  - IP addresses
  - Custom structs
- Debt
  - Refactor printing to different object/trait
  - Add integration tests
  - Create configuration => scanner builder
  - Filters to enums
  - Add golden tests
Rule engine
TODO: An imagined JSON of a rules file that can be used as a search configuration.
{
  "filters": {
    "file": {
      "magic": "0xABDEF",
      "types": [
        { "double": { "min": 80.3432, "max": 82.221112, "exclude-zero": true } },
        { "double": { "min": -32.865, "max": 31.53221, "exclude-zero": true } },
        { "string": { "literal": "AMAzING" } },
        { "string": { "regex": "12334+" } },
        { "bytes": { "literal": "0xDEADBEEF" } },
        { "integer": { "min": -10, "max": 12, "as_string": true } }
      ],
      "entropy": {
        "max": 6
      }
    }
  }
}
The idea is to have predefined rules for specific scenarios and some level of boolean operators for better filtering.
For example: get me all IPs, in binary or string form, in the ranges 192.168.1.0 - 192.168.3.255 or 10.0.0.1 - 10.0.30.255.
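Since none of this exists yet, the following is only a sketch of how the binary half of such an IP rule could work. It assumes addresses are stored as four bytes in network (big-endian) order, and the names `in_range` and `scan_ipv4` are hypothetical.

```rust
use std::net::Ipv4Addr;

/// True when `ip` lies in the inclusive range `start..=end`.
pub fn in_range(ip: Ipv4Addr, start: Ipv4Addr, end: Ipv4Addr) -> bool {
    let value = u32::from(ip);
    u32::from(start) <= value && value <= u32::from(end)
}

/// Read every 4-byte window as an address in network byte order and return
/// the offsets and addresses that fall inside the range.
pub fn scan_ipv4(data: &[u8], start: Ipv4Addr, end: Ipv4Addr) -> Vec<(usize, Ipv4Addr)> {
    data.windows(4)
        .enumerate()
        .map(|(offset, w)| (offset, Ipv4Addr::new(w[0], w[1], w[2], w[3])))
        .filter(|&(_, ip)| in_range(ip, start, end))
        .collect()
}
```

Matching either of the two example ranges would then be a boolean OR over two `in_range` calls, which is the kind of operator the rule file is meant to express.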