8 releases

new 0.0.9 Apr 29, 2024
0.0.8 Mar 22, 2024
0.0.7 Feb 29, 2024
0.0.6 Jan 30, 2024
0.0.1 Aug 22, 2023

#1155 in Parser implementations

40 downloads per month
Used in 3 crates

Apache-2.0

150KB
3.5K SLoC

MalwareDB Types


Note: These parsers are designed to extract potentially useful features from various file types. They are in no way designed to be complete representations of their respective file format. That said, contributions are welcome to extract additional features/information, to add support for a new file format, or to make general improvements!

This crate contains the logic for parsing some executable and document datatypes, and for determining if a Zip file is an MS Office document or an archive of files.
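
As a rough illustration of the Zip check described above, the sketch below classifies a Zip file from its entry names alone. It is not this crate's implementation: `ZipKind` and `classify_zip` are hypothetical names, and the sketch assumes the entry names have already been read from the Zip's central directory by some other code. It relies on the fact that Office Open XML packages carry a `[Content_Types].xml` entry alongside a `word/`, `xl/`, or `ppt/` directory.

```rust
/// Coarse classification of a Zip container, based only on its entry names.
/// (Hypothetical type for illustration; not part of this crate's API.)
#[derive(Debug, PartialEq, Eq)]
pub enum ZipKind {
    Word,
    Excel,
    PowerPoint,
    /// Any other Zip file, treated as a plain archive of files.
    Archive,
}

/// Classify a Zip file from the entry names found in its central directory.
/// Assumption: the caller has already listed the entry names.
pub fn classify_zip<'a, I>(names: I) -> ZipKind
where
    I: IntoIterator<Item = &'a str>,
{
    let mut has_content_types = false;
    let mut kind = ZipKind::Archive;
    for name in names {
        if name == "[Content_Types].xml" {
            has_content_types = true;
        } else if name.starts_with("word/") {
            kind = ZipKind::Word;
        } else if name.starts_with("xl/") {
            kind = ZipKind::Excel;
        } else if name.starts_with("ppt/") {
            kind = ZipKind::PowerPoint;
        }
    }
    // Without the content-types manifest, treat the file as an ordinary
    // archive even if it happens to contain a `word/` directory.
    if has_content_types {
        kind
    } else {
        ZipKind::Archive
    }
}
```

Calling `classify_zip(vec!["[Content_Types].xml", "word/document.xml"])` yields `ZipKind::Word`, while a Zip holding only `notes.txt` falls through to `ZipKind::Archive`.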

Executable Types:

  • ELF (feature flag elf, default)
  • Mach-O and Fat Mach-O (feature flag macho, default). Fat Mach-O's embedded Mach-O binaries are extracted and processed as child elements.
  • PE32 (feature flag pe32, default)
  • PEF (feature flag pef, not default and probably not useful)
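
To make the list above concrete, here is a minimal sketch of telling these four formats apart by their magic numbers. `ExecKind` and `sniff_executable` are hypothetical names and this is not necessarily how the crate detects file types. Every read is bounds-checked, so truncated or malformed input yields `None` rather than a panic.

```rust
/// Executable container formats from the list above.
/// (Hypothetical type for illustration; not part of this crate's API.)
#[derive(Debug, PartialEq, Eq)]
pub enum ExecKind {
    Elf,
    MachO,
    FatMachO,
    Pe,
    Pef,
}

/// Guess the executable format from magic numbers only.
pub fn sniff_executable(data: &[u8]) -> Option<ExecKind> {
    let magic = data.get(..4)?;
    match magic {
        [0x7f, b'E', b'L', b'F'] => return Some(ExecKind::Elf),
        // 32- and 64-bit Mach-O, in both byte orders.
        [0xfe, 0xed, 0xfa, 0xce]
        | [0xfe, 0xed, 0xfa, 0xcf]
        | [0xce, 0xfa, 0xed, 0xfe]
        | [0xcf, 0xfa, 0xed, 0xfe] => return Some(ExecKind::MachO),
        // Fat Mach-O. Java class files share this magic, so a real
        // parser needs a further check before trusting it.
        [0xca, 0xfe, 0xba, 0xbe] => return Some(ExecKind::FatMachO),
        _ => {}
    }
    if data.starts_with(b"Joy!peff") {
        return Some(ExecKind::Pef);
    }
    if data.starts_with(b"MZ") {
        // The little-endian u32 at 0x3c (e_lfanew) is the offset of the
        // "PE\0\0" signature.
        let raw = data.get(0x3c..0x40)?;
        let start = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]) as usize;
        let end = start.checked_add(4)?;
        if data.get(start..end)? == &b"PE\0\0"[..] {
            return Some(ExecKind::Pe);
        }
    }
    None
}
```

A file that starts with `MZ` but whose `e_lfanew` points past the end of the data is reported as `None`, which is the kind of tolerance malformed samples call for.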

For each executable, the goal is to extract:

  • Section information: names, sizes, entropy
  • Import data
  • Target: architecture, operating system, endianness, pointer size (32 vs 64-bit)
  • Binary type (object file, executable, library, etc.)
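
The section entropy mentioned above is commonly computed as Shannon entropy over byte frequencies. The sketch below assumes that definition; `shannon_entropy` is a hypothetical helper name and the crate's own calculation may differ. The result ranges from 0.0 (a single repeated byte) to 8.0 (all 256 byte values equally likely), and values near 8.0 often indicate packed or encrypted sections.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
/// (Hypothetical helper for illustration; not part of this crate's API.)
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0u64; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```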

Some complications:

  • How to get the imports for ELFs? Go has this figured out, but I haven't been able to replicate it. See Goblin issue #363.
  • Should I ditch the custom parsers for Goblin? It would allow me to get Authenticode data from PE32 files, but I worry it won't be tolerant of malformed files (as malware often is).

Document Types:

  • PDF via the pdf crate (feature flag pdf, default)
  • RTF, currently incomplete (feature flag rtf, default)

There should be a simple way to represent the needed data so the component which stores the data in the database doesn't have to be aware of file formats.
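
One way to approach this is a small trait that every parser implements, so the storage component depends only on the trait. The sketch below is an assumption about how that could look, not the crate's actual design: `Section`, `FileFeatures`, `ElfFeatures`, and `to_row` are all hypothetical names.

```rust
/// Per-section data that every executable parser could report.
/// (All names here are hypothetical; not part of this crate's API.)
#[derive(Debug, Clone, PartialEq)]
pub struct Section {
    pub name: String,
    pub size: u64,
    pub entropy: f64,
}

/// Format-independent view of a parsed file.
pub trait FileFeatures {
    /// Short label for the file format, such as "ELF" or "PDF".
    fn type_name(&self) -> &'static str;
    /// Sections found in the file; empty for formats without sections.
    fn sections(&self) -> &[Section];
    /// Imported symbols or libraries; empty if none could be extracted.
    fn imports(&self) -> &[String];
}

/// Example implementor standing in for a real ELF parser's output.
pub struct ElfFeatures {
    pub sections: Vec<Section>,
    pub imports: Vec<String>,
}

impl FileFeatures for ElfFeatures {
    fn type_name(&self) -> &'static str {
        "ELF"
    }
    fn sections(&self) -> &[Section] {
        &self.sections
    }
    fn imports(&self) -> &[String] {
        &self.imports
    }
}

/// Storage-side code sees only the trait, never a concrete format.
pub fn to_row(file: &dyn FileFeatures) -> String {
    format!(
        "{}|sections={}|imports={}",
        file.type_name(),
        file.sections().len(),
        file.imports().len()
    )
}
```

With this shape, adding a new file format means adding another implementor; `to_row` and anything else on the database side stays unchanged.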

Dependencies

~10–17MB
~293K SLoC