#malware #forensics #cybersecurity #user-group #security #source-file #malware-research

app malwaredb

MalwareDB does the bookkeeping for malware & goodware datasets, aimed at helping malware researchers and forensic investigators

7 releases

0.0.8 Mar 22, 2024
0.0.7 Feb 29, 2024
0.0.6 Jan 30, 2024
0.0.5 Dec 30, 2023
0.0.1 Aug 22, 2023

#68 in Database interfaces


242 downloads per month

Apache-2.0 and maybe GPL-3.0-only…

11K SLoC



Inspired by VXCage and VirusTotal, MalwareDB is a malware knowledge management system which handles the bookkeeping regarding malware/goodware samples: hashes, origination, similarity, file types, and more. Its intention is to help malware/cybersecurity researchers, forensic investigators, and others who have a need to handle malware, or other files of potentially unknown origin. This is very much a work in progress and alpha-quality project at present.

Key Features:

  • Store malware, goodware, or unknown file samples.
  • Categorize samples by:
    • Labels, a hierarchical taxonomy (not yet implemented)
    • Origin, the source of the sample.
  • Permissions by group: access to files is based on users' group membership
  • Fetch samples by hash
  • Search based on file similarity (requires the Postgres plugins mentioned below)
  • Parse the files for features which may be useful for machine learning models
  • Works on any modern operating system
  • Allow encrypting the files on disk so the server does not cause problems with endpoint security or anti-virus software
  • Supports the CaRT format using the default key.


Requirements:

  • Postgres database server
  • Rust to compile
  • libmagic, the library behind the file command. Install libmagic-dev on Linux, or brew install libmagic on macOS with Homebrew.
    • On Windows: cargo install cargo-vcpkg; vcpkg install libmagic; vcpkg integrate install
    • The MAGIC environment variable may be used to specify the paths for the libmagic database.
  • zlibng for faster decompression of .gz files, optional.
  • Similarity hash extensions for Postgres:
  • Alternatively, use Docker, which provides a container with the Postgres extensions already installed (though they still have to be activated, see the readme).


This project is in active development and not yet stable, nor are all the features implemented.


Install from source. Check out the repository and build (recommended), or build from crates.io:

  • cargo install malwaredb-client
  • cargo install malwaredb --features=admin,admin-gui,sqlite,vt,zlibng (activates all the features, still requires some external dependencies)

Server Features (which are all opt-in):

  • admin: command-line administrative functionality
  • admin-gui: Slint-powered GUI, tested on macOS, Linux, and Windows; it might work elsewhere.
  • sqlite: Allow the use of SQLite as a database backend. Should only be used for testing and evaluation, as it lacks the similarity optimisations we have for Postgres.
  • vt: Allow the VirusTotal functionality, which caches AV data for contained samples (it must still be enabled)
  • zlibng: enable the compression crate to use zlib-ng as the backend library for performance improvements. This is for decompressing .gz files, and optionally for the server to store samples compressed with gzip (must be enabled).


  • Planned features:
    • Web interface as a separate application
    • GUI applications
    • Support for Confidential Computing
    • Encrypting samples, if stored, so that anti-virus software on the host system doesn't trigger alerts and accidental infection is prevented.
    • Train ML models based on features of the malicious & benign files:
      • Domain-specific features (parsed features from specific file types)
      • Type-agnostic features (information about any sequence of bytes, such as n-grams, entropy, length, etc)
      • Use user input for tags/labels
      • Derive labels from VirusTotal information with tools like ClarAVy or AVClass2.
  • Potential features:
    • File storage backends for HDFS, S3, others?
  • Something missing? Get in touch: file an issue or start a discussion!
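
The type-agnostic features named in the planned list (entropy and n-grams) can be illustrated with a minimal sketch. It uses only the Rust standard library; the function names `shannon_entropy` and `ngram_counts` are hypothetical and are not taken from MalwareDB's code.

```rust
use std::collections::HashMap;

/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
/// Hypothetical helper, not MalwareDB's implementation.
pub fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each of the 256 possible byte values occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Counts of overlapping byte n-grams; empty when `n` is zero.
/// Hypothetical helper, not MalwareDB's implementation.
pub fn ngram_counts(data: &[u8], n: usize) -> HashMap<&[u8], usize> {
    let mut counts = HashMap::new();
    if n == 0 {
        // `windows(0)` would panic, so treat this as "no n-grams".
        return counts;
    }
    for window in data.windows(n) {
        *counts.entry(window).or_insert(0) += 1;
    }
    counts
}
```

Both functions work on any sequence of bytes, which is what makes such features independent of the file type. High entropy often hints at compressed or encrypted content.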

Getting Started:

  1. Compile from source, ideally with --features=admin,sqlite.
  2. Create your configuration file; see the example file in the root of the repository. SQLite is available only when compiled with the sqlite feature, and is meant for testing and evaluation rather than for a real environment.
     • If the storage section is empty (it's optional), then MalwareDB will only store the metadata about the files, and will not store the samples. That means the original files cannot be retrieved.
  3. Place the config file in /etc/mdb_server/mdb_config.toml on Linux, or /usr/local/etc/mdb_server/mdb_config.toml on FreeBSD for automatic config file detection. Otherwise, run with mdb_server run load /path/to/file, or use mdb_server run config to specify arguments on the command line. Run with --help to see details.
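
The config file lookup described above can be pictured with a small sketch. It assumes that an explicitly given path takes priority and that the default is used only if the file exists. The function names are hypothetical, and the server's real detection logic may differ.

```rust
use std::path::PathBuf;

/// Default config location for an OS name as reported by `std::env::consts::OS`.
/// Only the two paths documented above are known; anything else yields `None`.
/// Hypothetical helper, not MalwareDB's implementation.
pub fn default_config_path(os: &str) -> Option<PathBuf> {
    match os {
        "linux" => Some(PathBuf::from("/etc/mdb_server/mdb_config.toml")),
        "freebsd" => Some(PathBuf::from("/usr/local/etc/mdb_server/mdb_config.toml")),
        _ => None,
    }
}

/// Assumed precedence: an explicit path wins; otherwise fall back to the
/// OS default, but only when that file is actually present on disk.
pub fn resolve_config(explicit: Option<PathBuf>) -> Option<PathBuf> {
    explicit.or_else(|| default_config_path(std::env::consts::OS).filter(|p| p.exists()))
}
```

Passing the OS name as a parameter keeps `default_config_path` easy to check on any platform.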

Administrative Items

  1. Since you compiled with the admin feature above, you can run mdb_server admin --help to see administrative options. Admin options require -c /path/to/config.toml to prevent making accidental changes. Note: the admin command interacts with the database directly, so the server does not need to be running.
  2. List users with: mdb_server admin -c /path/to/config.toml list users. There is a default admin user, but no password is set. So let's set one.
  3. Reset Admin's password: mdb_server admin -c /path/to/config.toml reset-password --uname admin. You'll be prompted for the password and it won't echo. The admin user doesn't do anything special at the moment, but that will change.
  4. Files are organized by sources, and groups have access to sources. Groups and sources must therefore be added and linked before files can be added.
     • Create a source, look at the command line options: mdb_server admin -c /path/to/config.toml create source --help
     • Create a group, look at the command line options: mdb_server admin -c /path/to/config.toml create group --help
     • Add the group to the source, look at the command line options: mdb_server admin -c /path/to/config.toml add-group-to-source --help
     • Add the user to the group, look at the command line options: mdb_server admin -c /path/to/config.toml add-user-to-group --help
  5. Now, while mdb_server is running, log in with the client: mdb_client login http://localhost:8080 admin, replacing the URL with the actual IP and port you chose in the server configuration file.
  6. Test that the client works with mdb_client whoami; it should show the user information and the available groups and sources.
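
The relationship between users, groups, and sources can be sketched as a small in-memory model. The sketch assumes numeric IDs and a hypothetical `AccessModel` type. The real server keeps these links in its database, and its schema is not reproduced here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model of the user -> group -> source links.
#[derive(Default)]
pub struct AccessModel {
    user_groups: HashMap<u32, HashSet<u32>>,   // user ID -> group IDs
    source_groups: HashMap<u32, HashSet<u32>>, // source ID -> group IDs
}

impl AccessModel {
    /// Mirrors the idea of `add-user-to-group`.
    pub fn add_user_to_group(&mut self, user: u32, group: u32) {
        self.user_groups.entry(user).or_default().insert(group);
    }

    /// Mirrors the idea of `add-group-to-source`.
    pub fn add_group_to_source(&mut self, group: u32, source: u32) {
        self.source_groups.entry(source).or_default().insert(group);
    }

    /// A user can reach a source when at least one of the user's
    /// groups is linked to that source.
    pub fn can_access(&self, user: u32, source: u32) -> bool {
        match (self.user_groups.get(&user), self.source_groups.get(&source)) {
            (Some(user_groups), Some(source_groups)) => !user_groups.is_disjoint(source_groups),
            _ => false,
        }
    }
}
```

This shows why both links are needed: a user who belongs to a group still cannot reach a source until that group has been added to the source.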

Loading Files

  • Files may be uploaded using the client: mdb_client submit-samples -s SOURCE_ID /path/to/files_or_dirs. Paths may be to files or directories, and more than one path may be specified. All items will be uploaded to the same source (specified by the ID). If the file is a Zip, it will be decompressed in memory and each file submitted individually as long as it's not a known document type (like MS Office .docx, .xlsx, etc.).
  • Files may also be uploaded using the admin command from the server: mdb_server admin -c /path/to/config.toml -s SOURCE_ID -u USER_ID /path/to/files_or_dirs. With the server admin function, a user ID must also be provided. Otherwise, this works the same way as the client, directories and files may be provided, they will be associated with the same source, and Zip files will be decompressed in memory and submitted individually if not a known MS Office format.
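
The Zip handling described above can be sketched as a simple decision function. This is an illustration under assumptions: it recognises a Zip by the local file header signature `PK\x03\x04` and recognises documents by file extension. How MalwareDB actually identifies document types is not stated here, and `.pptx` is an illustrative addition to the two extensions mentioned.

```rust
/// Local file header signature that begins an ordinary, non-empty Zip archive.
const ZIP_MAGIC: [u8; 4] = [b'P', b'K', 0x03, 0x04];

/// Zip-based document extensions. `.docx` and `.xlsx` come from the text
/// above; `.pptx` is an assumption added for illustration.
const DOCUMENT_EXTENSIONS: [&str; 3] = ["docx", "xlsx", "pptx"];

/// True when the bytes start with the Zip local file header signature.
pub fn is_zip(bytes: &[u8]) -> bool {
    bytes.starts_with(&ZIP_MAGIC)
}

/// Hypothetical decision: expand a Zip into individual submissions unless
/// its extension marks it as a known document type.
pub fn should_expand(file_name: &str, bytes: &[u8]) -> bool {
    let ext = file_name
        .rsplit_once('.')
        .map(|(_, e)| e.to_ascii_lowercase());
    let is_document = ext
        .as_deref()
        .map_or(false, |e| DOCUMENT_EXTENSIONS.contains(&e));
    is_zip(bytes) && !is_document
}
```

Office documents are Zip containers themselves, which is why a check on the magic number alone would wrongly unpack them.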

Downloading Files

  • Using the client, a sample may be retrieved using its hash. Hash types are detected by length, and supported hashes are: MD5, SHA1, SHA256, SHA384, and SHA512.
  • mdb_client retrieve-sample SPECIFY_HASH_HERE. One hash per request, and it will be downloaded if it exists, and if the user has access to the group and source to which the sample is linked.
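
Detecting the hash type by length can be sketched as follows, assuming the hash is given as a hexadecimal string. The digest sizes are standard (MD5 is 128 bits, so 32 hex characters, and so on); the `HashType` enum and `detect_hash_type` function are hypothetical names, not MalwareDB's API.

```rust
/// The hash types listed above. Hypothetical enum for illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum HashType {
    Md5,
    Sha1,
    Sha256,
    Sha384,
    Sha512,
}

/// Guess the hash type from the length of a hex-encoded digest.
/// Returns `None` for non-hex input or an unsupported length.
pub fn detect_hash_type(hex: &str) -> Option<HashType> {
    let hex = hex.trim();
    if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Each byte of a digest is two hex characters.
    match hex.len() {
        32 => Some(HashType::Md5),     // 128 bits
        40 => Some(HashType::Sha1),    // 160 bits
        64 => Some(HashType::Sha256),  // 256 bits
        96 => Some(HashType::Sha384),  // 384 bits
        128 => Some(HashType::Sha512), // 512 bits
        _ => None,
    }
}
```

Length-based detection works here because the five supported algorithms all produce digests of different sizes.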

Searching for Similar Files

  • Using the client, similarity hashes are calculated and submitted to the server. The sample is not sent to the server! Just hashes.
  • mdb_client find-similar /path/to/file.bin. The same restriction with downloading applies: the user must have access to the group and source to which a potential similar file is linked. The output will be the hashes of the similar files, and by what means (similarity algorithm) the result is similar.

Misc. Client Commands

  • mdb_client server-info displays some statistics about the server, including version numbers, database type, and total number of files.
  • mdb_client server-types displays a list of supported file types and their magic numbers.


Some overall goals and design:

  • MalwareDB shall be easy to use.
  • MalwareDB shall be a place to store your data and use a simple database schema so that other applications may interact with the data directly.
  • MalwareDB shall collect and enrich malicious and benign files so that some features may be used for machine learning models.
  • MalwareDB should provide reusable components which may benefit other projects, even if not directly related.

