2 unstable releases

0.2.0 Nov 3, 2023
0.1.0 Jan 26, 2022

MPL-2.0 license

linux-package-analyzer

linux-package-analyzer is a binary Rust crate providing the lpa command-line executable. This CLI tool indexes and then inspects the contents of Linux package repositories. Both Debian- and RPM-based repositories are supported.

Run lpa help for more details.

Installing

# From the latest released version on crates.io:
$ cargo install linux-package-analyzer

# From the latest commit in the canonical Git repository:
$ cargo install --git https://github.com/indygreg/linux-packaging-rs linux-package-analyzer

# From the root directory of a Git source checkout:
$ cargo install --path linux-package-analyzer

How It Works

lpa exposes sub-commands for importing the contents of a specified package repository into a local SQLite database. On import, lpa retrieves the package lists from the remote repository, downloads each referenced package, and indexes its content. The indexed content includes:

  • Files installed by the package
  • ELF file content
      • File header values
      • Section metadata
      • Dynamic library dependencies
      • Symbols
      • x86 instruction counts

Additional sub-commands analyze the indexed content in the SQLite database. However, much of the data in the database is not exposed or queryable via the CLI.

Example

The following command will import all packages from Ubuntu 21.10 Impish for amd64 into the SQLite database ubuntu-impish.db:

lpa --db ubuntu-impish.db \
    import-debian-repository \
    --components main,multiverse,restricted,universe \
    --architectures amd64 \
    http://us.archive.ubuntu.com/ubuntu impish

This should download ~96 GB of packages (as of January 2022) and create a ~12 GB SQLite database.

Once we have a populated database, we can run commands to query its content.

To see which files import (and presumably call) a specific C function:

lpa --db ubuntu-impish.db \
    elf-files-importing-symbol OPENSSL_init_ssl

To see the most common ELF section names:

lpa --db ubuntu-impish.db elf-section-name-counts

Power users may want to write their own queries against the database. To get started, open the SQLite database and poke around:

$ sqlite3 ubuntu-impish.db
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.

sqlite> .tables
elf_file                          package_file
elf_file_needed_library           symbol_name
elf_file_x86_base_register_count  v_elf_needed_library
elf_file_x86_instruction_count    v_elf_symbol
elf_file_x86_register_count       v_package_elf_file
elf_section                       v_package_file
elf_symbol                        v_package_instruction_count
package

sqlite> select * from v_elf_needed_library where library_name = 'libc.so.6' order by package_name asc limit 1;
0ad|0.0.25b-1|http://us.archive.ubuntu.com/ubuntu/pool/universe/0/0ad/0ad_0.0.25b-1_amd64.deb|usr/games/pyrogenesis|libc.so.6

The v_-prefixed tables are views that pull in data from multiple tables. For example, v_elf_symbol has all the columns of elf_symbol but also includes the package name, version, file path, etc.
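
As a rough sketch of scripting such queries, the Python snippet below lists the packages that contain an ELF file needing a given library. It relies only on the package_name and library_name columns of v_elf_needed_library, which appear in the query above. The helper name packages_needing and the stand-in rows are made up for illustration, and the in-memory table merely imitates the real view so the snippet runs without a populated database.

```python
import sqlite3


def packages_needing(conn: sqlite3.Connection, library: str) -> list[str]:
    """Return sorted, de-duplicated package names with an ELF file needing `library`."""
    rows = conn.execute(
        "SELECT DISTINCT package_name FROM v_elf_needed_library "
        "WHERE library_name = ? ORDER BY package_name",
        (library,),  # bound parameter instead of string interpolation
    )
    return [name for (name,) in rows]


# Stand-in for the real view (assumption: only these two columns are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_elf_needed_library (package_name TEXT, library_name TEXT)"
)
conn.executemany(
    "INSERT INTO v_elf_needed_library VALUES (?, ?)",
    [
        ("0ad", "libc.so.6"),
        ("0ad", "libz.so.1"),
        ("zsh", "libc.so.6"),
        ("0ad", "libc.so.6"),  # a second file in the same package
    ],
)

result = packages_needing(conn, "libc.so.6")
print(result)
```

Against a real import, replace the in-memory connection with sqlite3.connect("ubuntu-impish.db").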

Constants and Special Values

Various ELF fields use integer constants to define attributes. For example, elf_file.machine is an integer holding the ELF machine type. A good reference for the values of these constants is https://docs.rs/object/0.28.2/src/object/elf.rs.html#1-6256.

lpa also exposes various reference-* commands for printing known values.
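
For ad hoc queries, a small lookup table can translate elf_file.machine values into names. The sketch below covers only a handful of well-known e_machine values from the ELF specification; the helper names and the stand-in table (which models just the machine column) are illustrative and not part of lpa.

```python
import sqlite3

# A few well-known ELF e_machine values; the full list is much longer.
ELF_MACHINE_NAMES = {
    3: "EM_386",
    40: "EM_ARM",
    62: "EM_X86_64",
    183: "EM_AARCH64",
    243: "EM_RISCV",
}


def machine_name(value: int) -> str:
    """Map an e_machine integer to its symbolic name, if known."""
    return ELF_MACHINE_NAMES.get(value, f"unknown ({value})")


def machine_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count ELF files per machine type, keyed by symbolic name."""
    rows = conn.execute("SELECT machine, COUNT(*) FROM elf_file GROUP BY machine")
    return {machine_name(machine): count for machine, count in rows}


# Stand-in table so the sketch runs without a populated database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elf_file (machine INTEGER)")
conn.executemany("INSERT INTO elf_file VALUES (?)", [(62,), (62,), (3,), (9999,)])

counts = machine_counts(conn)
print(counts)
```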

Known Issues

x86 Disassembly Quirks

On package index/import, an attempt is made to disassemble x86 / x86-64 files so instruction counts and register usage can be stored in the database.

We disassemble all sections marked as executable. Instructions in other sections may not be found (this is hopefully rare).

We disassemble using the iced_x86 Rust crate, so any limitations in that crate apply to the disassembler.

We disassemble instructions by iterating over the content of each executable section, attempting to read instructions until the end of the section. Executable sections can contain NULL bytes, inline data, and other bytes that do not represent valid instructions. As a result, many byte sequences decode to the special invalid instruction. In some cases, a byte sequence decodes to a valid instruction even though the underlying data is not code. In other words, instruction counts can include false positives.
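
The toy Python sketch below illustrates why a linear sweep miscounts. It is not how lpa or iced_x86 decode instructions: the opcode table is a deliberately tiny, hypothetical subset of x86-64, and every byte outside it is treated as invalid. The point is only that data bytes sitting after real code get counted as if they were instructions.

```python
from collections import Counter

# Toy opcode table: first byte -> (mnemonic, total instruction length).
# Real x86 decoding is far more involved; this is illustration only.
TOY_OPCODES = {
    0x55: ("push", 1),  # push rbp
    0x5D: ("pop", 1),   # pop rbp
    0x90: ("nop", 1),
    0xB8: ("mov", 5),   # mov eax, imm32
    0xC3: ("ret", 1),
}


def linear_sweep(section: bytes) -> Counter:
    """Decode from start to end of the section, counting mnemonics."""
    counts = Counter()
    offset = 0
    while offset < len(section):
        mnemonic, length = TOY_OPCODES.get(section[offset], ("(invalid)", 1))
        if offset + length > len(section):
            # Truncated instruction at the end of the section.
            mnemonic, length = "(invalid)", 1
        counts[mnemonic] += 1
        offset += length
    return counts


# Real code: push rbp; mov eax, 42; pop rbp; ret
code = bytes([0x55, 0xB8, 0x2A, 0x00, 0x00, 0x00, 0x5D, 0xC3])
# Inline data that happens to contain the nop and ret opcode bytes.
inline_data = bytes([0x06, 0x90, 0x07, 0xC3])

counts = linear_sweep(code + inline_data)
print(counts)
```

Here the sweep reports two ret instructions and one nop even though the code contains a single ret and no nop. The extra counts come from the inline data.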

Intermittent HTTP Failures on Package Retrieval

Intermittent HTTP GET failures when importing packages are expected due to intrinsic network unreliability. They often manifest as an error like the following:

error processing package (ignoring): repository I/O error on path pool/universe/g/gcc-10/gnat-10_10.3.0-11ubuntu1_amd64.deb: Custom { kind: Other, error: "error sending HTTP request: reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Domain(\"us.archive.ubuntu.com\")), port: None, path: \"/ubuntu/pool/universe/g/gcc-10/gnat-10_10.3.0-11ubuntu1_amd64.deb\", query: None, fragment: None }, source: hyper::Error(IncompleteMessage) }" }

If you see failures like this, simply retry the import operation. Already imported packages should automatically be skipped.
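
The retry can be automated. The sketch below reruns an import until its output no longer contains the "error processing package (ignoring)" message shown above. It makes several assumptions: that the message appears at the start of a line on stdout or stderr, that rerunning is cheap because imported packages are skipped, and that the process exit code can be ignored. The function names are hypothetical, and the demo uses a stub runner so it executes without lpa installed.

```python
import subprocess

FAILURE_PREFIX = "error processing package (ignoring)"


def count_failures(output: str) -> int:
    """Count output lines reporting a package that failed and was skipped."""
    return sum(1 for line in output.splitlines() if line.startswith(FAILURE_PREFIX))


def run_import(argv: list[str]) -> str:
    """Run the command and return stdout and stderr as one string."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.stdout + "\n" + proc.stderr


def import_until_clean(argv: list[str], max_attempts: int = 5, run=run_import) -> int:
    """Rerun the import until no failures are reported; return attempts used."""
    failures = 0
    for attempt in range(1, max_attempts + 1):
        failures = count_failures(run(argv))
        if failures == 0:
            return attempt
    raise RuntimeError(
        f"{failures} package(s) still failing after {max_attempts} attempts"
    )


# Demo with a stub runner: the first run reports one failure, the second none.
_fake_outputs = iter(
    [
        "error processing package (ignoring): repository I/O error on path ...\n",
        "",
    ]
)
argv = [
    "lpa", "--db", "ubuntu-impish.db", "import-debian-repository",
    "--components", "main", "--architectures", "amd64",
    "http://us.archive.ubuntu.com/ubuntu", "impish",
]
attempts_used = import_until_clean(argv, run=lambda _argv: next(_fake_outputs))
print(attempts_used)
```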

Package Server Throttling

lpa can issue parallel HTTP requests to retrieve content. By default, the number of parallel requests is capped at the number of CPU cores/threads.

Some package repositories limit the number of simultaneous HTTP connections/requests per client. If your machine has many CPU cores, you may run into these limits and get a high volume of HTTP errors when fetching packages. To mitigate this, reduce the number of simultaneous I/O operations via --threads. For example: lpa --threads 4 ...
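
When driving lpa from a script, the thread count can be capped regardless of how many cores the machine has. The sketch below builds an argument list using only the options shown in this document (--db, --threads, --components, --architectures). The function name, the default cap of 4, and the assumption that --db and --threads can both precede the sub-command are illustrative.

```python
import os


def import_command(
    db: str,
    url: str,
    distribution: str,
    components: list[str],
    architectures: list[str],
    max_threads: int = 4,
) -> list[str]:
    """Build an lpa import-debian-repository command with a capped thread count."""
    # os.cpu_count() may return None, so fall back to a single thread.
    threads = min(os.cpu_count() or 1, max_threads)
    return [
        "lpa",
        "--db", db,
        "--threads", str(threads),
        "import-debian-repository",
        "--components", ",".join(components),
        "--architectures", ",".join(architectures),
        url,
        distribution,
    ]


cmd = import_command(
    "ubuntu-impish.db",
    "http://us.archive.ubuntu.com/ubuntu",
    "impish",
    ["main", "multiverse", "restricted", "universe"],
    ["amd64"],
)
print(" ".join(cmd))
```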

SQLite Integrity Weakening

To maximize speed of import operations, SQLite databases have their content integrity and durability guarantees weakened via PRAGMA statements issued on database open. A process or machine crash during a write operation could corrupt the SQLite database more easily than it otherwise would.
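
After a crash it may be worth checking the database before trusting it. The sketch below runs SQLite's built-in PRAGMA integrity_check, which returns a single "ok" row for a healthy database. It also sets PRAGMA synchronous = OFF as one example of a setting that trades durability for speed. Whether lpa issues this exact statement is an assumption, since the PRAGMA statements it uses are not listed here. The helper name and the temporary demo database are illustrative.

```python
import os
import sqlite3
import tempfile


def integrity_problems(db_path: str) -> list[str]:
    """Return integrity_check messages, or an empty list if the database is ok."""
    conn = sqlite3.connect(db_path)
    try:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
    return [] if rows == ["ok"] else rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    conn = sqlite3.connect(path)
    # Example durability trade-off (assumed, not confirmed to be what lpa uses):
    # do not wait for data to reach the disk on commit.
    conn.execute("PRAGMA synchronous = OFF")
    sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.execute("CREATE TABLE package (name TEXT)")
    conn.commit()
    conn.close()

    problems = integrity_problems(path)

print(sync_level, problems)
```

For a real import, call integrity_problems("ubuntu-impish.db"). On a multi-gigabyte database the check can take a while.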
