3 releases (breaking)
Version | Released
---|---
0.3.0 | Nov 2, 2024
0.2.0 | Nov 3, 2023
0.1.0 | Jan 26, 2022
linux-package-analyzer
linux-package-analyzer is a binary Rust crate providing the lpa command-line executable. This CLI tool indexes and then inspects the contents of Linux package repositories. Both Debian and RPM-based repositories are supported. Run lpa help for more details.
Installing
# From the latest released version on crates.io:
$ cargo install linux-package-analyzer
# From the latest commit in the canonical Git repository:
$ cargo install --git https://github.com/indygreg/linux-packaging-rs linux-package-analyzer
# From the root directory of a Git source checkout:
$ cargo install --path linux-package-analyzer
How It Works
lpa exposes sub-commands for importing the contents of a specified package repository into a local SQLite database. Essentially, the package lists are retrieved from the remote repository, the packages they reference are downloaded, and their content is indexed. The indexed content includes:
- Files installed by the package
- ELF file content
  - File header values
  - Section metadata
  - Dynamic library dependencies
  - Symbols
  - x86 instruction counts
Additional sub-commands exist for performing analysis of the indexed content within the SQLite databases. However, there is a lot of data in the SQLite database that is not exposed or queryable via the CLI.
Example
The following command will import all packages from Ubuntu 21.10 Impish for amd64 into the SQLite database ubuntu-impish.db:
lpa --db ubuntu-impish.db \
    import-debian-repository \
    --components main,multiverse,restricted,universe \
    --architectures amd64 \
    http://us.archive.ubuntu.com/ubuntu impish
This should download ~96 GB of packages (as of January 2022) and create a ~12 GB SQLite database.
Once we have a populated database, we can run commands to query its content.
To see which files import (and presumably call) a specific C function:
lpa --db ubuntu-impish.db \
    elf-files-importing-symbol OPENSSL_init_ssl
To see the most common ELF section names:
lpa --db ubuntu-impish.db elf-section-name-counts
Power users may want to write their own queries against the database. To get started, open the SQLite database and poke around:
$ sqlite3 ubuntu-impish.db
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.
sqlite> .tables
elf_file package_file
elf_file_needed_library symbol_name
elf_file_x86_base_register_count v_elf_needed_library
elf_file_x86_instruction_count v_elf_symbol
elf_file_x86_register_count v_package_elf_file
elf_section v_package_file
elf_symbol v_package_instruction_count
package
sqlite> select * from v_elf_needed_library where library_name = 'libc.so.6' order by package_name asc limit 1;
0ad|0.0.25b-1|http://us.archive.ubuntu.com/ubuntu/pool/universe/0/0ad/0ad_0.0.25b-1_amd64.deb|usr/games/pyrogenesis|libc.so.6
The v_-prefixed tables are views and conveniently pull in data from multiple tables. For example, v_elf_symbol has all the columns of elf_symbol but also expands the package name, version, file path, etc.
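To illustrate how such a view can flatten several tables into one queryable result, here is a minimal Python sketch using the standard sqlite3 module. The schema below is hypothetical and heavily simplified: only the table and view names and the library_name and package_name columns come from the output above, while every other column name is an assumption made for the example. Run .schema in sqlite3 against a real database to see the actual definitions.

```python
import sqlite3

# Hypothetical, heavily simplified schema. The real lpa tables have more
# columns and may name them differently.
SCHEMA = """
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE package_file (
    id INTEGER PRIMARY KEY,
    package_id INTEGER REFERENCES package(id),
    path TEXT
);
CREATE TABLE elf_file_needed_library (
    package_file_id INTEGER REFERENCES package_file(id),
    name TEXT
);
-- The view performs the joins once so that callers do not have to.
CREATE VIEW v_elf_needed_library AS
SELECT p.name AS package_name,
       p.version AS package_version,
       pf.path AS path,
       nl.name AS library_name
FROM elf_file_needed_library AS nl
JOIN package_file AS pf ON pf.id = nl.package_file_id
JOIN package AS p ON p.id = pf.package_id;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a single example row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO package VALUES (1, '0ad', '0.0.25b-1')")
    conn.execute(
        "INSERT INTO package_file VALUES (1, 1, 'usr/games/pyrogenesis')"
    )
    conn.execute("INSERT INTO elf_file_needed_library VALUES (1, 'libc.so.6')")
    conn.commit()
    return conn


def packages_needing(conn: sqlite3.Connection, library: str) -> list:
    """Return (package, version, path) rows for files that need `library`."""
    rows = conn.execute(
        "SELECT package_name, package_version, path "
        "FROM v_elf_needed_library "
        "WHERE library_name = ? ORDER BY package_name",
        (library,),
    )
    return rows.fetchall()


conn = build_demo_db()
print(packages_needing(conn, "libc.so.6"))
```

The query passes the library name as a bound parameter rather than interpolating it into the SQL text, which is a sensible habit for ad hoc scripts written against the real database as well.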
Constants and Special Values
Various ELF data uses constants to define attributes. For example, elf_file.machine is an integer holding the ELF machine type. A good reference for the values of these constants is https://docs.rs/object/0.28.2/src/object/elf.rs.html#1-6256.
lpa also exposes various reference-* commands for printing known values.
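As a rough illustration of where a value like elf_file.machine comes from, the following Python sketch reads the e_machine field directly from the first bytes of an ELF file. It is not how lpa itself parses ELF files; the function names and the small lookup table are invented for this example, and the table lists only a handful of the constants defined by the ELF specification.

```python
import struct

# A few well-known e_machine values from the ELF specification.
MACHINE_NAMES = {
    3: "EM_386",
    40: "EM_ARM",
    62: "EM_X86_64",
    183: "EM_AARCH64",
}


def elf_machine(header: bytes) -> int:
    """Return the raw e_machine value from the start of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) selects the byte order: 1 = little, 2 = big.
    if header[5] == 1:
        byte_order = "<"
    elif header[5] == 2:
        byte_order = ">"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_machine is the 16-bit field at offset 18, right after e_type.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def describe_machine(value: int) -> str:
    """Map an e_machine integer to a symbolic name when it is known."""
    return MACHINE_NAMES.get(value, f"unknown ({value})")


# Synthetic 20-byte header: magic, 64-bit class, little-endian, version 1,
# 9 bytes of ABI/padding, then e_type = 2 (ET_EXEC) and e_machine = 62.
fake_header = (
    b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 62)
)
print(describe_machine(elf_machine(fake_header)))
```

To try it on a real binary, pass the first 20 bytes of the file, for example open(path, "rb").read(20).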
Known Issues
x86 Disassembly Quirks
On package index/import, an attempt is made to disassemble x86 / x86-64 files so instruction counts and register usage can be stored in the database.
We disassemble all sections marked as executable. Instructions in other sections may not be found (this is hopefully rare).
We disassemble using the iced_x86 Rust crate, so any limitations in that crate apply to the disassembler.
We disassemble instructions by iterating over the content of each executable section, attempting to read instructions until the end of the section. Executable sections can contain NULL bytes, inline data, and other bytes that do not represent valid instructions. Many such byte sequences decode to the special invalid instruction. In some cases, however, a byte sequence decodes to an instruction even though the underlying data is not an instruction, so instruction counts can include false positives.
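The sequential decoding described above can be sketched with a toy decoder in Python. This is only an illustration of the technique, not the iced_x86 decoder: the opcode table covers five real single-opcode x86-64 encodings and treats every other byte as invalid, which a real decoder would not do. The function and table names are invented for the example.

```python
from collections import Counter

# Toy opcode table: opcode byte -> (mnemonic, total length in bytes).
# Real x86 decoding is far more involved; this is a deliberately tiny subset.
TOY_OPCODES = {
    0x55: ("push rbp", 1),
    0x90: ("nop", 1),
    0xB8: ("mov eax, imm32", 5),
    0xC3: ("ret", 1),
    0xCC: ("int3", 1),
}


def linear_sweep(section: bytes) -> Counter:
    """Decode from the first byte to the last, counting each mnemonic."""
    counts = Counter()
    offset = 0
    while offset < len(section):
        mnemonic, length = TOY_OPCODES.get(section[offset], ("(invalid)", 1))
        if offset + length > len(section):
            # The instruction would run past the end of the section.
            mnemonic, length = "(invalid)", 1
        counts[mnemonic] += 1
        offset += length
    return counts


# Genuine code: push rbp; mov eax, 1; ret
code = bytes([0x55, 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
# Followed by two bytes the toy table does not know, then the inline
# ASCII data "UU" (0x55 0x55), which is not code at all.
section = code + b"\x00\x00" + b"UU"

counts = linear_sweep(section)
for mnemonic, count in sorted(counts.items()):
    print(f"{mnemonic}: {count}")
```

Only one push rbp is real here, but the sweep counts three because each "U" byte happens to share its encoding. That is the kind of false positive the instruction counts in the database can contain.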
Intermittent HTTP Failures on Package Retrieval
Intermittent HTTP GET failures when importing packages are expected due to intrinsic network unreliability. They often manifest as an error like the following:
error processing package (ignoring): repository I/O error on path pool/universe/g/gcc-10/gnat-10_10.3.0-11ubuntu1_amd64.deb: Custom { kind: Other, error: "error sending HTTP request: reqwest::Error { kind: Request, url: Url { scheme: \"http\", cannot_be_a_base: false, username: \"\", password: None, host: Some(Domain(\"us.archive.ubuntu.com\")), port: None, path: \"/ubuntu/pool/universe/g/gcc-10/gnat-10_10.3.0-11ubuntu1_amd64.deb\", query: None, fragment: None }, source: hyper::Error(IncompleteMessage) }" }
If you see failures like this, simply retry the import operation. Already imported packages should automatically be skipped.
Package Server Throttling
lpa can issue parallel HTTP requests to retrieve content. By default, it issues up to as many parallel requests as there are CPU cores/threads.
Some package repositories limit the number of simultaneous HTTP connections/requests per client. If your machine has many CPU cores, you may run into these limits and get a high volume of HTTP errors when fetching packages.
To mitigate this, reduce the number of simultaneous I/O operations via --threads, e.g. lpa --threads 4 ...
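The effect of capping parallelism can be shown with a small Python sketch. It illustrates the general idea of a bounded worker pool that defaults to the CPU count and is not lpa's implementation. fetch_all, FakeServer, and the URLs are all hypothetical; FakeServer stands in for a repository and records how many requests were in flight at once.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, threads=None):
    """Call fetch(url) for every URL using at most `threads` workers."""
    # Default to the CPU count, mirroring the default described above.
    workers = threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


class FakeServer:
    """Stand-in for a repository; records the peak number of requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def fetch(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate network latency
        with self._lock:
            self._active -= 1
        return f"body of {url}"


server = FakeServer()
urls = [f"pool/pkg-{i}.deb" for i in range(20)]
bodies = fetch_all(urls, server.fetch, threads=4)
print(f"fetched {len(bodies)} files, peak concurrency {server.peak}")
```

With threads=4 the fake server never sees more than four simultaneous requests, regardless of how many cores the machine has.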
SQLite Integrity Weakening
To maximize speed of import operations, SQLite databases have their content integrity and durability guarantees weakened via PRAGMA statements issued on database open. A process or machine crash during a write operation could corrupt the SQLite database more easily than it otherwise would.
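The text above does not say which PRAGMA statements lpa issues. As an assumption-laden sketch, the following Python snippet applies two settings that are commonly used to trade durability for write speed, synchronous = OFF and journal_mode = MEMORY, and then reads them back. The function names are invented for the example, and the pragmas lpa actually uses may differ.

```python
import os
import sqlite3
import tempfile


def open_fast_unsafe(path: str) -> sqlite3.Connection:
    """Open a database with durability traded away for write speed."""
    conn = sqlite3.connect(path)
    # Do not wait for data to reach the disk. Writes are faster, but an
    # operating system crash or power loss can corrupt the file.
    conn.execute("PRAGMA synchronous = OFF")
    # Keep the rollback journal in memory. A process crash in the middle
    # of a transaction can then leave the database file corrupt.
    conn.execute("PRAGMA journal_mode = MEMORY")
    return conn


def current_settings(conn: sqlite3.Connection) -> tuple:
    """Return the (synchronous, journal_mode) values in effect."""
    sync = conn.execute("PRAGMA synchronous").fetchone()[0]
    journal = conn.execute("PRAGMA journal_mode").fetchone()[0]
    return sync, journal


with tempfile.TemporaryDirectory() as tmp:
    conn = open_fast_unsafe(os.path.join(tmp, "demo.db"))
    settings = current_settings(conn)
    conn.close()

print(settings)
```

Running the same two query pragmas in sqlite3 against a database shows what a fresh connection uses by default. The synchronous setting applies per connection, so it does not reveal what lpa configured during import.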