Data processing

lib.rs goes well beyond displaying crates.io data as-is. Many crates have incomplete metadata, e.g. they lack categories or keywords that would help find the crate. Sometimes the metadata specified by crate authors is incorrect (e.g. the purpose of the parsing category is often misunderstood, or repository links of forked crates still point to the upstream repo instead of the fork). crates.io counts downloads without any throttling or anti-spam measures, so the numbers are skewed by automated downloads from web crawlers and uncached CI builds.

To make search work better and crate pages show more useful information, lib.rs combines data from crates.io with data from github.com, docs.rs, rustsec.org, rustaceans.org, cargo-crev repositories, cargo-vet registry, and its own datasets and analysis. The combined data therefore does not come only from crate authors. It should be understood as lib.rs's interpretation, and not necessarily what the crate authors intended.

lib.rs often uses heuristics to complete and fix data. Most of the data quality issues are reported in the maintainer dashboard.
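
As an illustration of what "completing" metadata with a heuristic can look like, here is a minimal Rust sketch. It is not lib.rs's actual code or algorithm: the `CrateMeta` struct, the `effective_keywords` function, and the stop-word list are all hypothetical names invented for this example. It assumes a very simple rule: when the author supplied no keywords, guess a few from the description, and mark the result as inferred so it is not mistaken for the author's intent.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified view of author-supplied crate metadata.
/// (Illustrative only; not the data model lib.rs uses.)
#[derive(Debug, Clone, PartialEq)]
pub struct CrateMeta {
    pub name: String,
    pub description: String,
    pub keywords: Vec<String>,
}

/// Words assumed to be too generic to serve as keywords (illustrative list).
const STOP_WORDS: &[&str] = &["and", "for", "the", "with", "rust", "crate", "library"];

/// Returns the keywords to index, plus a flag that is `true` when they
/// were guessed from the description rather than written by the author.
pub fn effective_keywords(meta: &CrateMeta, max: usize) -> (Vec<String>, bool) {
    // Author-supplied keywords are kept untouched.
    if !meta.keywords.is_empty() {
        return (meta.keywords.clone(), false);
    }

    let mut seen = BTreeSet::new();
    let mut guessed = Vec::new();

    // Split the description on anything that is not a letter or digit.
    for raw in meta.description.split(|c: char| !c.is_alphanumeric()) {
        if guessed.len() >= max {
            break;
        }
        let word = raw.to_lowercase();
        // Skip very short words and generic stop words.
        if word.len() < 3 || STOP_WORDS.iter().any(|&s| s == word) {
            continue;
        }
        // Keep the first occurrence of each word, in description order.
        if seen.insert(word.clone()) {
            guessed.push(word);
        }
    }
    (guessed, true)
}
```

Returning the `inferred` flag alongside the keywords keeps the provenance visible, so a caller can tell the author's own data apart from the site's interpretation of it.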

The list of sources and algorithms is likely to be expanded in the future. See also the logic for ranking and for outdated dependencies.