#wikidata #knowledge-graph #wikipedia #dbpedia

bin+lib kgdata_core

Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)

2 stable releases

4.0.2 Apr 25, 2025
4.0.1 Apr 10, 2025

#9 in #wikidata


323 downloads per month

Custom license

195KB
5K SLoC

kgdata

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata’s entities.
  • and more
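
To make the clean-up step concrete, here is a minimal sketch of resolving redirects and removing dangling references on toy data. It is not KGData's API: the function names (resolve_redirect, clean_links), the dict-based data layout, and the entity IDs are all assumptions made for illustration only.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its final target, guarding against cycles."""
    seen = {entity_id}
    while entity_id in redirects:
        entity_id = redirects[entity_id]
        if entity_id in seen:  # cycle detected: stop rather than loop forever
            break
        seen.add(entity_id)
    return entity_id


def clean_links(
    links: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite each link to its redirect target and drop links to unknown entities."""
    cleaned = {}
    for source, targets in links.items():
        resolved = (resolve_redirect(t, redirects) for t in targets)
        # A link is "dangling" if its resolved target is not an entity in the dump.
        cleaned[source] = [t for t in resolved if t in links]
    return cleaned


# Toy data (hypothetical): Q2 redirects to Q1, and Q404 is absent from the dump.
redirects = {"Q2": "Q1"}
links = {"Q1": ["Q3"], "Q3": ["Q2", "Q404"]}
cleaned = clean_links(links, redirects)
```

In this sketch the link from Q3 to Q2 is rewritten to point at Q1, and the link to Q404 is dropped because no such entity exists. A real dump would need the same logic applied at scale, which is presumably where the optional Spark dependency comes in.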

For full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install "kgdata[spark]"   # omit [spark] to install Spark yourself if your cluster runs a different version

Dependencies

~47–76MB
~1.5M SLoC