2 releases

new 0.5.4	May 20, 2025
0.5.3	May 20, 2025

MIT license

Pickaxe

Pickaxe is a Python package for structured data extraction from HTML documents. It provides a simple, intuitive API for parsing HTML documents and extracting structured data from them.

Features

Written in Rust: Pickaxe is written in Rust, which makes it fast and memory-efficient.
Robust: Pickaxe uses the html5ever and selectors crates for browser-grade HTML parsing and CSS selector matching.
CSS Selectors & XPath: Pickaxe supports both CSS selectors and (simple) XPath expressions for querying HTML documents.

Quick Start

Python

Installation

pip install python-pickaxe

Basic Usage

from pickaxe import HtmlDocument

# Parse an HTML document
document = HtmlDocument.from_str("<html><body><h1>Hello, World!</h1></body></html>")

# Access elements using CSS selectors or XPath expressions
heading = document.find("h1")
print(heading.inner_text)  # Output: Hello, World!

heading = document.find_xpath("//h1")
print(heading.inner_text)  # Output: Hello, World!
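
The lookups above return one element at a time. To turn several lookups into a single structured record, a small helper along the following lines may be useful. This is a minimal sketch, not part of Pickaxe: `extract_fields` is a hypothetical name, it relies only on the `find` method and `inner_text` attribute shown above, and it assumes `find` returns `None` when nothing matches, which should be checked against the actual behavior.

```python
from typing import Any, Dict, Optional


def extract_fields(document: Any, selectors: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map field names to the text of the first element matching each CSS selector.

    `document` is expected to behave like the HtmlDocument shown above:
    it has a `find(selector)` method whose result exposes `inner_text`.
    """
    record: Dict[str, Optional[str]] = {}
    for field, selector in selectors.items():
        element = document.find(selector)
        # Assumption: a failed lookup yields None. If find() raises instead,
        # wrap the call in try/except for the relevant exception.
        record[field] = element.inner_text if element is not None else None
    return record
```

For the document parsed above, `extract_fields(document, {"title": "h1"})` should yield a dictionary whose `"title"` entry is the heading text.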

Rust

Installation

cargo add rust-pickaxe

Basic Usage

use pickaxe::HtmlDocument;

fn main() {
    // Parse an HTML document
    let document = HtmlDocument::from_str("<html><body><h1>Hello, World!</h1></body></html>").unwrap();

    // Access elements using CSS selectors or XPath expressions
    let heading = document.find("h1").unwrap();
    println!("{}", heading.inner_text());  // Output: Hello, World!

    let heading = document.find_xpath("//h1").unwrap();
    println!("{}", heading.inner_text());  // Output: Hello, World!
}

License

This project is licensed under the MIT License.

Support & Feedback

If you encounter any issues or have feedback, please open an issue. We'd love to hear from you!

Made with ❤️ by Emergent Methods
