2 releases

| Version | Released |
|---|---|
| 0.5.4 (new) | May 20, 2025 |
| 0.5.3 | May 20, 2025 |
Pickaxe
Pickaxe is a Python package for extracting structured data from HTML documents. It provides a simple, intuitive API for parsing HTML and pulling structured data out of it.
Features
- Written in Rust: Pickaxe is written in Rust, which makes it fast and memory-efficient.
- Robust: Pickaxe uses the html5ever and selectors crates for browser-grade HTML parsing and CSS selector matching.
- CSS Selectors & XPath: Pickaxe supports both CSS selectors and (simple) XPath expressions for querying HTML documents.
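To make the difference between the two query styles concrete, here is a minimal sketch that uses only Python's standard library, not Pickaxe itself. It assumes well-formed markup, because `xml.etree.ElementTree` is an XML parser that supports a limited XPath subset and does not do HTML5 parsing. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumption: the markup is well-formed XML. ElementTree is not an HTML5
# parser and is used here only to illustrate path-style queries.
html = "<html><body><h1>Hello, World!</h1><p>Intro</p></body></html>"
root = ET.fromstring(html)  # root is the <html> element

# Descendant search: roughly what the XPath expression "//h1" expresses.
heading = root.find(".//h1")
heading_text = heading.text

# Child-by-child path: roughly what the CSS selector "body > h1" expresses.
direct = root.find("body/h1")
direct_text = direct.text

print(heading_text)
print(direct_text)
```

Both queries reach the same `h1` element. The descendant form matches at any depth, while the child path only matches an `h1` directly under `body`.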
Quick Start
Python
Installation
pip install python-pickaxe
Basic Usage
from pickaxe import HtmlDocument
# Parse an HTML document
document = HtmlDocument.from_str("<html><body><h1>Hello, World!</h1></body></html>")
# Access elements using CSS selectors or XPath expressions
heading = document.find("h1")
print(heading.inner_text) # Output: Hello, World!
heading = document.find_xpath("//h1")
print(heading.inner_text) # Output: Hello, World!
Rust
Installation
cargo add rust-pickaxe
Basic Usage
use pickaxe::HtmlDocument;
fn main() {
    // Parse an HTML document
    let document = HtmlDocument::from_str("<html><body><h1>Hello, World!</h1></body></html>").unwrap();

    // Access elements using CSS selectors or XPath expressions
    let heading = document.find("h1").unwrap();
    println!("{}", heading.inner_text()); // Output: Hello, World!

    let heading = document.find_xpath("//h1").unwrap();
    println!("{}", heading.inner_text()); // Output: Hello, World!
}
License
This project is licensed under the MIT License.
Support & Feedback
If you encounter any issues or have feedback, please open an issue. We'd love to hear from you!
Made with ❤️ by Emergent Methods