731 stable releases
| Version | Released |
|---|---|
| 2.37.112 (new) | Dec 16, 2025 |
| 2.37.109 | Jul 8, 2025 |
| 2.36.7 | Mar 31, 2025 |
| 2.23.3 | Dec 31, 2024 |
| 0.0.3 | Sep 21, 2024 |
#1970 in Command line utilities
1,325 downloads per month
Used in search_for_llms
210KB, 5K SLoC
spider_transformations
A high-performance transformation library for Rust, used by Spider Cloud for AI-powered content cleaning across multiple locales.
This project depends on the spider crate.
Usage
```toml
[dependencies]
spider_transformations = "2"
```

```rust
use spider_transformations::transformation::content;

fn main() {
    // page comes from the spider object when streaming.
    let mut conf = content::TransformConfig::default();
    conf.return_format = content::ReturnFormat::Markdown;
    let content = content::transform_content(&page, &conf, &None, &None);
}
```
Transform types
- Markdown
- Commonmark
- Text
- Markdown (Text Map) or HTML2Text
- WIP: HTML2XML
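
To give a rough idea of what a Text transform produces, here is a minimal standard-library sketch that strips tags from an HTML fragment and collapses whitespace. The `strip_tags` function is a hypothetical helper written for this illustration; it is not part of spider_transformations and does not reflect how the crate implements its transforms. It ignores scripts, comments, and entities, and it treats every tag as a word boundary, so inline tags inside a word will split it.

```rust
/// Naively strip tags from an HTML fragment and collapse whitespace.
/// Hypothetical helper for illustration only; not the crate's implementation.
pub fn strip_tags(html: &str) -> String {
    let mut text = String::with_capacity(html.len());
    let mut in_tag = false;
    for ch in html.chars() {
        match ch {
            // Entering a tag: skip everything until the closing '>'.
            '<' => in_tag = true,
            '>' if in_tag => {
                in_tag = false;
                // Treat a closed tag as a word boundary.
                text.push(' ');
            }
            // Outside a tag: keep the character as-is.
            _ if !in_tag => text.push(ch),
            // Inside a tag: drop the character.
            _ => {}
        }
    }
    // Collapse runs of whitespace into single spaces and trim the ends.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

A real transform needs a proper HTML parser; this sketch only shows the shape of the input and output.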
Enhancements
- Readability
- Encoding
Chunking
The transformation module includes several chunking utilities.
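
As a rough illustration of what chunking means here, the sketch below splits text into chunks of at most a fixed number of words using only the standard library. The name `chunk_by_words` and its parameters are assumptions made for this example; they are not the crate's API, so check the transformation module for the actual utilities and their signatures.

```rust
/// Split `text` into chunks of at most `max_words` whitespace-separated words.
/// Hypothetical helper for illustration; not part of spider_transformations.
pub fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    // `slice::chunks` panics on a chunk size of zero, so guard against it.
    if max_words == 0 {
        return Vec::new();
    }
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(max_words)
        .map(|chunk| chunk.join(" "))
        .collect()
}
```

Word-based chunks are a simple way to keep transformed content within a size budget before passing it to a downstream consumer; the final chunk may be shorter than `max_words`.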
This project includes rewritten forks of html2md and html2text, maintained for performance and bug fixes.
License
MIT
Dependencies
~24–46MB, ~740K SLoC