#readability #safari-reader #clean-html

llm_readability

Readability library for LLMs, built in Rust

10 releases

0.0.10 Aug 29, 2024
0.0.9 Aug 29, 2024

#1449 in Web programming

722 downloads per month

MIT license

27KB
744 lines

llm_readability

The Rust readability library built for performance, AI, and multiple locales. The library is used on Spider Cloud for data cleaning.

Usage

# Cargo.toml
[dependencies]
llm_readability = "0"

// src/main.rs
use llm_readability::extractor;

fn main() {
  match extractor::extract(&mut "<html>...</html>".as_bytes(), "https://example.com", None) {
      Ok(product) => {
          println!("------- html ------");
          println!("{}", product.content);
          println!("---- plain text ---");
          println!("{}", product.text);
      },
      Err(_) => println!("error occurred"),
  }
}
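
Since the library targets AI use cases, the extracted plain text is often tidied up before it goes into a prompt. The following is a minimal sketch of one way to do that using only the standard library. It is not part of llm_readability: the helper names (`normalize_whitespace`, `truncate_chars`, `prepare_for_prompt`) and the character budget are hypothetical, and the input string stands in for `product.text`.

```rust
/// Collapse every run of whitespace into a single space and trim both ends.
/// (Hypothetical helper, not part of llm_readability.)
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Keep at most `max_chars` characters, always cutting on a char boundary
/// so multi-byte UTF-8 text is never split mid-character.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &text[..byte_idx],
        None => text, // already short enough
    }
}

/// Normalize whitespace, then apply the character budget.
fn prepare_for_prompt(text: &str, max_chars: usize) -> String {
    let normalized = normalize_whitespace(text);
    truncate_chars(&normalized, max_chars).to_string()
}

fn main() {
    // Stand-in for `product.text` from the usage example above.
    let extracted = "  Hello,\n\n   world!\tThis is   extracted text.  ";
    println!("{}", prepare_for_prompt(extracted, 13));
}
```

Counting characters rather than bytes keeps the truncation safe for non-ASCII locales. A real pipeline would more likely budget by tokens, which depends on the model's tokenizer.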

This project is a rewrite of readability-rs, focused on performance improvements and bug fixes.

Dependencies

~8–15MB
~282K SLoC