1 unstable release: 0.1.0 (Aug 13, 2025)
search_for_llms
A Rust library and command-line tool for searching web pages and fetching their content, suitable for use with LLMs.
Overview
search_for_llms performs a web search and retrieves the content of the resulting pages. It combines search engine querying with web scraping to return cleaned, structured text from the search results, which makes it well suited to LLM applications.
Features
- Search using the Google search engine
- Concurrently fetch multiple web pages
- Extract main content from web pages
- Clean and structure content for LLM consumption
- Return structured data or formatted text
- Save content to local files
- Configurable number of pages and content length
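The concurrent fetching mentioned above can be illustrated with a small standard-library sketch. It is not the crate's implementation: the crate lists `spider` and `futures` as dependencies and is async, whereas this example swaps in scoped threads so it runs without extra dependencies. The `fetch_all` function and the `fetch` callback are hypothetical names used only for illustration.

```rust
use std::thread;

/// Runs `fetch` on every URL in parallel and returns the results in input order.
/// `fetch` is a hypothetical stand-in for a real HTTP request.
pub fn fetch_all<F>(urls: &[&str], fetch: F) -> Vec<Result<String, String>>
where
    F: Fn(&str) -> Result<String, String> + Sync,
{
    // Share the callback by reference so every thread can call it.
    let fetch = &fetch;
    thread::scope(|scope| {
        // Spawn one scoped thread per URL; scoped threads may borrow `urls`.
        let handles: Vec<_> = urls
            .iter()
            .map(|&url| scope.spawn(move || fetch(url)))
            .collect();

        // Join in spawn order so results line up with `urls`.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("fetch thread panicked".to_string()))
            })
            .collect()
    })
}
```

Returning one `Result` per URL, rather than failing the whole batch, lets a caller keep the pages that loaded even when some requests fail.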
Installation
Add this to your Cargo.toml:
[dependencies]
search_for_llms = "0.1.0"
Or install the command-line tool:
cargo install search_for_llms
Usage
As a Library
use search_for_llms::{search_and_fetch_structured, search_and_fetch_summary};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get structured results
    let results = search_and_fetch_structured("Rust programming", 5, 5000).await?;
    // Or get a formatted summary string
    let summary = search_and_fetch_summary("Rust programming", 5, 5000).await?;
    println!("{}", summary);
    Ok(())
}
Command Line
# Basic search
search_for_llms "Rust programming"
# Search with options
search_for_llms "Rust programming" --pages 10 --max-chars 10000
Command Line Options
- query - Search query (required)
- --pages / -p - Number of pages to fetch (default: 5)
- --max-chars / -m - Maximum characters per page (default: 5000)
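To make the per-page character limit concrete, here is a minimal sketch of one way such a limit could be applied. It assumes the limit counts Unicode characters rather than bytes, which the README does not specify, and `truncate_chars` is a hypothetical helper, not a function exported by the crate.

```rust
/// Returns at most `max_chars` characters of `text`.
/// Assumption: the limit is measured in characters, not bytes.
pub fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        // `byte_idx` is where the first character past the limit starts,
        // so slicing up to it never splits a multi-byte character.
        Some((byte_idx, _)) => &text[..byte_idx],
        // The text has `max_chars` characters or fewer: keep all of it.
        None => text,
    }
}
```

Slicing at a position reported by `char_indices` avoids the panic that a plain `&text[..max_chars]` would cause when the cut lands inside a multi-byte character.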
Output
The tool creates a fetched_pages directory with:
- Individual HTML and Markdown files for each page
- A summary file with all results
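The output layout described above could be produced along the following lines. This is a sketch under stated assumptions: the `FetchedPage` struct, the `save_pages` function, and the file names (`page_<n>.html`, `page_<n>.md`, `summary.md`) are all hypothetical, since the README does not document the actual names used inside fetched_pages.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One fetched page. Hypothetical shape; the crate's own types may differ.
pub struct FetchedPage {
    pub url: String,
    pub html: String,
    pub markdown: String,
}

/// Writes an HTML and a Markdown file per page, plus one summary file, into `dir`.
/// All file names here are illustrative assumptions.
pub fn save_pages(dir: &Path, pages: &[FetchedPage]) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(dir)?;
    let mut written = Vec::new();
    let mut summary = String::new();

    for (i, page) in pages.iter().enumerate() {
        let n = i + 1;
        let html_path = dir.join(format!("page_{}.html", n));
        let md_path = dir.join(format!("page_{}.md", n));
        fs::write(&html_path, &page.html)?;
        fs::write(&md_path, &page.markdown)?;
        // The summary only indexes the pages: one line per page with its URL.
        summary.push_str(&format!("{}. {}\n", n, page.url));
        written.push(html_path);
        written.push(md_path);
    }

    let summary_path = dir.join("summary.md");
    fs::write(&summary_path, summary)?;
    written.push(summary_path);
    Ok(written)
}
```

A call such as `save_pages(Path::new("fetched_pages"), &pages)` would create the directory if it is missing and return the paths it wrote.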
Library Functions
search_and_fetch_structured
Returns structured data in a SearchResults struct:
pub async fn search_and_fetch_structured(
    query: &str,
    page_count: usize,
    max_chars_per_page: usize,
) -> Result<SearchResults, Box<dyn std::error::Error>>
search_and_fetch_summary
Returns a formatted string summary:
pub async fn search_and_fetch_summary(
    query: &str,
    page_count: usize,
    max_chars_per_page: usize,
) -> Result<String, Box<dyn std::error::Error>>
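As a rough picture of what a formatted summary string might look like, the sketch below assembles one from a list of page records. The README does not show the fields of SearchResults or the actual summary layout, so the `PageResult` struct, its fields, and the `format_summary` function are hypothetical and exist only for illustration.

```rust
/// One page in a result set. Hypothetical; not the crate's `SearchResults` type.
pub struct PageResult {
    pub title: String,
    pub url: String,
    pub content: String,
}

/// Builds a plain-text summary: a header line, then one numbered block per page.
/// The layout is an assumption chosen for readability in an LLM prompt.
pub fn format_summary(query: &str, pages: &[PageResult]) -> String {
    let mut out = format!("Search results for: {}\n", query);
    for (i, page) in pages.iter().enumerate() {
        out.push_str(&format!(
            "\n[{}] {}\nURL: {}\n{}\n",
            i + 1,
            page.title,
            page.url,
            // Trim surrounding whitespace left over from content extraction.
            page.content.trim()
        ));
    }
    out
}
```

Numbering each block and putting the URL on its own line makes it easy for a downstream model to cite a specific source.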
For Developers
Building
cargo build
Running
cargo run -- "your search query"
Testing
cargo test
Publishing to crates.io
To publish this crate to crates.io:
- Create an account at crates.io
- Get your API token from your account settings
- Log in locally with cargo login <your-token>
- Publish with cargo publish
Before publishing, ensure:
- Your crate name is unique on crates.io
- Your version number is appropriate
- All metadata in Cargo.toml is correct
- You've tested the crate thoroughly
Dependencies
- clap - Command line argument parsing
- serde - Serialization framework
- simple_google - Google search functionality
- spider - Web scraping framework
- futures - Asynchronous programming utilities
License
MIT