#rss #html #web #info #http #open-graph #feed


Small library to fetch info about a web page: title, description, language, HTTP info, links, RSS feeds, Opengraph, Schema.org, and more

15 releases (8 stable)

2.0.0 Oct 24, 2023
1.6.0 May 24, 2023
1.5.0 Jan 6, 2023
1.4.0 Dec 12, 2021
0.1.3 Jun 25, 2018

#500 in Web programming


3,828 downloads per month
Used in 16 crates (9 directly)

MIT license

553 lines





```rust
use webpage::{Webpage, WebpageOptions};

let info = Webpage::from_url("http://www.rust-lang.org/en-US/", WebpageOptions::default())
    .expect("Could not read from URL");

// the HTTP transfer info
let http = info.http;

assert!(!http.ip.is_empty()); // IP address of the server that answered; varies per connection
assert!(http.body.starts_with("<!DOCTYPE html>"));
assert_eq!(http.url, "https://www.rust-lang.org/en-US/".to_string()); // followed redirects (HTTPS)
assert_eq!(http.content_type, "text/html".to_string());

// the parsed HTML info
let html = info.html;

assert_eq!(html.title, Some("The Rust Programming Language".to_string()));
assert_eq!(html.description, Some("A systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.".to_string()));
assert_eq!(html.opengraph.og_type, "website".to_string());
```

You can also get HTML info about local data:

```rust
use webpage::HTML;

let html = HTML::from_file("index.html", None);
// or let html = HTML::from_string(input, None);
```



If you need to serialize the data provided by the library with serde, enable the `serde` feature when declaring the dependency in Cargo.toml:

```toml
webpage = { version = "2.0", features = ["serde"] }
```

No curl dependency

The `curl` feature is enabled by default but is optional. Disabling it (by declaring the dependency with `default-features = false`) is useful if you do not need an HTTP client because you already have the HTML data at hand.
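To illustrate what an optional feature means at compile time, here is a generic sketch of how a Cargo feature can gate code. It is not taken from the webpage crate: `parse_title`, `fetch_title` and `http_support_compiled_in` are hypothetical names, and the naive `<title>` search only stands in for real HTML parsing. When the crate is built with `--features curl`, Cargo sets `cfg(feature = "curl")`; without it, the gated function does not exist at all.

```rust
// Generic sketch of Cargo feature gating; NOT the webpage crate's source.
// All function names here are hypothetical.

/// Local parsing needs no HTTP client, so it is always compiled in.
/// (Naive stand-in for real HTML parsing: finds the first <title> element.)
pub fn parse_title(html: &str) -> Option<String> {
    let start = html.find("<title>")? + "<title>".len();
    let len = html[start..].find("</title>")?;
    Some(html[start..start + len].trim().to_string())
}

/// Compiled only when the `curl` feature is enabled; without it, this
/// function (and any HTTP dependency it would use) is left out entirely.
#[cfg(feature = "curl")]
pub fn fetch_title(url: &str) -> Option<String> {
    // Placeholder body: a real build would download `url` with the HTTP
    // client and pass the response body to `parse_title`.
    let _ = url;
    None
}

/// `cfg!` evaluates the same condition as a compile-time boolean.
pub fn http_support_compiled_in() -> bool {
    cfg!(feature = "curl")
}
```

Code that only parses local HTML keeps compiling whether or not the feature is on, which is what makes dropping the HTTP client safe for such users.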

All fields

```rust
use std::time::Duration;

pub struct Webpage {
    pub http: HTTP, // info about the HTTP transfer
    pub html: HTML, // info from the parsed HTML doc
}

pub struct HTTP {
    pub ip: String,
    pub transfer_time: Duration,
    pub redirect_count: u32,
    pub content_type: String,
    pub response_code: u32,
    pub headers: Vec<String>, // raw headers from final request
    pub url: String, // effective url
    pub body: String,
}
```
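Because `headers` holds raw header lines, reading a single header means splitting the lines yourself. A minimal sketch, assuming each entry looks like `Name: value`; `find_header` is a hypothetical helper, not part of the crate. Header names are compared case-insensitively, as HTTP requires, and only the first colon separates name from value so values such as URLs stay intact.

```rust
/// Returns the value of the first header named `name` (case-insensitive).
/// Hypothetical helper; lines without a colon (e.g. a status line) are skipped.
pub fn find_header<'a>(headers: &'a [String], name: &str) -> Option<&'a str> {
    headers.iter().find_map(|line| {
        // split on the first ':' only, so "Location: https://..." keeps its value
        let (key, value) = line.split_once(':')?;
        if key.trim().eq_ignore_ascii_case(name) {
            Some(value.trim()) // trim also drops a trailing "\r\n"
        } else {
            None
        }
    })
}
```

With the crate, this would be called as `find_header(&http.headers, "server")`, assuming the stored lines follow the `Name: value` shape.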

```rust
use std::collections::HashMap;

pub struct HTML {
    pub title: Option<String>,
    pub description: Option<String>,

    pub url: Option<String>, // canonical url
    pub feed: Option<String>, // RSS feed typically

    pub language: Option<String>, // as specified, not detected
    pub text_content: String, // all tags stripped from body
    pub links: Vec<Link>, // all links in the document

    pub meta: HashMap<String, String>, // flattened down list of meta properties

    pub opengraph: Opengraph,
    pub schema_org: Vec<SchemaOrg>,
}

pub struct Link {
    pub url: String, // resolved url of the link
    pub text: String, // anchor text
}
```
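The `text_content` field is described as the body with all tags stripped. As a rough illustration of that idea only, here is a naive string-based sketch; `strip_tags` is a hypothetical helper and not the crate's implementation. Unlike a real HTML parser, it keeps the contents of `<script>`/`<style>` elements and does not decode entities such as `&amp;`.

```rust
/// Naive sketch: drops everything between '<' and '>' and collapses whitespace.
/// Hypothetical helper, not the crate's implementation.
pub fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => {
                in_tag = true;
                out.push(' '); // keep words from adjacent elements apart
            }
            '>' if in_tag => in_tag = false,
            _ if !in_tag => out.push(c), // a stray '>' in text is kept
            _ => {}
        }
    }
    // collapse runs of whitespace into single spaces
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Inserting a space at each tag boundary is a design choice: `<li>a</li><li>b</li>` becomes `a b` rather than `ab`.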

```rust
use std::collections::HashMap;

// Facebook's Opengraph structured data
pub struct Opengraph {
    pub og_type: String,
    pub properties: HashMap<String, String>,

    pub images: Vec<OpengraphObject>,
    pub videos: Vec<OpengraphObject>,
    pub audios: Vec<OpengraphObject>,
}

pub struct OpengraphObject {
    pub url: String,
    pub properties: HashMap<String, String>,
}

// schema.org structured data
pub struct SchemaOrg {
    pub schema_type: String,
    pub value: serde_json::Value,
}
```
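In the Open Graph protocol, a structured property such as `og:image:width` describes the `og:image` that precedes it, which is why images, videos and audios are lists of objects that each carry their own `properties`. The sketch below groups `(property, content)` pairs in document order under that rule. It is an assumption-laden illustration: `group_opengraph` is hypothetical, the structs are local stand-ins mirroring the fields listed above, and storing keys without the `og:` prefix is a guess, not necessarily what the crate does.

```rust
use std::collections::HashMap;

// Local stand-ins mirroring the listed fields; not the crate's own types.
#[derive(Debug, Default, PartialEq)]
pub struct OpengraphObject {
    pub url: String,
    pub properties: HashMap<String, String>,
}

#[derive(Debug, Default)]
pub struct Opengraph {
    pub og_type: String,
    pub properties: HashMap<String, String>,
    pub images: Vec<OpengraphObject>,
    pub videos: Vec<OpengraphObject>,
    pub audios: Vec<OpengraphObject>,
}

/// Hypothetical helper: groups `(property, content)` meta pairs, in document order.
pub fn group_opengraph(meta: &[(&str, &str)]) -> Opengraph {
    let mut og = Opengraph::default();
    for &(property, content) in meta {
        // ignore anything outside the `og:` namespace
        let Some(key) = property.strip_prefix("og:") else { continue };
        // "image:width" -> ("image", Some("width")); "title" -> ("title", None)
        let (kind, sub) = match key.split_once(':') {
            Some((kind, sub)) => (kind, Some(sub)),
            None => (key, None),
        };
        let objects = match kind {
            "image" => &mut og.images,
            "video" => &mut og.videos,
            "audio" => &mut og.audios,
            "type" => {
                og.og_type = content.to_string();
                continue;
            }
            _ => {
                og.properties.insert(key.to_string(), content.to_string());
                continue;
            }
        };
        match sub {
            // a bare `og:image` (or video/audio) starts a new object
            None => objects.push(OpengraphObject {
                url: content.to_string(),
                properties: HashMap::new(),
            }),
            // a structured property attaches to the most recent object, if any
            Some(sub) => {
                if let Some(last) = objects.last_mut() {
                    last.properties.insert(sub.to_string(), content.to_string());
                }
            }
        }
    }
    og
}
```

A structured property that appears before any root object (for example `og:image:width` with no preceding `og:image`) is silently dropped in this sketch.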


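The following HTTP options are available on `WebpageOptions`:

```rust
use std::time::Duration;
use webpage::WebpageOptions;

// all available options, with example values
let mut options = WebpageOptions::default();
options.allow_insecure = false;
options.follow_location = true;
options.max_redirections = 5;
options.timeout = Duration::from_secs(10);
options.useragent = "Webpage - Rust crate - https://crates.io/crates/webpage".to_string();
options.headers = vec!["X-My-Header: 1234".to_string()];
```

Usage, starting from the defaults:

```rust
use webpage::{Webpage, WebpageOptions};

let url = "https://www.rust-lang.org/en-US/";
let mut options = WebpageOptions::default();
options.allow_insecure = true;
let info = Webpage::from_url(url, options).expect("Halp, could not fetch");
```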
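`follow_location` and `max_redirections` determine what ends up in the `redirect_count` and effective `url` fields of `HTTP`; in the first example, the `http://` address is reported as its `https://` redirect target. The following is a simplified, hypothetical model of that interaction, in which a map from URL to redirect target stands in for real 3xx responses. `resolve` and `Outcome` are illustrative names, not crate API, and the real behavior is whatever the underlying HTTP client does.

```rust
use std::collections::HashMap;

/// Result of a simulated fetch (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub url: String,         // effective URL after redirects
    pub redirect_count: u32, // how many redirects were followed
}

/// Simplified model of `follow_location` / `max_redirections`.
/// `redirects` maps a URL to the location it redirects to.
pub fn resolve(
    start: &str,
    redirects: &HashMap<&str, &str>,
    follow_location: bool,
    max_redirections: u32,
) -> Result<Outcome, String> {
    let mut url = start.to_string();
    let mut redirect_count = 0;
    while let Some(next) = redirects.get(url.as_str()) {
        if !follow_location {
            break; // the redirect response itself is the final answer
        }
        if redirect_count == max_redirections {
            return Err(format!("more than {max_redirections} redirects"));
        }
        redirect_count += 1;
        url = next.to_string();
    }
    Ok(Outcome { url, redirect_count })
}
```

Because the limit is checked before each hop, a `max_redirections` of 0 rejects any redirect while `follow_location` is on, and a redirect loop always terminates with an error.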


~453K SLoC