2 releases

0.1.1 Mar 31, 2024
0.1.0 Mar 30, 2024


MIT license


Roboto: Parse and use robots.txt files

Roboto provides a type-safe way to parse and apply robots.txt files. It implements the Robots Exclusion Protocol, which sites use to tell web crawlers and other robots which paths they may visit; compliance is voluntary, so the rules only approximately control crawler behavior.

Installation

Add this to your Cargo.toml:

[dependencies]
roboto = "0.1"

Usage

use roboto::Robots;

let robots = r#"
User-agent: *
Disallow: /private
Disallow: /tmp
"#.parse::<Robots>().unwrap();

let user_agent = "googlebot".parse().unwrap();

assert!(robots.is_allowed(&user_agent, "/public"));

lib.rs:

Parsing and applying robots.txt files.

Examples

use roboto::Robots;

let robots = r#"
User-agent: *
Disallow: /
"#.parse::<Robots>().unwrap();

assert!(!robots.is_allowed(&"googlebot".parse().unwrap(), "/"));
assert!(robots.is_allowed(&"googlebot".parse().unwrap(), "/robots.txt"));
assert!(!robots.is_allowed(&"googlebot".parse().unwrap(), "/foo/bar"));

