10 releases (2 stable)
1.0.1 | Sep 7, 2024 |
---|---|
0.6.1 | Mar 27, 2024 |
0.2.0 | Nov 8, 2023 |
rpz
`rpz` consists of a binary crate and a library crate.
The binary crate, `rpz`, is an application that downloads, parses, and transforms ad-(un)block files from URLs and local file paths into a response policy zone (RPZ) file. This RPZ file can be consumed by a DNS server that supports such files (e.g., Unbound).
rpz in action
In this example it is assumed `unbound.conf(5)` is properly configured and has `name` and `zonefile` in the `rpz` section set to `.` and `/var/unbound/db/rpz` respectively, in addition to `control-enable` set to `true` in the `remote-control` section.
[zack@laptop ~]$ cat<<EOF>/usr/local/etc/rpz/config
> timeout = 15
> rpz = "/var/unbound/db/rpz"
> local_dir = "/usr/local/etc/rpz/"
> adblock = [
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers.txt",
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers_firstparty.txt",
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt",
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/mobile.txt",
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers.txt",
"https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers_firstparty.txt",
"https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_adservers.txt",
"https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_thirdparty.txt",
"https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_thirdparty.txt",
"https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_trackingservers.txt",
"https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-agh.txt"
]
domain = ["https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"]
hosts = ["https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt", "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"]
wildcard = ["https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext"]
> EOF
[zack@laptop ~]$ cat /usr/local/etc/rpz/unblock/domain/unbound
dpm.demdex.net # ESPN app on PS5 needs this.
[zack@laptop ~]$ rpz -f /usr/local/etc/rpz/config
unblock count written: 1
block count written: 271559
total lines written: 271560
domains parsed: 254147
comments parsed: 6629
blanks parsed: 4519
parsing errors: 24624
[zack@laptop ~]$ head -1 /var/unbound/db/rpz
dpm.demdex.net CNAME rpz-passthru.
[zack@laptop ~]$ tail -6 /var/unbound/db/rpz
stats.zone-telechargement CNAME .
*.stats.zone-telechargement CNAME .
5wh.co.zw CNAME .
www.5wh.co.zw CNAME .
pandi.co.zw CNAME .
www.pandi.co.zw CNAME .
[zack@laptop ~]$ unbound-control -q auth_zone_reload . && unbound-control -q flush_zone . && unbound-control -q flush_negative
Ad-(un)block file format and encoding
All ad-(un)block files must be valid UTF-8; however, for a given domain, each label must contain only 1–63 Unicode scalar values from the set: `!`, `$`, `&`, `'`, `(`, `)`, `+`, `,`, `-`, `0`–`9`, `;`, `=`, `_`, `` ` ``, `A`–`Z`, `a`–`z`, `{`, `}`, and `~`. Labels must be delimited by `.`. Domains in the file must be delimited by a line feed or carriage return and line feed. A domain must be less than 254 characters in length including the `.` label separator. Domains are treated as case-insensitive with uppercase letters treated as lowercase. Domains must not be an IPv4 address.
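The rules above can be sketched as a small validator. This is only an illustration: `is_label_char` and `normalize_domain` are hypothetical names, not part of the `rpz` library API, and the IPv4 check relies on the standard library's dotted-quad parser, which may recognize fewer forms than `rpz` does.

```rust
use std::net::Ipv4Addr;

/// Returns `true` if `c` is in the label character set listed above.
fn is_label_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!$&'()+,-;=_`{}~".contains(c)
}

/// Hypothetical helper: applies the rules of this section and returns the
/// lowercased domain on success.
fn normalize_domain(input: &str) -> Option<String> {
    // Fewer than 254 characters, counting the `.` separators.
    if input.is_empty() || input.len() >= 254 {
        return None;
    }
    // Every label needs 1 to 63 characters from the allowed set.
    let labels_ok = input
        .split('.')
        .all(|label| (1..=63).contains(&label.len()) && label.chars().all(is_label_char));
    // Dotted-quad IPv4 addresses are not domains.
    if !labels_ok || input.parse::<Ipv4Addr>().is_ok() {
        return None;
    }
    // Uppercase letters are treated as lowercase.
    Some(input.to_ascii_lowercase())
}
```

For instance, `normalize_domain("Example.COM")` yields `Some("example.com")`, while `normalize_domain("127.0.0.1")` and `normalize_domain("a..b")` both yield `None`.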
Adblock-style
Domain constructed from an Adblock-style rule with the requirement that the rule conforms to the following extended regex:
^<ws>*(\|\|)?<ws>*<domain><ws>*\^?<ws>*$
where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX` with the added requirements that the TLD is either all letters or at least length five and begins with `xn--` and does not contain `$`, and `<ws>` is any sequence of ASCII whitespace.
Lines that begin with `||` cause all subdomains to be blocked (i.e., the domain itself and all proper subdomains); without `||`, only the specific domain is blocked.
Because these files are processed conservatively, one is still encouraged to use an application-level ad blocker (e.g., uBlock Origin). Adblock-style files often contain paths as well as additional information (e.g., “third-party”) that requires application-level information to process correctly; such entries are counted as “parsing errors” by `rpz`.
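A minimal sketch of how a line could be matched against the Adblock-style regex without a regex engine. `AdblockRule` and `parse_adblock` are hypothetical names, not the crate's API. The sketch reads the `$` restriction as applying to the whole domain, and it checks only the character set; length, TLD, and IPv4 validation are left out.

```rust
/// Hypothetical result type: the candidate domain and whether `||` was present.
#[derive(Debug, PartialEq)]
struct AdblockRule<'a> {
    domain: &'a str,
    subdomains: bool,
}

fn is_ws(c: char) -> bool {
    c.is_ascii_whitespace()
}

/// Label characters minus `$`, which this sketch treats as excluded
/// from Adblock-style domains.
fn is_adblock_label_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!&'()+,-;=_`{}~".contains(c)
}

fn parse_adblock(line: &str) -> Option<AdblockRule<'_>> {
    // <ws>* on both ends.
    let rest = line.trim_matches(is_ws);
    // Optional `||`, followed by <ws>*.
    let (rest, subdomains) = match rest.strip_prefix("||") {
        Some(after) => (after.trim_start_matches(is_ws), true),
        None => (rest, false),
    };
    // Optional trailing `^`, preceded by <ws>*.
    let domain = rest.strip_suffix('^').unwrap_or(rest).trim_end_matches(is_ws);
    // Paths, options, and similar extras fail the character check.
    let chars_ok = domain.chars().all(|c| c == '.' || is_adblock_label_char(c));
    if domain.is_empty() || !chars_ok {
        return None;
    }
    Some(AdblockRule { domain, subdomains })
}
```

With this sketch, `||example.com^` parses with `subdomains` set to `true`, whereas `||example.com^$third-party` and `||example.com/ads/*` are rejected, which mirrors the kind of entries described above as parsing errors.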
Domain-style
Domain constructed from a domains-only rule with the requirement that the rule conforms to the following regex:
^<ws>*<domain><ws>*(#.*)?$
where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.
Domains only represent themselves (i.e., proper subdomains will not be blocked).
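As an illustration of the domain-style rule, the sketch below strips an optional `#` comment and surrounding ASCII whitespace, then applies the TLD requirement. `tld_ok` and `parse_domain_line` are hypothetical names. Label-level validation is omitted, and matching the `xn--` prefix exactly as written (rather than case-insensitively) is an assumption.

```rust
/// TLD rule from the text: all letters, or at least five characters
/// starting with `xn--`. Case handling of the prefix is an assumption.
fn tld_ok(domain: &str) -> bool {
    let tld = domain.rsplit('.').next().unwrap_or("");
    let all_letters = !tld.is_empty() && tld.chars().all(|c| c.is_ascii_alphabetic());
    all_letters || (tld.len() >= 5 && tld.starts_with("xn--"))
}

/// Hypothetical parser for one line of a domain-style file.
fn parse_domain_line(line: &str) -> Option<&str> {
    // Everything from the first `#` onward is a comment.
    let body = line.split('#').next().unwrap_or("");
    let candidate = body.trim_matches(|c: char| c.is_ascii_whitespace());
    let has_inner_ws = candidate.contains(|c: char| c.is_ascii_whitespace());
    if candidate.is_empty() || has_inner_ws || !tld_ok(candidate) {
        return None;
    }
    Some(candidate)
}
```

For the unblock entry shown earlier, `parse_domain_line("dpm.demdex.net # ESPN app on PS5 needs this.")` returns `Some("dpm.demdex.net")`.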
Hosts-style
Domain constructed from a `hosts(5)`-style rule with the requirement that the rule conforms to the following extended regex:
^<ws>*<ip><ws>+<domain><ws>*(#.*)?$
where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, `<ws>` is any sequence of ASCII whitespace, and `<ip>` is one of the following: `::`, `::1`, `0.0.0.0`, or `127.0.0.1`.
Domains only represent themselves (i.e., proper subdomains will not be blocked).
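The hosts-style rule can be approximated by splitting a line into whitespace-separated fields. In the sketch below, `BLOCK_IPS` and `parse_hosts_line` are hypothetical names, and validation of the domain itself is omitted.

```rust
/// The four addresses accepted in the `<ip>` position.
const BLOCK_IPS: [&str; 4] = ["::", "::1", "0.0.0.0", "127.0.0.1"];

/// Hypothetical parser: returns the domain candidate of a hosts-style line.
fn parse_hosts_line(line: &str) -> Option<&str> {
    // Everything from the first `#` onward is a comment.
    let body = line.split('#').next().unwrap_or("");
    let mut fields = body.split_ascii_whitespace();
    let ip = fields.next()?;
    let domain = fields.next()?;
    // The regex allows exactly one domain per line, so extra fields are rejected.
    if fields.next().is_some() || !BLOCK_IPS.contains(&ip) {
        return None;
    }
    Some(domain)
}
```

A line such as `0.0.0.0 ads.example.com` yields `Some("ads.example.com")`, while a line with any other address or with several aliases yields `None`.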
Wildcard-style
Domain constructed from a wildcard domain rule with the requirement that the rule conforms to the following extended regex:
^<ws>*(\*\.)?<domain><ws>*(#.*)?$
where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.
If `domain` begins with `*.`, then `domain` must have length less than 252 and all proper subdomains are blocked; this does not include the domain itself. Otherwise, only the `domain` is blocked.
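A sketch of the wildcard-style distinction between an exact domain and its proper subdomains. `WildcardRule`, `plausible`, and `parse_wildcard_line` are hypothetical names. Only the length limits and the position of `*.` are checked here; full domain validation is omitted.

```rust
/// Hypothetical result type for a wildcard-style line.
#[derive(Debug, PartialEq)]
enum WildcardRule<'a> {
    /// Only the domain itself.
    Exact(&'a str),
    /// All proper subdomains, but not the domain itself.
    ProperSubdomains(&'a str),
}

/// Non-empty, shorter than `max_len`, and free of `*` and whitespace.
fn plausible(domain: &str, max_len: usize) -> bool {
    !domain.is_empty()
        && domain.len() < max_len
        && !domain.contains(|c: char| c == '*' || c.is_ascii_whitespace())
}

fn parse_wildcard_line(line: &str) -> Option<WildcardRule<'_>> {
    // Everything from the first `#` onward is a comment.
    let body = line.split('#').next().unwrap_or("");
    let rule = body.trim_matches(|c: char| c.is_ascii_whitespace());
    match rule.strip_prefix("*.") {
        // With `*.`, the remaining domain must be shorter than 252.
        Some(domain) if plausible(domain, 252) => Some(WildcardRule::ProperSubdomains(domain)),
        // Without it, the general limit of fewer than 254 applies.
        None if plausible(rule, 254) => Some(WildcardRule::Exact(rule)),
        _ => None,
    }
}
```

Here `*.example.com` maps to `ProperSubdomains("example.com")` and `example.com` maps to `Exact("example.com")`.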
Config file
Either `-` or the absolute path to the TOML config file must be passed via the `-f`/`--file` CLI option. If `-` is passed, then `stdin` will be read. The format of this file must conform to the following:
timeout = <timeout_in_seconds>
rpz = <absolute_file_path_to_the_RPZ_file_to_be_written>
local_dir = <absolute_file_path_to_the_directory_containing_local_files>
adblock = [<HTTP(S)_URLs>]
domain = [<HTTP(S)_URLs>]
hosts = [<HTTP(S)_URLs>]
wildcard = [<HTTP(S)_URLs>]
If the `rpz` key does not exist, then the RPZ file will be written to `stdout`. If `local_dir` is specified, the `block/` and `unblock/` subdirectories are searched; within each of those, the `adblock/`, `domain/`, `hosts/`, and `wildcard/` subdirectories are searched for files, which are parsed according to the directory they are in. It is not an error if any of the directories do not exist.
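The directory layout described above amounts to eight candidate paths under `local_dir`. The sketch below only enumerates them; `local_subdirs` is a hypothetical helper, and a caller would skip any path that does not exist.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the eight subdirectories searched under `local_dir`.
fn local_subdirs(local_dir: &Path) -> Vec<PathBuf> {
    let mut dirs = Vec::with_capacity(8);
    for action in ["block", "unblock"] {
        for style in ["adblock", "domain", "hosts", "wildcard"] {
            // e.g. <local_dir>/unblock/domain
            dirs.push(local_dir.join(action).join(style));
        }
    }
    dirs
}
```

With `local_dir` set to `/usr/local/etc/rpz/`, the list includes `/usr/local/etc/rpz/unblock/domain`, which is where the `unbound` unblock file from the earlier session lives.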
When any of the array keys are specified, URLs must be unique across all arrays. The files these URLs point to are interpreted as block files (i.e., unblock files are only allowed on the local file system).
The `timeout` corresponds to the maximum number of seconds allowed for an HTTP(S) file to be downloaded. If it does not exist or has a value of 0, then a timeout of one hour will be used. If the value specified exceeds one hour, then it will be truncated to one hour.
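The timeout rule reduces to a small clamping function. `MAX_TIMEOUT_SECS` and `effective_timeout` are hypothetical names used for illustration only; `None` stands for a missing `timeout` key.

```rust
use std::time::Duration;

/// One hour, the upper bound described above.
const MAX_TIMEOUT_SECS: u64 = 3600;

/// Hypothetical helper: maps the configured value to the timeout actually used.
fn effective_timeout(configured: Option<u64>) -> Duration {
    match configured {
        // Missing or zero falls back to one hour.
        None | Some(0) => Duration::from_secs(MAX_TIMEOUT_SECS),
        // Anything above one hour is truncated to one hour.
        Some(secs) => Duration::from_secs(secs.min(MAX_TIMEOUT_SECS)),
    }
}
```

So `timeout = 15` from the earlier config gives 15 seconds, while `timeout = 7200` gives the same one-hour limit as omitting the key.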
RPZ file
Unless `stdout` is the destination, a temporary RPZ file is written in the same location as the `rpz` value in the config file except with `tmp` appended to the name. Upon success, this file is renamed to the `rpz` value in the config file. The file contains the minimum number of lines possible, with unblock entries taking precedence over block entries.
In the event there are no block entries or the temp file already exists, the program will abort.
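The write-then-rename behavior can be sketched with the standard library alone. `write_rpz` is a hypothetical function, not the crate's implementation. It takes the phrase "with `tmp` appended" literally, and it leaves the temp file behind if a write fails partway through.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical sketch: writes unblock lines, then block lines, to a temp
/// file next to `rpz` and renames it into place.
fn write_rpz(rpz: &Path, unblock: &[String], block: &[String]) -> io::Result<()> {
    // No block entries means there is nothing worth writing.
    if block.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "no block entries"));
    }
    // Same location and name, with `tmp` appended.
    let mut tmp_name = rpz.as_os_str().to_owned();
    tmp_name.push("tmp");
    let tmp = PathBuf::from(tmp_name);
    // `create_new` fails if the temp file already exists.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&tmp)?;
    for line in unblock.iter().chain(block) {
        writeln!(file, "{line}")?;
    }
    file.sync_all()?;
    // Readers of `rpz` see either the old file or the complete new one.
    fs::rename(&tmp, rpz)
}
```

Writing to a sibling file and renaming keeps a DNS server from ever loading a half-written zone, which is presumably why the temp file lives in the same directory as the final file.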
Options
When `rpz` is passed `-V`/`--version`, the version of `rpz` will be printed to `stdout`. When passed `-h`/`--help`, information about the program and its options will be printed to `stdout`. When passed `-f`/`--file` along with `-` or the absolute path to the TOML config file, `rpz` will run normally, printing summary information to `stdout` upon completion. One can additionally pass `-q`/`--quiet` along with `-f`/`--file` in order to suppress summary information from being printed to `stdout`. When `-v`/`--verbose` is passed along with `-f`/`--file`, in addition to the normal summary information being printed to `stdout`, itemized summary information for each input file, including the kinds of errors and counts of errors, will be printed to `stdout`.
Example
If `www.example.com`, `*.example.com`, and `foo.com` are to be blocked while `foo.example.com` and `||foo.com` are to be unblocked, the RPZ file would look like the following:
foo.example.com CNAME rpz-passthru.
*.example.com CNAME .
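In this output, `www.example.com` needs no line because `*.example.com` already covers it, and `foo.com` needs no line because `||foo.com` unblocks it. The sketch below shows only how a surviving entry could be rendered as an RPZ line; it does not perform the minimization. `Scope`, `Action`, and `rpz_line` are hypothetical names.

```rust
/// Which names an entry covers.
enum Scope {
    Exact,
    ProperSubdomains,
}

/// What should happen to matching queries.
enum Action {
    Block,
    Unblock,
}

/// Hypothetical renderer for one RPZ line in the format shown above.
fn rpz_line(domain: &str, scope: Scope, action: Action) -> String {
    let owner = match scope {
        Scope::Exact => domain.to_string(),
        // Proper subdomains are expressed with a leading `*.` label.
        Scope::ProperSubdomains => format!("*.{domain}"),
    };
    let target = match action {
        // `CNAME .` blocks the name.
        Action::Block => ".",
        // `CNAME rpz-passthru.` exempts the name from blocking.
        Action::Unblock => "rpz-passthru.",
    };
    format!("{owner} CNAME {target}")
}
```

Calling `rpz_line("foo.example.com", Scope::Exact, Action::Unblock)` and `rpz_line("example.com", Scope::ProperSubdomains, Action::Block)` reproduces the two lines of the example.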
Upon success, the quantity of unblock, block, and total lines written is written to `stdout` in addition to the total number of domains, comments, blanks, and parsing errors.
Errors
Parsing errors are ignored; all other errors are written to `stderr` before the program aborts.
License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT).
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Status
This package is actively maintained.
The crates are only tested on the `x86_64-unknown-linux-gnu` and `x86_64-unknown-openbsd` targets, but they should work on other platforms.
Nightly `rustc` is required. Once `BTreeMap` cursors are stabilized, stable `rustc` will work.
On OpenBSD-stable, one can use the `rust` port as long as `RUSTC_BOOTSTRAP` is `export`ed with a value of `1` before invoking `cargo build --all-features --release` or `cargo install --all-features rpz`.