#cli #web #link-checker

broken-links

Find all broken links from a starting URL.

9 releases

0.2.3 Jul 3, 2022
0.2.2 Jan 11, 2022
0.1.4 Jan 5, 2022



64 downloads per month

GPL-3.0 license

390 lines

broken-links is a tool that helps you identify broken links on a website. Given a starting URL, it finds and checks every href link (excluding anchors) on that page. It then visits each link within the same domain, collects and checks the links found there, and repeats the process until it has exhausted all links within the provided domain. It attempts to avoid checking the same link twice, though occasional duplicates may slip through. Note that external links are checked, but not visited to find further links. In any case, you probably don't want to run this on google.com or some other website that you don't control.
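The crawl described above can be sketched as a queue of in-domain pages plus a set of already-checked links. This is a hypothetical illustration, not the crate's actual source: `in_domain` assumes "same domain" simply means the link starts with the root URL, and the `pages` table stands in for real HTTP fetches.

```rust
use std::collections::{HashSet, VecDeque};

// Assumption: "same domain" means the link starts with the root URL; the
// real crate may compare parsed hosts instead.
fn in_domain(link: &str, root: &str) -> bool {
    link.starts_with(root)
}

fn main() {
    let root = "https://kdwarn.dev";
    // Simulated site: (page URL, links found on that page).
    let pages = vec![
        ("https://kdwarn.dev", vec!["https://kdwarn.dev/a", "https://example.com/x"]),
        ("https://kdwarn.dev/a", vec!["https://kdwarn.dev", "https://example.com/x"]),
    ];

    let mut checked: HashSet<&str> = HashSet::new(); // links checked (unique)
    let mut visited: HashSet<&str> = HashSet::new(); // in-domain pages crawled
    let mut queue: VecDeque<&str> = VecDeque::from([root]);

    while let Some(page) = queue.pop_front() {
        if !visited.insert(page) {
            continue; // already crawled this page
        }
        if let Some((_, links)) = pages.iter().find(|(url, _)| *url == page) {
            for &link in links {
                if checked.insert(link) {
                    // A real implementation would issue an HTTP request here.
                    println!("checking {link}");
                }
                if in_domain(link, root) {
                    queue.push_back(link); // crawl it later for more links
                }
            }
        }
    }
    println!("{} unique links checked", checked.len());
}
```

External links end up in `checked` (so they are verified once) but are never pushed onto the crawl queue, matching the behavior described above.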


cargo install broken-links

Check all links found at https://kdwarn.dev, then all links on pages within that domain reachable from the starting URL, and so on:

broken-links https://kdwarn.dev

Tell the program not to check a URL (or multiple URLs) with the -s flag. This is particularly useful when a site has auto-generated links, for instance the "back" and "forward" links in a calendar system. To skip more than one URL, separate them with commas. Be sure to use the full URL (i.e. include https://):

broken-links https://kdwarn.dev -s https://kdwarn.dev/welcome/now,https://kdwarn.dev/nothing
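Splitting that comma-separated -s value into a skip set might look like the sketch below. This is an assumption about the parsing (a plain comma split); the crate's actual handling may differ.

```rust
use std::collections::HashSet;

// Hypothetical parser for the -s argument: split on commas, trim whitespace,
// drop empty entries, and collect into a set for O(1) lookups.
fn parse_skip(arg: &str) -> HashSet<String> {
    arg.split(',')
        .map(|url| url.trim().to_string())
        .filter(|url| !url.is_empty())
        .collect()
}

fn main() {
    let skip = parse_skip("https://kdwarn.dev/welcome/now,https://kdwarn.dev/nothing");
    // Before checking any link, the crawler would consult this set.
    println!("skipping {} urls", skip.len());
}
```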

Only interested in the links on a single page, or just want to give it a quick spin? Use the quick (-q or --quick) flag:

broken-links https://kdwarn.dev -q

Once the program finishes, it displays the number of links checked, the total verified, and the duration of the check. (A preliminary, rough estimate is that it checks about 25,000 unique links per hour, though this varies with the number of non-existent domains among the links.) The difference between "checked" and "total verified" is essentially the number of links that appear multiple times: "checked links" is close to a unique count, though some duplicates sneak through, while "total verified" includes all duplicates.

If broken links are found, they are saved to a CSV file; any errors are saved to a second CSV. Both files are timestamped and written to the directory from which the CLI is run.
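One way to produce a timestamped filename in the current directory is sketched below. The exact filename format the crate uses is not documented here; `csv_name` and the epoch-seconds timestamp are illustrative assumptions only.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical filename scheme: "<kind>-<unix-seconds>.csv" in the
// current working directory.
fn csv_name(kind: &str, timestamp: u64) -> String {
    format!("{kind}-{timestamp}.csv")
}

fn main() {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_secs();
    // One file for broken links, one for errors, sharing a timestamp.
    println!("{}", csv_name("broken-links", ts));
    println!("{}", csv_name("errors", ts));
}
```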

See full help:

broken-links --help
