
app wget2hugo

A small utility for converting a wget backup to markdown

3 releases

0.1.2 Oct 28, 2021
0.1.1 Oct 27, 2021
0.1.0 Oct 26, 2021


MIT and GPL-3.0+

10KB
116 lines

wget2hugo

This is a program that converts a wget backup of a site into Markdown, which can then be used as content in Hugo or a similar static site generator.

You can create a full backup of a website using

wget \
    --mirror \
    --convert-links \
    $URL

This works well, but if you want more than an archived copy or an exact mirror, you will want the backup in a more manageable format. wget2hugo converts the HTML pages into Markdown files and copies all static files (PDFs, .doc files, images, etc.) over unchanged. The goal is output that can be dropped straight into a Hugo site's content directory, built, and deployed.
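The convert-or-copy behaviour described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not wget2hugo's actual source: the names `Action`, `plan`, `walk`, and `convert_tree` are hypothetical, and the HTML-to-Markdown step is left as a caller-supplied function because the real conversion is delegated to an external crate. It also assumes pages are recognisable by an .html or .htm extension, which may not hold for every wget mirror.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What to do with one file from the wget mirror (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    /// HTML page: convert to Markdown and write to this path.
    Convert(PathBuf),
    /// Anything else (PDFs, images, ...): copy unchanged to this path.
    Copy(PathBuf),
}

/// Decide the output path for `file`, which must live under `src_root`.
/// Returns `None` if `file` is not inside `src_root`.
fn plan(src_root: &Path, dst_root: &Path, file: &Path) -> Option<Action> {
    let rel = file.strip_prefix(src_root).ok()?;
    let dst = dst_root.join(rel);
    // Assumption: HTML pages are identified purely by file extension.
    let is_html = matches!(
        file.extension().and_then(|e| e.to_str()),
        Some(ext) if ext.eq_ignore_ascii_case("html") || ext.eq_ignore_ascii_case("htm")
    );
    Some(if is_html {
        Action::Convert(dst.with_extension("md"))
    } else {
        Action::Copy(dst)
    })
}

/// Recursively collect every non-directory entry under `dir`.
fn walk(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            walk(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

/// Mirror `src_root` into `dst_root`, converting HTML and copying the rest.
/// `to_markdown` stands in for whatever HTML-to-Markdown converter is used.
fn convert_tree(
    src_root: &Path,
    dst_root: &Path,
    to_markdown: impl Fn(&str) -> String,
) -> io::Result<()> {
    let mut files = Vec::new();
    walk(src_root, &mut files)?;
    for file in files {
        let Some(action) = plan(src_root, dst_root, &file) else { continue };
        let dst = match &action {
            Action::Convert(p) | Action::Copy(p) => p,
        };
        if let Some(parent) = dst.parent() {
            fs::create_dir_all(parent)?;
        }
        match &action {
            Action::Convert(_) => {
                // Lossy decoding so pages that are not valid UTF-8 still convert.
                let bytes = fs::read(&file)?;
                let html = String::from_utf8_lossy(&bytes);
                fs::write(dst, to_markdown(&html))?;
            }
            Action::Copy(_) => {
                fs::copy(&file, dst)?;
            }
        }
    }
    Ok(())
}
```

Keeping the path decision in `plan` separate from the file I/O makes the mapping easy to check on its own: under this sketch, mirror/blog/index.html becomes content/blog/index.md, while mirror/files/report.pdf is copied to content/files/report.pdf.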

It's written in Rust, using an HTML-to-Markdown crate. I wrote a previous version in Node.js using turndown, but ran into memory leaks and performance problems.

Running it

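From a checkout of the source, run

cargo run -- --help

and it should print usage information. The -- separator matters: without it, Cargo treats --help as its own flag and prints the help for cargo run instead of passing it to wget2hugo.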

Dependencies

~11–20MB
~329K SLoC