#site #scrape #content #post #blogger #cli #archive

app scrape_blogger

A CLI to scrape content from a Blogger Site

2 releases

0.1.3 Nov 21, 2024
0.1.2 Nov 16, 2024

#155 in Compression

Download history 221/week @ 2024-11-16 32/week @ 2024-11-23 12/week @ 2024-11-30 4/week @ 2024-12-07

269 downloads per month

MIT license

23KB
490 lines

scrape_blogger

Usage: scrape_blogger [OPTIONS]

Options:
  -t, --threads <THREADS>  Sets the number of threads to use when scraping all post links [default: 4]
  -r, --recent-only        Scrapes only recent posts from the blog homepage without clicking 'Older Posts'
  -h, --help               Print help
  -V, --version            Print version
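
The `--threads` option suggests that post links are divided among several workers. The following is a minimal sketch of one way to do that with the standard library only; it is not taken from the crate's source. The names `scrape_all` and `scrape_one` are hypothetical, and the chunk-per-thread strategy is an assumption.

```rust
use std::thread;

/// Hypothetical helper: applies `scrape_one` to every link using up to
/// `threads` worker threads and returns the results in input order.
/// The chunking strategy is an assumption, not the crate's actual code.
fn scrape_all<F>(links: &[String], threads: usize, scrape_one: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    // Treat 0 as 1 so there is always at least one worker.
    let threads = threads.max(1);
    // Ceiling division; `chunks` panics on a size of 0, hence the `max(1)`.
    let chunk_size = ((links.len() + threads - 1) / threads).max(1);
    // Share the callback by reference so each worker closure can copy it.
    let scrape_one = &scrape_one;

    thread::scope(|scope| {
        // Spawn every worker first, then join them in spawn order.
        let handles: Vec<_> = links
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|link| scrape_one(link.as_str()))
                        .collect::<Vec<String>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the chunks are contiguous and the workers are joined in spawn order, the output order matches the input order regardless of which thread finishes first.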

Recursively crawls and scrapes a specific Blogger site to archive its post content. The project is hardcoded for that one site and may not generalize to all Blogger sites, but the source code can be modified to work with any English-language Blogger site whose homepage links to older posts.
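
The crawl described above, following the older-posts link from each index page, can be sketched as below. This is an illustration under stated assumptions, not the crate's implementation: `older_posts_link` and `crawl_index_pages` are hypothetical names, the link is assumed to be an anchor whose visible text is exactly `Older Posts`, the HTML is matched with naive string search rather than a real parser, and page fetching is passed in as a closure so the sketch runs without network access. It uses a loop where the tool is described as recursive.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the `href` of the anchor whose text is
/// "Older Posts", using naive string matching (an assumption about the markup).
fn older_posts_link(html: &str) -> Option<&str> {
    let text_at = html.find(">Older Posts</a>")?;
    let tag_start = html[..text_at].rfind("<a ")?;
    let tag = &html[tag_start..text_at];
    let href_at = tag.find("href=")? + "href=".len();
    let quote = tag[href_at..].chars().next()?;
    if quote != '"' && quote != '\'' {
        return None;
    }
    let value = &tag[href_at + 1..];
    let end = value.find(quote)?;
    Some(&value[..end])
}

/// Hypothetical helper: visits index pages starting at `start`, following the
/// older-posts link until none is found, a page fails to load, or a URL
/// repeats. With `recent_only` set, only the first page is visited.
fn crawl_index_pages(
    fetch: impl Fn(&str) -> Option<String>,
    start: &str,
    recent_only: bool,
) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    let mut next = Some(start.to_string());

    while let Some(url) = next {
        // Stop on a repeated URL so a link cycle cannot loop forever.
        if !visited.insert(url.clone()) {
            break;
        }
        let Some(html) = fetch(&url) else { break };
        order.push(url);
        if recent_only {
            break;
        }
        next = older_posts_link(&html).map(str::to_string);
    }
    order
}
```

Passing the fetcher as a closure keeps the traversal logic testable with in-memory pages; a real tool would plug in an HTTP client and extract post links from each page it visits.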

Dependencies

~15–28MB
~413K SLoC