#discord #scraper #bot #json-format #cli #discordscraper

bin+lib discord_rust_scraper


4 stable releases

new 1.0.4 Mar 18, 2025
1.0.3 Mar 17, 2025
1.0.2 Mar 10, 2025

#924 in Command line utilities


199 downloads per month

MIT license

28KB
565 lines

DiscordRustScraper




Description

DiscordRustScraper is a Discord data scraper written in Rust that extracts and formats channel data for further analysis. It scrapes the message history of the channels you specify and writes it out as clean JSON for easy processing.

Commands & Usage

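scrape

Scrapes the message history of one or more channels, authenticating with a Discord bot token. Pass the channel IDs separated by spaces.

  • Usage: cargo run -- scrape --bot_token <BOT_TOKEN> --channel_ids [CHANNEL_IDS]
  • Example: cargo run -- scrape --bot_token "your_bot_token" --channel_ids 659069446438125570 806378740917469234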
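This README does not say how scrape walks back through a channel's history. As a rough sketch under stated assumptions (not necessarily this crate's implementation): Discord's REST endpoint GET /channels/{channel.id}/messages returns at most 100 messages per call, newest first, and a before=<message ID> query parameter continues from the oldest message already seen. Each request carries an "Authorization: Bot <BOT_TOKEN>" header. The helpers history_page_path and next_cursor are hypothetical, and the code only builds request paths against made-up pages; it performs no HTTP calls.

```rust
/// Hypothetical helper: the REST path for one page of a channel's history.
/// Discord returns at most 100 messages per request, newest first.
fn history_page_path(channel_id: u64, before: Option<u64>) -> String {
    let mut path = format!("/api/v10/channels/{channel_id}/messages?limit=100");
    if let Some(id) = before {
        // Continue from (i.e. strictly older than) this message ID.
        path.push_str(&format!("&before={id}"));
    }
    path
}

/// Hypothetical helper: pick the `before` cursor for the next page.
/// Snowflake IDs grow over time, so the smallest ID is the oldest message.
/// An empty page means the start of the channel has been reached.
fn next_cursor(page_ids: &[u64]) -> Option<u64> {
    page_ids.iter().copied().min()
}

fn main() {
    let channel_id: u64 = 659069446438125570;
    // Made-up pages standing in for real API responses.
    let fake_pages: [&[u64]; 3] = [&[1003, 1002, 1001], &[1000, 999], &[]];
    let mut cursor = None;
    for page in fake_pages {
        // A real scraper would send a GET request to this path here.
        println!("GET {}", history_page_path(channel_id, cursor));
        match next_cursor(page) {
            Some(oldest) => cursor = Some(oldest),
            None => break, // empty page: no older messages left
        }
    }
}
```

Paging with before rather than after means the newest messages are fetched first, so an interrupted run still has the most recent history.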

convert-to-json

Converts a JSON Lines (.jsonl) input file to JSON.

  • Usage: cargo run -- convert-to-json <INPUT_FILE>
  • Example: cargo run -- convert-to-json on-topic.jsonl
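The exact output shape of convert-to-json is not documented here. A minimal sketch, assuming every line of the .jsonl file is already one complete JSON value and that the target is a single JSON array, is to join the non-empty lines with commas inside brackets. The function name jsonl_to_json_array and the sample field names are hypothetical.

```rust
/// Hypothetical sketch: wrap the non-empty lines of a JSON Lines document
/// in one JSON array. Lines are assumed to be valid JSON; nothing is parsed.
fn jsonl_to_json_array(jsonl: &str) -> String {
    let items: Vec<&str> = jsonl
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty()) // tolerate blank lines
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // Made-up records; the real field names are not documented here.
    let sample = r#"{"message_id":1,"message":"hi"}
{"message_id":2,"message":"hello"}
"#;
    // prints [{"message_id":1,"message":"hi"},{"message_id":2,"message":"hello"}]
    println!("{}", jsonl_to_json_array(sample));
}
```

A command-line version would read the input with std::fs::read_to_string and write the result with std::fs::write. Because this sketch does no parsing, a malformed line yields invalid JSON; a stricter converter would parse each line (for example with serde_json) first.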

sql (optional)

The optional --sql argument takes a database connection string. When it is supplied, scraped messages are written to that SQL database instead of to JSON files, which is a more efficient way to store them. The example below uses a MySQL connection string.

  • Usage: cargo run -- scrape --bot_token <BOT_TOKEN> --channel_ids [CHANNEL_IDS] --sql <CONNECTION_STRING>
  • Example: cargo run -- scrape --bot_token "your_bot_token" --channel_ids 659069446438125570 806378740917469234 --sql mysql://username:password@127.0.0.1:3306/database
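The connection string in the example follows the usual URL shape mysql://username:password@host:port/database. The sketch below, with a hypothetical ConnInfo struct and parse_mysql_url function, only illustrates which part is which. In practice the database driver parses this string itself and supports more than this sketch does (percent-encoded characters, extra query options).

```rust
/// Hypothetical breakdown of a MySQL connection URL.
#[derive(Debug, PartialEq)]
struct ConnInfo {
    user: String,
    password: String,
    host: String,
    port: u16,
    database: String,
}

/// Splits mysql://user:password@host:port/database into its parts.
/// Returns None if any piece is missing or the port is not a number.
fn parse_mysql_url(url: &str) -> Option<ConnInfo> {
    let rest = url.strip_prefix("mysql://")?;
    // Everything before the last '@' is the credentials.
    let (userinfo, hostpart) = rest.rsplit_once('@')?;
    let (user, password) = userinfo.split_once(':')?;
    let (hostport, database) = hostpart.split_once('/')?;
    let (host, port) = hostport.rsplit_once(':')?;
    Some(ConnInfo {
        user: user.to_string(),
        password: password.to_string(),
        host: host.to_string(),
        port: port.parse::<u16>().ok()?,
        database: database.to_string(),
    })
}

fn main() {
    let info = parse_mysql_url("mysql://username:password@127.0.0.1:3306/database");
    println!("{:?}", info);
}
```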

Schema

You'll have to create the database and its messages table yourself; the schema is below.

CREATE TABLE messages (
    channel_id BIGINT UNSIGNED NOT NULL,
    author_id BIGINT UNSIGNED NOT NULL,
    message_id BIGINT UNSIGNED NOT NULL,
    message TEXT NOT NULL,
    has_media BOOLEAN NOT NULL,
    PRIMARY KEY (message_id)
);
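One way to map this schema onto Rust types, as a hypothetical sketch rather than the crate's actual code: BIGINT UNSIGNED covers exactly the range of u64, which is also the width of Discord snowflake IDs; TEXT maps to String and BOOLEAN to bool. The MessageRow struct, the INSERT_MESSAGE statement and the dedup_by_message_id helper are assumptions. Because message_id is the primary key, inserting the same message twice fails, so the sketch drops duplicates in memory and uses MySQL's INSERT IGNORE for rows already in the table.

```rust
use std::collections::HashSet;

/// Hypothetical row type mirroring the `messages` table.
#[derive(Debug, Clone, PartialEq)]
struct MessageRow {
    channel_id: u64, // BIGINT UNSIGNED
    author_id: u64,  // BIGINT UNSIGNED
    message_id: u64, // BIGINT UNSIGNED, primary key
    message: String, // TEXT
    has_media: bool, // BOOLEAN
}

/// Parameterized statement: values are bound by the driver, never spliced
/// into the SQL text. INSERT IGNORE skips rows whose message_id already exists.
const INSERT_MESSAGE: &str = "INSERT IGNORE INTO messages \
    (channel_id, author_id, message_id, message, has_media) \
    VALUES (?, ?, ?, ?, ?)";

/// Keeps only the first row for each message_id, matching the primary key.
fn dedup_by_message_id(rows: Vec<MessageRow>) -> Vec<MessageRow> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false when the ID was already present.
    rows.into_iter().filter(|row| seen.insert(row.message_id)).collect()
}

fn main() {
    let row = |id: u64| MessageRow {
        channel_id: 659069446438125570,
        author_id: 1,
        message_id: id,
        message: format!("message {id}"),
        has_media: false,
    };
    let rows = dedup_by_message_id(vec![row(10), row(11), row(10)]);
    println!("{} unique rows: {:?}", rows.len(), rows);
    println!("{}", INSERT_MESSAGE);
}
```

INSERT IGNORE also downgrades some other errors to warnings; INSERT ... ON DUPLICATE KEY UPDATE is the stricter alternative if a re-scrape should overwrite stored copies of edited messages.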

Dependencies

~30–44MB
~810K SLoC