bin+lib solana-snapshot-etl

Efficiently unpack Solana snapshots

3 releases (breaking)

0.3.0 Jul 7, 2022
0.2.0 Jul 4, 2022
0.1.0 Jun 30, 2022

Apache-2.0

78KB
2K SLoC

Solana Snapshot ETL 📸

solana-snapshot-etl efficiently extracts all accounts in a snapshot to load them into an external system.

Motivation

Solana nodes periodically back up their account database into a .tar.zst "snapshot" stream. If you run a node yourself, you have probably already seen a snapshot file such as this one:

snapshot-139240745-D17vR2iksG5RoLMfTX7i5NwSsr4VpbybuX1eqzesQfu2.tar.zst

A full snapshot file contains a copy of all accounts at a specific slot state (in this case slot 139240745).
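
The slot number can be recovered from the file name itself. Below is a minimal Python sketch, assuming the name follows the `snapshot-<slot>-<suffix>.tar.zst` pattern seen in the example above. The helper `parse_snapshot_name` is hypothetical and not part of this tool, and the suffix after the slot is treated as an opaque identifier.

```python
import re
from typing import NamedTuple

# Assumed pattern: snapshot-<slot>-<suffix>.tar.zst, as in the example above.
_SNAPSHOT_RE = re.compile(r"^snapshot-(\d+)-([0-9A-Za-z]+)\.tar\.zst$")


class SnapshotName(NamedTuple):
    slot: int
    suffix: str  # identifier following the slot, treated as opaque here


def parse_snapshot_name(file_name: str) -> SnapshotName:
    """Split a full snapshot file name into its slot and trailing identifier."""
    match = _SNAPSHOT_RE.match(file_name)
    if match is None:
        raise ValueError(f"not a full snapshot file name: {file_name!r}")
    return SnapshotName(slot=int(match.group(1)), suffix=match.group(2))


name = "snapshot-139240745-D17vR2iksG5RoLMfTX7i5NwSsr4VpbybuX1eqzesQfu2.tar.zst"
print(parse_snapshot_name(name).slot)  # 139240745
```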

Historical account data is relevant to blockchain analytics use cases and event tracing. Although archives are readily available, the ecosystem was missing an easy-to-use tool to access snapshot data.

Building

cargo install --git https://github.com/terorie/solana-snapshot-etl --features=standalone --bins

Usage

The ETL tool can extract snapshots from a variety of streaming sources and load them into one of the supported storage backends.

The basic command-line usage is as follows:

USAGE:
    solana-snapshot-etl [OPTIONS] <LOAD_FLAGS> <SOURCE>

Sources

Extract from a local snapshot file:

solana-snapshot-etl /path/to/snapshot-*.tar.zst ...

Extract from an unpacked snapshot:

# Example unarchive command (-C selects the extraction directory, which must exist)
mkdir -p ./unpacked_snapshot/
tar -I zstd -xvf snapshot-*.tar.zst -C ./unpacked_snapshot/

solana-snapshot-etl ./unpacked_snapshot/

Stream a snapshot from an HTTP source or S3 bucket:

solana-snapshot-etl 'https://my-solana-node.bdnodes.net/snapshot.tar.zst?auth=xxx' ...

Targets

SQLite3

The fastest way to access snapshot data is the SQLite3 load mechanism.

The resulting SQLite database file can be loaded using any SQLite client library.

solana-snapshot-etl snapshot-139240745-*.tar.zst --sqlite-out snapshot.db

The database contains the following tables:

  • account
  • token_account (SPL Token Program)
  • token_mint (SPL Token Program)
  • token_multisig (SPL Token Program)
  • token_metadata (MPL Metadata Program)

CSV

Coming soon!

Geyser plugin

Much like solana-validator, this tool can write account updates to Geyser plugins.

solana-snapshot-etl snapshot-139240745-*.tar.zst --geyser plugin-config.json

For more info, consult Solana's docs: https://docs.solana.com/developing/plugins/geyser-plugins
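
For orientation: according to Solana's Geyser documentation, a plugin config is a JSON file whose `libpath` field points at the plugin's shared library, and any other keys are plugin-specific. The sketch below writes such a file from Python. The helper name `write_geyser_config` and the library path are placeholders, and it is an assumption that this tool reads the file the same way solana-validator does.

```python
import json
from pathlib import Path


def write_geyser_config(config_path: str, libpath: str, **plugin_options) -> dict:
    """Write a Geyser plugin config file and return the mapping that was written."""
    # "libpath" is the field named in Solana's Geyser docs; everything else
    # is passed through untouched for the plugin to interpret.
    config = {"libpath": libpath, **plugin_options}
    Path(config_path).write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `write_geyser_config("plugin-config.json", "/path/to/libmy_geyser_plugin.so")` produces a file that can be passed to `--geyser`, with the placeholder path replaced by the plugin you built.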
