#postgresql #backup #restore #pg-dump #database-server #pg-restore

elefant-tools

A library for doing things like pg_dump and pg_restore, with extra features, and probably more bugs

2 releases

0.0.2 Jun 17, 2024
0.0.1 Mar 16, 2024

#245 in Database interfaces


Used in elefant-sync

MIT license

610KB
15K SLoC

Elefant Tools

Elefant Tools is a library + binary for interfacing with the PostgreSQL database in the same way as pg_dump and pg_restore. In addition Elefant Tools can do direct copies between two databases without the need to write to disk. This is useful for example when you want to copy a database from one server to another, or when you want to copy a database from a server to a local development environment.

The main difference is that Elefant Tools is written in Rust, is designed to be more flexible than pg_dump and pg_restore, and provides a library for doing full customization and transformation of the structure of the database.

Usage

For most of the commands here you will need to provide credentials and database host information. This general comes in the shape of these arguments. These arguments can also be provided through environment variables, where they are named the same, except in uppercase. For example --source-db-host can be provided through the environment variable SOURCE_DB_HOST.

--source-db-host localhost --source-db-user postgres --source-db-password TopSecretPassword --source-db-name my_db
--target-db-host localhost --target-db-user postgres --target-db-password TopSecretPassword --target-db-name my_target_db

To keep the examples here simple, I will not include these arguments in the examples. You should always include them in your own commands, or provide them through environment variables. If you don't provide them, the commands will fail and tell you which arguments are missing.

Dump to sql file using Postgres insert statements.

This file can be passed to either psql or elefant-sync to import the data again:

# Dump to file
elefant-sync export sql-file --path my_dump.sql --format InsertStatements

# Import from file
elefant-sync import sql-file --path my_dump.sql
# or
psql --dbname my_target_db -f my_dump.sql

Dump to sql file using Postgres copy statements.

This requires using elefant-sync to import the data again, but is faster:

# Dump to file
elefant-sync export sql-file --path my_dump.sql --format CopyStatements

# Import from file
elefant-sync import sql-file --path my_dump.sql

Copy between two databases without temporary files

This was one of the main original use cases for this tool. It allows you to copy a database from one server to another without writing to disk. This is useful when you have a large database and aren't sure if you have enough disk space. Also it's faster than dump + restore, so there's that.

elefant-sync copy --source-db-name my_source_db --target-db-name my_target_db

While this tool doesn't support running everything in one transaction, it does support "differential copying". This means that if the copy fails, you can just run the same command again, and it will continue where it left off, which is often times what you are actually trying to achieve with the full transaction:

elefant-sync copy --source-db-name my_source_db --target-db-name my_target_db --differential

The --help command

I would very much recommend checking out the --help command for each of the commands to see all the options available, for example:

elefant-sync.exe --help
A replacement for db_dump and db_restore that supports advanced processing such as moving between schemas.

This tool is currently experimental and any use in production is purely on the user. Backups are recommended.

Usage: elefant-sync.exe [OPTIONS] <COMMAND>

Commands:
  export  Export a database schema to a file or directory to be imported later on
  import  Import a database schema from a file or directory that was made using the export command
  copy    Copy a database schema from one database to another
  help    Print this message or the help of the given subcommand(s)

Options:
      --max-parallelism <MAX_PARALLELISM>
          How many threads to use when exporting or importing. Defaults to the number of estimated cores on the machine. If the available parallelism cannot be determined, it defaults to 1

          [env: MAX_PARALLELISM=]
          [default: 32]

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Supported features:

Not all Postgres features are currently supported. The following is a list of features that are supported, partially supported, and not supported. If you find a feature that is not supported, please open an issue. If you can, providing the actual code to support the feature would be very helpful.

✅ Supported
❌ Not supported
➕ Partially supported (Check the nested items)
✅ Primary keys
✅ Sequences
    ❌ owned by
✅ Foreign keys
    ✅ Update/Delete cascade rules
✅ Not null constraints
✅ Check constraints
    ✅ Check constraints calling functions
✅ Unique constraints
    ✅ Distinct nulls
    ✅ Using explicit unique index
✅ Indexes
    ✅ Column direction
    ✅ Nulls first/last
    ✅ Index type
    ✅ Filtered index
    ✅ Expressions
    ✅ Included columns
    ✅ Index storage parameters
✅ Generated columns
❌ Row level security
✅ Triggers
✅ Views
✅ Materialized views
➕ Functions/Stored procedures
    ❌ Transforms
    ✅ Aggregate functions
✅ Extensions
➕ Comments (Best effort to support comments on objects. If I have forgotten to support comments on any object, please open an issue.)
✅ Partitions
✅ Inheritance
✅ Enums
❌ Collations
✅ Schemas
❌ Roles/Users
✅ Default values
✅ Quoted identifier names
✅ Array columns
❌ Exclusion constraints
✅ Domains

Timescale DB support

✅ Hypertables
    ✅ Space partitioning
    ✅ Time partitioning
    ✅ Multiple dimensions
    ✅ Compression
    ❌ Distributed hypertables
✅ Continuous aggregates
    ✅ Compression
✅ Retention policies
✅ User defined actions/jobs

Do note: This uses high level features in Timescale, and doesn't operate directly on the underlying chunks/catalog tables from timescale, unlike pg_dump. This means data might be chunked slightly different from the original database, but the data should be the same. For example if you have changed the chunk interval at some point and have chunks with different sizes, they will all have the same size in the dump.

Continuous aggregates are recreated, which means data that was no longer in the original table will also be missing from the continuous aggregate.

Dependencies

~10–21MB
~333K SLoC