app joindoe

Utility to deidentify sensitive data

1 unstable release

0.1.0 Jul 22, 2022

Apache-2.0

Join Doe

Join Doe is a tool for replicating database contents between environments while deidentifying sensitive data.

It dumps the source data to an S3 bucket, deidentifies it, and uploads it to the destination.

Current status

Currently, the project only works with Redshift.

How to use

Join Doe executes its jobs from a YAML config file.

Example:

source:
  connection_uri: $DATABASE_URL
  tables:
    - name: providers
      transform:
        - column: identifier
          transformer: reverse
        - column: first_name
          transformer: first-name
        - column: last_name
          transformer: last-name
    - name: orders
      transform:
        - column: identifier
          transformer: reverse
store:
  bucket: nw-data-transfer
  aws_access_key_id: $AWS_ACCESS_KEY_ID
  aws_secret_access_key: $AWS_SECRET_ACCESS_KEY
destination:
  connection_uri: $TARGET_DATABASE_URL

This config processes two tables from the source database: providers and orders. For each table, it applies the configured transformer to every listed column, stores the data in the S3 bucket, and then uploads it to the destination database.
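To make the shape of the config concrete, here is a minimal Python sketch that walks the tables section and lists which transformer applies to which column. The TABLES structure is a hand-written mirror of the example above, and transform_plan is a hypothetical helper; neither is part of Join Doe itself.

```python
# Hypothetical in-memory form of the `source.tables` section above,
# written out by hand (the Python standard library has no YAML parser).
TABLES = [
    {
        "name": "providers",
        "transform": [
            {"column": "identifier", "transformer": "reverse"},
            {"column": "first_name", "transformer": "first-name"},
            {"column": "last_name", "transformer": "last-name"},
        ],
    },
    {
        "name": "orders",
        "transform": [
            {"column": "identifier", "transformer": "reverse"},
        ],
    },
]


def transform_plan(tables):
    """Map each table name to a {column: transformer} dict."""
    return {
        table["name"]: {
            rule["column"]: rule["transformer"]
            for rule in table.get("transform", [])
        }
        for table in tables
    }


plan = transform_plan(TABLES)
for table_name, rules in plan.items():
    for column, transformer in rules.items():
        print(f"{table_name}.{column} -> {transformer}")
```

Columns that are not listed under transform do not appear in the plan; this sketch assumes they are copied unchanged.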

The supported transformers are:

  • reverse: reverses the contents of the field
  • first-name: replaces the contents of the field with a random first name
  • last-name: replaces the contents of the field with a random last name
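
The three transformers can be sketched in a few lines of Python. This is an illustration of the behavior described above, not Join Doe's implementation: the name pools, the TRANSFORMERS lookup table, and apply_rules are all assumptions made for the example.

```python
import random

# Placeholder name pools; the names Join Doe actually draws from
# are not documented here.
FIRST_NAMES = ["Alex", "Jordan", "Morgan", "Riley"]
LAST_NAMES = ["Doe", "Smith", "Garcia", "Nguyen"]


def reverse(value):
    """Reverse the characters of the field."""
    return value[::-1]


def first_name(_value):
    """Ignore the original value and pick a random first name."""
    return random.choice(FIRST_NAMES)


def last_name(_value):
    """Ignore the original value and pick a random last name."""
    return random.choice(LAST_NAMES)


# Keys match the transformer names used in the YAML config.
TRANSFORMERS = {
    "reverse": reverse,
    "first-name": first_name,
    "last-name": last_name,
}


def apply_rules(row, rules):
    """Return a copy of `row` with each configured column transformed."""
    out = dict(row)
    for rule in rules:
        transform = TRANSFORMERS[rule["transformer"]]
        out[rule["column"]] = transform(out[rule["column"]])
    return out


row = {"identifier": "AB-123", "first_name": "Jane", "last_name": "Roe"}
rules = [
    {"column": "identifier", "transformer": "reverse"},
    {"column": "first_name", "transformer": "first-name"},
    {"column": "last_name", "transformer": "last-name"},
]
print(apply_rules(row, rules))
```

In this sketch reverse is deterministic, so equal inputs always produce equal outputs (the identifier AB-123 becomes 321-BA every time), while the two name transformers return a different value on each call.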
