PostgreSQL to CrateDB loader research
Full-load
This subsystem is already covered by ingestr/dlt.
CDC
Data can be replicated from PostgreSQL using several methods.
The following replication implementations are relevant:
DMS (logical, plugin: pglogical)
DMS (logical, plugin: wal2json)
asyncpg (logical, native)
pgbelt (logical, plugin: pglogical, uses asyncpg)
pg_replicate (logical, native)
psycopg2 (protocol, native)
pypg-cdc (uses psycopg2, modern pgoutput)
pypgoutput (uses psycopg2)
python-postgres-cdc (uses psycopg2, even more modern pgoutput)
wal2json (logical, plugin)
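To illustrate what a Python-based loader built on the psycopg2 and wal2json entries above might involve, here is a minimal sketch, not a definitive implementation. It assumes wal2json's format version 1 payload (`change` entries with `kind`, `columnnames`, `columnvalues`, and `oldkeys`), that every source table has a primary key, and that the target accepts `?` placeholders, as the CrateDB Python driver does. The names `wal2json_to_sql`, `stream_changes`, and `apply` are hypothetical, and the replication slot is assumed to exist already with the wal2json output plugin.

```python
import json
from typing import Any, Callable, Iterator, List, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def wal2json_to_sql(payload: str) -> Iterator[Tuple[str, List[Any]]]:
    """Translate one wal2json (format version 1) transaction into
    parameterized SQL statements. Assumes each table has a primary key,
    so that update and delete records carry `oldkeys`."""
    for change in json.loads(payload).get("change", []):
        table = f'{quote_ident(change["schema"])}.{quote_ident(change["table"])}'
        kind = change["kind"]
        if kind == "insert":
            cols = ", ".join(quote_ident(c) for c in change["columnnames"])
            marks = ", ".join("?" for _ in change["columnnames"])
            yield (f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                   list(change["columnvalues"]))
        elif kind == "update":
            keys = change["oldkeys"]
            # Design choice of this sketch: leave key columns out of SET,
            # and only address the row through them in the WHERE clause.
            pairs = [(c, v)
                     for c, v in zip(change["columnnames"], change["columnvalues"])
                     if c not in keys["keynames"]]
            if not pairs:
                continue
            sets = ", ".join(f"{quote_ident(c)} = ?" for c, _ in pairs)
            where = " AND ".join(f"{quote_ident(k)} = ?" for k in keys["keynames"])
            yield (f"UPDATE {table} SET {sets} WHERE {where}",
                   [v for _, v in pairs] + list(keys["keyvalues"]))
        elif kind == "delete":
            keys = change["oldkeys"]
            where = " AND ".join(f"{quote_ident(k)} = ?" for k in keys["keynames"])
            yield f"DELETE FROM {table} WHERE {where}", list(keys["keyvalues"])


def stream_changes(dsn: str, slot_name: str,
                   apply: Callable[[str, List[Any]], None]) -> None:
    """Consume an existing wal2json slot and pass each statement to `apply`."""
    # Imported lazily, so the translation above works without psycopg2.
    import psycopg2
    from psycopg2.extras import LogicalReplicationConnection

    conn = psycopg2.connect(dsn, connection_factory=LogicalReplicationConnection)
    cur = conn.cursor()
    cur.start_replication(slot_name=slot_name, decode=True)

    def on_message(msg):
        for sql, params in wal2json_to_sql(msg.payload):
            apply(sql, params)
        # Acknowledge only after the changes were handed downstream.
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

    cur.consume_stream(on_message)
```

In this sketch, `apply` would typically wrap a database cursor's `execute(sql, params)` on the CrateDB side. Mapping the PostgreSQL schema name (for example `public`) to a CrateDB schema is left out and would need to be handled separately.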
Evaluation
A better option is to use Apache Flink via flink-cdc and its Flink Postgres CDC Connector for protocol replication.
Alternatives
Python to the rescue?
Use Kinesis as a data sink.
AWS Blog: Stream changes from Amazon RDS for PostgreSQL using Kinesis & Lambda
Medium: Stream changes from Amazon RDS for PostgreSQL with Kinesis & Lambda
Commerce Architects: Strategies for replicating data from RDS to Kinesis
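As a rough sketch of the Kinesis sink idea, a single change event could be wrapped into the arguments of the Kinesis `PutRecord` call like this. It assumes the event is a dictionary with `schema` and `table` keys, as in wal2json's output. The function names and the choice of partition key are illustrative assumptions and are not taken from the linked articles.

```python
import json
from typing import Any, Dict


def to_kinesis_record(change: Dict[str, Any], stream_name: str) -> Dict[str, Any]:
    """Build keyword arguments for Kinesis `put_record` from one change event."""
    return {
        "StreamName": stream_name,
        # Compact JSON encoding of the change event.
        "Data": json.dumps(change, separators=(",", ":")).encode("utf-8"),
        # Assumption: partition by table, so that changes to one table
        # are routed to the same shard.
        "PartitionKey": f'{change["schema"]}.{change["table"]}',
    }


def publish(client: Any, change: Dict[str, Any], stream_name: str) -> None:
    """Send one change event; `client` is expected to be boto3.client("kinesis")."""
    client.put_record(**to_kinesis_record(change, stream_name))
```

Passing the client in, instead of creating it inside `publish`, lets one client be reused across events and makes it easy to substitute a stub in tests.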
More resources.
supabase/etl
It looks like pg_replicate made progress to become a full-blown ETL framework,
now renamed to supabase/etl.
That’s a sweet introduction to pg_replicate by its author:

“For the past few months, as part of my job at Supabase, I have been working on pg_replicate. pg_replicate lets you easily build applications which can copy data (full table copies and cdc) from Postgres to any other data system.”
– pg_replicate is a Rust crate to build Postgres logical replication applications