Mastering cross-database operations with PostgreSQL FDW
A guide to schema import, archival policies, and automation
Introduction
Modern applications often rely on data scattered across multiple databases. PostgreSQL’s Foreign Data Wrapper (FDW) empowers you to query and manipulate external databases as if they were native PostgreSQL tables. Combined with pg_cron for scheduling and stored procedures for automation, you can build robust archival pipelines.
In this post we will cover:
Setting up FDW to connect to external databases (e.g., another PostgreSQL instance)
Importing foreign schemas
Creating archival policies with stored procedures
Automating archival tasks with pg_cron
Our sponsor
Multiplayer auto-discovers, tracks, and documents your entire system architecture—from its components to APIs, dependencies, and environments. Gain real-time, comprehensive visibility into your system, all at a glance.
What is PostgreSQL FDW?
FDW is PostgreSQL’s implementation of the SQL/MED (Management of External Data) standard, allowing the database to interact with external data sources.
Key benefits:
Unified Query Interface: Run SQL across heterogeneous databases (MySQL, MongoDB, Hadoop, etc.).
Real-Time Data Access: No need for ETL pipelines for simple joins (see the join sketch after this list).
Simplified Archival: Move historical data to cheaper storage while keeping it queryable.
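To make this concrete, here is a minimal sketch of a cross-database join. The users table, user_id column, and foreign_schema.transactions foreign table are illustrative; we set up a foreign schema exactly like this later in the post.
-- Hypothetical join between a local table and a foreign table
SELECT u.name, SUM(t.amount) AS total_spent
FROM public.users u
JOIN foreign_schema.transactions t ON t.user_id = u.id
GROUP BY u.name;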
Foreign Data Wrappers
There are many different FDW extensions that let you connect to a variety of databases and data sources (relational, NoSQL, flat files, etc.):
postgres_fdw: Connects to other PostgreSQL databases (local or remote); supports read/write operations.
file_fdw: Reads data from flat files (CSV, TSV, etc.).
mysql_fdw: Connects to MySQL/MariaDB databases; supports basic queries and joins.
redis_fdw: Queries Redis key-value stores.
mongo_fdw: Accesses MongoDB collections.
S3 wrappers: Read/write data from Amazon S3.
There are many others; the PostgreSQL wiki maintains a list of available foreign data wrappers.
Use cases
As you can imagine, many use cases are possible: reading files straight from your database, or fetching cache keys from Redis and augmenting them with content stored in PostgreSQL.
The use case we will focus on is automatic archival from PostgreSQL to PostgreSQL, but the same approach works from PostgreSQL to Hadoop, MySQL, BigQuery, and so on.
Setting up FDW and connecting to a foreign database
Step 1: Install the required extension
-- Enable FDW
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
Step 2: Configure a foreign server (e.g., PostgreSQL)
-- Create a foreign server (the port defaults to 5432; add a port option if yours differs)
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '{FOREIGN_HOST}', dbname '{FOREIGN_DB_NAME}');
-- Map local PostgreSQL user to foreign server credentials
CREATE USER MAPPING FOR postgres SERVER postgres_server
OPTIONS (user '{FOREIGN_USERNAME}', password '{FOREIGN_PASSWORD}');
Step 3: Import foreign schema
You might be tempted to import the foreign schema from the other PostgreSQL instance into the “public” schema of your current instance; however, you can’t have two tables with the same name in the same schema.
That’s why I suggest creating a separate schema; you can call it foreign_schema or whatever you wish:
-- Create a new schema to import the tables into
CREATE SCHEMA foreign_schema;
-- Import all tables from the foreign postgreSQL database
IMPORT FOREIGN SCHEMA public FROM SERVER postgres_server INTO foreign_schema;
-- You can also import only specific tables
IMPORT FOREIGN SCHEMA public LIMIT TO ({TABLE1}, {TABLE2}) FROM SERVER postgres_server INTO foreign_schema;
Now you can query your foreign database:
SELECT * FROM foreign_schema.{TABLE1};
Building an archival policy
Let’s imagine you have a transactions table in both PostgreSQL instances, with the same columns (though matching columns are not mandatory).
The remote server is used as the archive, and the current database (the one you are connected to) is your live DB.
-- uuid_generate_v4() requires the uuid-ossp extension:
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS transactions (
id uuid DEFAULT uuid_generate_v4(),
amount INT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id)
);
Create a stored procedure to move data
CREATE OR REPLACE PROCEDURE archive_old_transactions()
LANGUAGE plpgsql
AS $$
DECLARE
    -- Compute the cutoff once so the INSERT and DELETE agree on exactly
    -- which rows get archived (NOW() would advance between the two statements)
    cutoff TIMESTAMPTZ := NOW() - INTERVAL '1 year';
BEGIN
    -- Move data older than 1 year to the archive
    INSERT INTO foreign_schema.transactions
    SELECT * FROM public.transactions
    WHERE created_at < cutoff;
    -- Delete the archived data from the main table
    DELETE FROM public.transactions
    WHERE created_at < cutoff;
END;
$$;
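Before scheduling it, you can run the procedure by hand to confirm it moves the rows you expect:
-- Manual test run
CALL archive_old_transactions();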
Automate archival with pg_cron
Step 1: Install the required extension. Note that pg_cron must be listed in shared_preload_libraries in postgresql.conf (which requires a restart) before the extension can be created:
-- Enable pg_cron
CREATE EXTENSION IF NOT EXISTS pg_cron;
Step 2: Schedule the stored procedure to run nightly:
-- Run at 2 AM daily
SELECT cron.schedule(
'archive_transactions',
'0 2 * * *',
'CALL archive_old_transactions()'
);
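If you want to confirm the job was registered, pg_cron exposes its schedule in the cron.job table:
-- List scheduled jobs
SELECT jobid, jobname, schedule, command FROM cron.job;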
Step 3: Create a view to merge both worlds
CREATE VIEW combined_transactions AS (
WITH remote_data AS (
SELECT * FROM foreign_schema.transactions
),
local_data AS (
SELECT * FROM public.transactions
)
SELECT * FROM remote_data
UNION ALL
SELECT * FROM local_data
);
You will notice that I didn’t simply UNION the two tables; I used CTEs (Common Table Expressions), because they are crucial for optimizing queries that involve foreign tables.
Essentially, a CTE containerizes the FDW part of the query: the local planner has to ask the foreign database to execute its portion of the plan, and the more clearly that portion is delimited, the faster it will be.
The best approach is to:
Containerize the FDW query to reduce data transfer
Filter data at the source
Minimize the returned row count
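A quick way to verify all three points is EXPLAIN (VERBOSE), which prints the Remote SQL that postgres_fdw sends to the foreign server. A minimal sketch, reusing the tables from this post:
-- Check what is pushed down to the foreign server
EXPLAIN (VERBOSE)
SELECT id, amount
FROM foreign_schema.transactions
WHERE created_at < NOW() - INTERVAL '1 year';
-- If the filter appears in the "Remote SQL:" line of the plan, it runs on the
-- foreign server and only matching rows travel over the wire.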
Indexes & foreign tables
Nothing forces the foreign table to be an exact copy of the local one, and you don’t have to replicate foreign keys or equivalent indexes.
It is usually a good idea, though, to create indexes on the archive server’s table (indexes are declared on the remote table itself, not on the local foreign table) that match the query patterns it will actually receive, because you don’t query archive data the same way you query live data.
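For example, if the archive is mostly scanned by date range, an index on created_at is a sensible starting point. A hypothetical sketch, to be run on the archive server:
-- On the archive server (not through the foreign table):
CREATE INDEX idx_transactions_created_at ON transactions (created_at);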
Materialized Views
The point of archival is usually to reclaim space, but if you need fast access to data spanning both the archive and the live DB, you can use materialized views to keep queries quick.
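A minimal sketch, assuming the combined_transactions view created above; the materialized view name and refresh cadence are illustrative:
-- Snapshot the combined view locally for fast repeated reads
CREATE MATERIALIZED VIEW combined_transactions_mat AS
SELECT * FROM combined_transactions;
-- Refresh periodically (e.g., via another pg_cron job) to pick up newly archived rows
REFRESH MATERIALIZED VIEW combined_transactions_mat;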
Step 4: Query your view
So if you query transactions from two years ago until now, you necessarily have to query both tables.
SELECT * FROM combined_transactions
WHERE created_at BETWEEN NOW() - INTERVAL '2 years' AND NOW();
Best Practices
Index Foreign Tables: Improve query performance with indexes on frequently filtered columns.
Batch Archival: Use LIMIT in procedures to avoid long locks (see the batched sketch below).
Error Handling: Wrap pg_cron jobs in BEGIN...EXCEPTION blocks.
Monitor pg_cron: Use cron.job_run_details to track job history.
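Here is a hedged sketch combining the batching and error-handling advice; the procedure name and batch size are illustrative, not part of the original setup:
-- Archive in bounded batches; a single statement both deletes and copies each
-- batch, so rows can never be deleted without reaching the archive
CREATE OR REPLACE PROCEDURE archive_transactions_batch(batch_size INT DEFAULT 10000)
LANGUAGE plpgsql
AS $$
BEGIN
    WITH batch AS (
        SELECT id FROM public.transactions
        WHERE created_at < NOW() - INTERVAL '1 year'
        ORDER BY created_at
        LIMIT batch_size
    ),
    moved AS (
        DELETE FROM public.transactions t
        USING batch b
        WHERE t.id = b.id
        RETURNING t.*
    )
    INSERT INTO foreign_schema.transactions
    SELECT * FROM moved;
EXCEPTION WHEN OTHERS THEN
    -- Log the failure without aborting; the next scheduled run simply retries
    RAISE WARNING 'archive batch failed: %', SQLERRM;
END;
$$;
And to keep an eye on your scheduled jobs:
-- Inspect recent pg_cron runs
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;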
Conclusion
PostgreSQL FDW transforms your database into a unified gateway for cross-database operations. By combining it with pg_cron and stored procedures, you can automate complex workflows like archival, reporting, and data synchronization without external tools.