Mastering cross-database operations with PostgreSQL FDW
A guide to schema import, archival policies, and automation
Introduction
Modern applications often rely on data scattered across multiple databases. PostgreSQL’s Foreign Data Wrapper (FDW) empowers you to query and manipulate external databases as if they were native PostgreSQL tables. Combined with pg_cron for scheduling and stored procedures for automation, you can build robust archival pipelines.
In this post we will cover:
Setting up FDW to connect to external databases (e.g., another PostgreSQL instance)
Importing foreign schemas
Creating archival policies with stored procedures
Automating archival tasks with pg_cron
Our sponsor
Multiplayer auto-discovers, tracks, and documents your entire system architecture—from its components to APIs, dependencies, and environments. Gain real-time, comprehensive visibility into your system, all at a glance.
What is PostgreSQL FDW?
FDW is PostgreSQL’s implementation of the SQL/MED (Management of External Data) standard, allowing the database to interact with external data sources.
Key benefits:
Unified Query Interface: Run SQL across heterogeneous databases (MySQL, MongoDB, Hadoop, etc.).
Real-Time Data Access: No need for ETL pipelines for simple joins (see the join sketch after this list).
Simplified Archival: Move historical data to cheaper storage while keeping it queryable.
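To make this concrete, here is a minimal sketch of a cross-database join. The users table, user_id column, and foreign_schema.transactions foreign table are illustrative; we set up a foreign schema exactly like this later in the post.
-- Hypothetical join between a local table and a foreign table
SELECT u.name, SUM(t.amount) AS total_spent
FROM public.users u
JOIN foreign_schema.transactions t ON t.user_id = u.id
GROUP BY u.name;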
Foreign Data Wrappers
There are many different FDW extensions that let you connect to a variety of databases and data sources (relational, NoSQL, flat files, etc.):
postgres_fdw: Connects to other PostgreSQL databases (local or remote); supports read/write operations.
file_fdw: Reads data from flat files (CSV, TSV, etc.).
mysql_fdw: Connects to MySQL/MariaDB databases; supports basic queries and joins.
redis_fdw: Queries Redis key-value stores.
mongo_fdw: Accesses MongoDB collections.
S3 wrappers: Read/write data from Amazon S3.
There are many others; the PostgreSQL wiki maintains a list of available foreign data wrappers.
Use cases
As you can imagine, many use cases are possible: reading files straight from your database, or fetching cache keys from Redis and augmenting them with content stored in PostgreSQL.
The use case we will focus on is automatic archival from PostgreSQL to PostgreSQL, but the same approach works from PostgreSQL to Hadoop, MySQL, BigQuery, and so on.
Setting up FDW and connecting to a foreign database
Step 1: Install the required extension
-- Enable FDW
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
Step 2: Configure a foreign server (e.g., PostgreSQL)
-- Create a foreign server (the port defaults to 5432; add a port option if yours differs)
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '{FOREIGN_HOST}', dbname '{FOREIGN_DB_NAME}');
-- Map local PostgreSQL user to foreign server credentials
CREATE USER MAPPING FOR postgres SERVER postgres_server
OPTIONS (user '{FOREIGN_USERNAME}', password '{FOREIGN_PASSWORD}');
Step 3: Import foreign schema
You might be tempted to import the foreign schema from the other PostgreSQL instance into the “public” schema of your current instance; however, you can’t have two tables with the same name in the same schema.
That’s why I suggest creating a separate schema; you can call it foreign_schema or whatever you wish:
-- Create a new schema to import the tables into
CREATE SCHEMA foreign_schema;
-- Import all tables from the foreign postgreSQL database
IMPORT FOREIGN SCHEMA public FROM SERVER postgres_server INTO foreign_schema;
-- You can also import only specific tables
IMPORT FOREIGN SCHEMA public LIMIT TO ({TABLE1}, {TABLE2}) FROM SERVER postgres_server INTO foreign_schema;
Now you can query your foreign database:
SELECT * FROM foreign_schema.{TABLE1};
Building an archival policy
Let’s imagine you have a transactions table in both PostgreSQL instances, with the same columns (though matching columns are not mandatory).
The remote server is used as the archive, and the current database (the one you are connected to) is your live DB.
-- uuid_generate_v4() requires the uuid-ossp extension:
-- CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE IF NOT EXISTS transactions (
id uuid DEFAULT uuid_generate_v4(),
amount INT DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (id)
);
Create a stored procedure to move data
CREATE OR REPLACE PROCEDURE archive_old_transactions()
LANGUAGE plpgsql
AS $$
DECLARE
    -- Compute the cutoff once so the INSERT and DELETE agree on exactly
    -- which rows get archived (NOW() would advance between the two statements)
    cutoff TIMESTAMPTZ := NOW() - INTERVAL '1 year';
BEGIN
    -- Move data older than 1 year to the archive
    INSERT INTO foreign_schema.transactions
    SELECT * FROM public.transactions
    WHERE created_at < cutoff;
    -- Delete the archived data from the main table
    DELETE FROM public.transactions
    WHERE created_at < cutoff;
END;
$$;
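Before scheduling it, you can run the procedure by hand to confirm it moves the rows you expect:
-- Manual test run
CALL archive_old_transactions();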
Automate archival with pg_cron
Step 1: Install the required extension. Note that pg_cron must be listed in shared_preload_libraries in postgresql.conf (which requires a restart) before the extension can be created:
-- Enable pg_cron
CREATE EXTENSION IF NOT EXISTS pg_cron;
Step 2: Schedule the stored procedure to run nightly:
-- Run at 2 AM daily
SELECT cron.schedule(
'archive_transactions',
'0 2 * * *',
'CALL archive_old_transactions()'
);
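If you want to confirm the job was registered, pg_cron exposes its schedule in the cron.job table:
-- List scheduled jobs
SELECT jobid, jobname, schedule, command FROM cron.job;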
Step 3: Create a view to merge both worlds
CREATE VIEW combined_transactions AS (
WITH remote_data AS (
SELECT * FROM foreign_schema.transactions
),
local_data AS (
SELECT * FROM public.transactions
)
SELECT * FROM remote_data
UNION ALL
SELECT * FROM local_data
);
You will notice that I didn’t simply UNION the two tables; I used CTEs (Common Table Expressions), because they are crucial for optimizing queries that involve foreign tables.
Essentially, a CTE containerizes the FDW part of the query: the local planner has to ask the foreign database to execute its portion of the plan, and the more clearly that portion is delimited, the faster it will be.
The best approach is to:
Containerize the FDW query to reduce data transfer
Filter data at the source
Minimize the returned row count
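A quick way to verify all three points is EXPLAIN (VERBOSE), which prints the Remote SQL that postgres_fdw sends to the foreign server. A minimal sketch, reusing the tables from this post:
-- Check what is pushed down to the foreign server
EXPLAIN (VERBOSE)
SELECT id, amount
FROM foreign_schema.transactions
WHERE created_at < NOW() - INTERVAL '1 year';
-- If the filter appears in the "Remote SQL:" line of the plan, it runs on the
-- foreign server and only matching rows travel over the wire.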
Indexes & foreign tables
Nothing forces the foreign table to be an exact copy of the local one, and you don’t have to replicate foreign keys or equivalent indexes.
It is usually a good idea, though, to create indexes on the archive server’s table (indexes are declared on the remote table itself, not on the local foreign table) that match the query patterns it will actually receive, because you don’t query archive data the same way you query live data.
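For example, if the archive is mostly scanned by date range, an index on created_at is a sensible starting point. A hypothetical sketch, to be run on the archive server:
-- On the archive server (not through the foreign table):
CREATE INDEX idx_transactions_created_at ON transactions (created_at);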
Materialized Views
The point of archival is usually to reclaim space, but if you need fast access to data spanning both the archive and the live DB, you can use materialized views to keep queries quick.
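A minimal sketch, assuming the combined_transactions view created above; the materialized view name and refresh cadence are illustrative:
-- Snapshot the combined view locally for fast repeated reads
CREATE MATERIALIZED VIEW combined_transactions_mat AS
SELECT * FROM combined_transactions;
-- Refresh periodically (e.g., via another pg_cron job) to pick up newly archived rows
REFRESH MATERIALIZED VIEW combined_transactions_mat;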
Step 4: Query your view
So if you query transactions from two years ago until now, you necessarily have to query both tables.
SELECT * FROM combined_transactions
WHERE created_at BETWEEN NOW() - INTERVAL '2 years' AND NOW();
Best Practices
Index Foreign Tables: Improve query performance with indexes on frequently filtered columns.
Batch Archival: Use LIMIT in procedures to avoid long locks (see the batched sketch below).
Error Handling: Wrap pg_cron jobs in BEGIN...EXCEPTION blocks.
Monitor pg_cron: Use cron.job_run_details to track job history.
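Here is a hedged sketch combining the batching and error-handling advice; the procedure name and batch size are illustrative, not part of the original setup:
-- Archive in bounded batches; a single statement both deletes and copies each
-- batch, so rows can never be deleted without reaching the archive
CREATE OR REPLACE PROCEDURE archive_transactions_batch(batch_size INT DEFAULT 10000)
LANGUAGE plpgsql
AS $$
BEGIN
    WITH batch AS (
        SELECT id FROM public.transactions
        WHERE created_at < NOW() - INTERVAL '1 year'
        ORDER BY created_at
        LIMIT batch_size
    ),
    moved AS (
        DELETE FROM public.transactions t
        USING batch b
        WHERE t.id = b.id
        RETURNING t.*
    )
    INSERT INTO foreign_schema.transactions
    SELECT * FROM moved;
EXCEPTION WHEN OTHERS THEN
    -- Log the failure without aborting; the next scheduled run simply retries
    RAISE WARNING 'archive batch failed: %', SQLERRM;
END;
$$;
And to keep an eye on your scheduled jobs:
-- Inspect recent pg_cron runs
SELECT jobid, status, return_message, start_time, end_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;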
Conclusion
PostgreSQL FDW transforms your database into a unified gateway for cross-database operations. By combining it with pg_cron and stored procedures, you can automate complex workflows like archival, reporting, and data synchronization without external tools.