Postgres

PostgresトランザクションIDラップアラウンド:最悪の土曜まで4時間

pg_stat_activityが「autovacuum (to prevent wraparound)」を表示し書き込みが止まり始めたら、正しく対処するための4時間。間違えればDBはread-onlyに。本物のランブック。

Published 2026-03-20·10 min read

Saturday morning. pg_stat_activity shows autovacuum: VACUUM (to prevent wraparound) on your largest table. Application writes slow down. Then the throughput craters. If you do nothing, in a few hours your DB stops accepting writes entirely. If you cancel the vacuum, transaction-ID wraparound happens and you have a corruption risk. This is the runbook.

On this page

What's actually happening
Early warning that buys you weeks
Already in aggressive mode
After the fire
FAQ

What’s actually happening

Postgres uses a 32-bit transaction ID. Every row stores the txid that wrote it. To know whether a row is visible to your transaction, Postgres compares txids. Once txids approach 2^31, the math wraps and old rows could appear newer than they are — a data-corruption hazard.

Postgres prevents this by “freezing” old rows: marking them with a special sentinel that means “older than any current txid.” Vacuum freezes pages as it goes. When a table’s oldest unfrozen txid hits autovacuum_freeze_max_age (default 200M), Postgres triggers an aggressive autovacuum that holds an exclusive lock more often and runs faster — but it’s still rate-limited by autovacuum_vacuum_cost_limit.

Early warning that buys you weeks

-- Tables nearest to wraparound — alert when any value > 800M
SELECT
  c.relnamespace::regnamespace || '.' || c.relname AS table,
  age(c.relfrozenxid) AS xid_age,
  pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;

-- Cluster-wide datfrozenxid
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

Already in aggressive mode

Don’t cancel the vacuum. If you do, autovacuum will restart it on the same table; you’ve made no progress and burned IO.
Raise autovacuum_vacuum_cost_limit aggressively. Default 200 → 5000+. The vacuum will use more IO but finish faster.
Reduce autovacuum_vacuum_cost_delay to 0 if you’re comfortable with the IO load.
Pause non-essential writes if possible — vacuum competes with writers for buffer locks.
Don’t restart the server. Restarts force a recovery scan that doesn’t advance frozen state.

-- Apply temporarily; revert after the fire is out
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 10000;
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;
SELECT pg_reload_conf();

-- Watch progress
SELECT * FROM pg_stat_progress_vacuum;

After the fire is out

Lower autovacuum_freeze_max_age on hot tables (per-table override) so freeze runs more often, less aggressively.
Raise autovacuum_max_workers if you have CPU/IO budget — multiple tables can freeze in parallel.
Add the relfrozenxid alert above to your monitoring stack permanently.
Audit any process that resets statistics (pg_stat_reset) — aggressive resets can hide vacuum progress.

FAQ

Does pg_repack help here?+

Indirectly. pg_repack creates a new physical relation, which means new pages with current frozen txids. Useful as a follow-up after the immediate fire is out, not during.

Can I just upgrade Postgres?+

Pg 14+ has 64-bit txids in some operations and improved vacuum, but 32-bit txid for visibility checks remains. Upgrade for many reasons; wraparound prevention isn't fully solved.

How does this differ on Aurora / managed Postgres?+

Same mechanism. RDS / Aurora exposes the same parameters; tune them the same way. The main difference: cancel scope is more limited on managed services.

Keep reading

Postgres

Postgres bloat and autovacuum: a 2026 tuning guide

What table and index bloat actually costs you, how autovacuum works in 16+, the parameters that matter, and the queries to find your worst offenders before they trigger an OOM.

Postgres

Postgres slow queries: 12 causes and how to find each one

A field-tested playbook for diagnosing a slow Postgres query in production — from missing indexes to plan flips to bloated tables — with the SQL to find each cause and the fix.

← All posts

PostgresトランザクションIDラップアラウンド:最悪の土曜まで4時間

What’s actually happening

Early warning that buys you weeks

Already in aggressive mode

After the fire is out

FAQ

サービスと同じようにデータベースを監視する。