English for Postgres Replication

Learn the English vocabulary for PostgreSQL replication: streaming replication, replication lag, and failover, explained so you can discuss database availability clearly.

“The replica is behind” can mean a few seconds of normal lag or a broken replication slot that is about to fill the primary’s disk. Postgres replication has a precise vocabulary for saying how far behind a replica is and why, and using it correctly matters most during an incident.

Key Vocabulary

Streaming replication — the mechanism where a replica continuously receives and applies the write-ahead log (WAL) from the primary over a network connection, keeping it close to real time. “We use streaming replication to keep the read replica within a few hundred milliseconds of the primary under normal load.”

Replication lag — the delay between a write committing on the primary and that same write becoming visible on a replica, measured in either time or bytes of WAL not yet applied. “Replication lag spiked to four minutes during the bulk import — reads against the replica were serving stale data for that window.”
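
To make the “bytes of WAL” measurement concrete, here is a minimal Python sketch that turns two log sequence numbers (LSNs) into a byte lag. It assumes the usual textual LSN form of two hexadecimal numbers separated by a slash, as returned by functions such as pg_current_wal_lsn(). The helper names and sample values are illustrative and do not come from any library.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits; the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL the primary has written that the replica has not yet replayed."""
    return max(0, parse_lsn(primary_lsn) - parse_lsn(replay_lsn))


# Hypothetical values: the primary's current WAL position and a replica's replay position.
print(lag_bytes("0/3000000", "0/2000000"))  # 16777216 bytes, i.e. 16 MiB
```

On a live server the same subtraction is available in SQL as pg_wal_lsn_diff(), and time-based lag is reported in the replay_lag column of the pg_stat_replication view.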

Replication slot — a primary-side bookmark that ensures WAL segments aren’t recycled until a specific replica has consumed them, so a replica that falls behind can still catch up instead of needing a rebuild, at the risk of disk exhaustion if the replica stops consuming. “The replication slot kept accumulating WAL because the replica had been disconnected for two days, so the primary’s disk usage kept climbing until we dropped the slot.”
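
As a rough illustration of spotting a slot that is no longer being consumed, the Python sketch below flags inactive slots that retain more WAL than a threshold. The SQL string shows one query that could produce such rows from the pg_replication_slots view; it is not executed here. The SlotStatus class, the suspect_slots function, the sample rows, and the 10 GiB threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# A query one might run on the primary to produce the rows below (not executed here).
SLOT_QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
"""


@dataclass
class SlotStatus:
    slot_name: str
    active: bool         # True while a consumer is connected to the slot
    retained_bytes: int  # WAL held back on the primary for this slot


def suspect_slots(slots, max_retained_bytes=10 * 1024**3):
    """Return the names of inactive slots retaining more WAL than the threshold."""
    return [
        s.slot_name
        for s in slots
        if not s.active and s.retained_bytes > max_retained_bytes
    ]


# Hypothetical sample rows; the 10 GiB default threshold is an arbitrary assumption.
sample = [
    SlotStatus("replica_a", active=True, retained_bytes=32 * 1024**2),
    SlotStatus("replica_old", active=False, retained_bytes=48 * 1024**3),
]
print(suspect_slots(sample))  # ['replica_old']
```

An inactive slot is not necessarily orphaned, since the replica may simply be restarting, so a flagged name is a prompt to ask “is this slot still being consumed?” rather than a reason to drop it automatically.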

Failover — promoting a replica to become the new primary after the original primary becomes unavailable, either manually triggered or automated through a failover tool. “We triggered a manual failover after confirming the primary was unreachable, promoting the replica with the least lag to take writes.”
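
The “least lag” choice in that sentence can be sketched in a few lines of Python. This is a simplified illustration and not a failover tool: the function name, the replica names, and the lag figures are hypothetical, and it assumes the lag of each candidate has already been measured in seconds.

```python
def pick_failover_candidate(lag_seconds_by_replica):
    """Return the name of the replica with the smallest replication lag."""
    if not lag_seconds_by_replica:
        raise ValueError("no replica candidates to promote")
    # min() over the dict iterates its keys; the key function looks up each replica's lag.
    return min(lag_seconds_by_replica, key=lag_seconds_by_replica.get)


# Hypothetical lag readings, in seconds, taken just before the primary failed.
candidates = {"replica-us-east": 0.8, "replica-us-west": 12.0}
print(pick_failover_candidate(candidates))  # replica-us-east
```

Once the primary is unreachable, time-based lag readings stop updating, so in practice teams often compare each replica's last replayed WAL position instead, for example the value returned by pg_last_wal_replay_lsn().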

Synchronous replication — a replication mode where the primary waits for a replica to confirm a write before acknowledging it to the client, trading write latency for a stronger durability guarantee. “We run one replica in synchronous mode so a committed write is guaranteed to survive even if the primary fails immediately after — the trade-off is added write latency.”

Common Phrases

  • “What’s the current replication lag on that replica?”
  • “Is this replication slot still being consumed, or is it orphaned?”
  • “Are we running synchronous or asynchronous replication on this replica?”
  • “We need to fail over. Is the replica caught up enough to take writes?”
  • “Is the lag growing because of network throughput, or because the replica can’t apply WAL fast enough?”

Example Sentences

Diagnosing a growing disk-usage alert: “The primary’s disk is filling up because a replication slot for a decommissioned replica was never dropped — WAL has been accumulating unconsumed since we retired that replica last week.”

Explaining a stale-read incident: “Users were seeing outdated order statuses because replication lag briefly exceeded thirty seconds during the migration, and our read traffic wasn’t routed back to the primary during that window.”

Describing a failover decision during an incident: “We’re failing over to the replica in us-east because its replication lag was under one second at the time of the primary’s failure, versus twelve seconds on the other replica.”

Professional Tips

  • Quote replication lag as a specific number (seconds or bytes) in incident updates, not just “the replica is behind” — the specific figure tells stakeholders how stale reads currently are.
  • Monitor replication slots for orphaned entries proactively — an unconsumed slot silently growing the primary’s disk is one of the more common self-inflicted outages.
  • State explicitly whether replication is synchronous or asynchronous when discussing durability guarantees — the answer changes what “the write is safe” actually means.
  • During a failover decision, name the lag on each replica candidate before promoting one — promoting the most-lagged replica risks losing recently committed writes.
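
The first tip, quoting lag as a specific number, can be practised with a small Python sketch that turns raw measurements into a sentence suitable for an incident update. The function name, the replica name, and the figures are hypothetical; the wording of the sentence is only one possible phrasing.

```python
def describe_lag(replica: str, lag_seconds: float, lag_bytes: int) -> str:
    """Build a one-sentence incident update that states lag in both time and WAL bytes."""
    minutes, seconds = divmod(int(lag_seconds), 60)
    duration = f"{minutes} min {seconds} s" if minutes else f"{seconds} s"
    mib = lag_bytes / 1024**2
    return (
        f"Replication lag on {replica} is {duration} "
        f"({mib:.0f} MiB of WAL not yet replayed)."
    )


# Hypothetical measurements: 240 seconds of lag and 512 MiB of unreplayed WAL.
print(describe_lag("replica-1", 240, 512 * 1024**2))
# Replication lag on replica-1 is 4 min 0 s (512 MiB of WAL not yet replayed).
```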

Practice Exercise

  1. Write a sentence describing what a replication slot protects against and what risk it introduces.
  2. Explain the trade-off between synchronous and asynchronous replication.
  3. Describe how you’d choose which replica to promote during a failover.