Distributed Clocks
5 exercises — master distributed clock vocabulary: why wall clocks fail for ordering, Lamport timestamps and their limits, vector clock causality detection, Hybrid Logical Clocks (HLC) in CockroachDB and YugabyteDB, and Google Spanner's TrueTime with commit wait.
Distributed clock quick reference
- Wall clocks (NTP) — ±1-100ms typical uncertainty; clock drift between syncs; leap second issues; do NOT use for event ordering.
- Happens-before (→) — A→B means A causally precedes B; captured by logical clocks.
- Lamport timestamp — counter incremented before each event; on receive: max(local, received)+1. If A→B then L(A) < L(B). Reverse NOT guaranteed — cannot detect concurrency.
- Vector clock — vector of counters (one per node). Can detect concurrent events: incomparable vectors = concurrent. O(n) space cost.
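- HLC — (physical_time, logical_counter). The physical component stays within the clock-skew bound ε of the node's NTP-synchronised clock; the logical counter provides Lamport-style causal ordering. Used in CockroachDB, YugabyteDB.
- TrueTime (Spanner) — GPS + atomic clocks; returns interval [earliest, latest]; uncertainty ε typically saws between ~1ms and ~7ms over each sync interval (≈4ms on average).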
- Commit wait — Spanner waits until TT.now().earliest > commit_timestamp before acknowledging; ensures external consistency.
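The HLC entry compresses a fairly subtle update rule into one line. Below is a minimal sketch following the published HLC algorithm (Kulkarni et al., 2014). It is an illustration, not CockroachDB's or YugabyteDB's actual code; the class and method names are hypothetical, and the physical clock is injected as a function so that skew can be simulated.

```python
from typing import Callable, Tuple

# (l, c): l = largest physical time seen, c = logical counter. Tuples compare lexicographically.
Timestamp = Tuple[int, int]


class HybridLogicalClock:
    """Hypothetical HLC sketch; physical_now returns the node's (possibly skewed) clock in ms."""

    def __init__(self, physical_now: Callable[[], int]) -> None:
        self._now = physical_now
        self.l = 0  # never falls behind local physical time
        self.c = 0  # breaks ties when l does not advance

    def send_or_local(self) -> Timestamp:
        pt = self._now()
        prev_l = self.l
        self.l = max(prev_l, pt)
        # Physical time did not move past l: bump the counter. Otherwise reset it.
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def receive(self, msg: Timestamp) -> Timestamp:
        msg_l, msg_c = msg
        pt = self._now()
        prev_l = self.l
        self.l = max(prev_l, msg_l, pt)
        if self.l == prev_l and self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0  # local physical clock is strictly ahead of everything seen so far
        return (self.l, self.c)


a = HybridLogicalClock(lambda: 1000)  # node A's clock runs ahead
b = HybridLogicalClock(lambda: 950)   # node B's clock lags by 50 ms
t_send = a.send_or_local()   # (1000, 0)
t_recv = b.receive(t_send)   # (1000, 1): ordered after t_send despite B's slower clock
t_next = b.send_or_local()   # (1000, 2): B keeps counting until its own clock passes 1000
print(t_send, t_recv, t_next)
```

Because l only ever takes values that some node's physical clock actually produced, an HLC timestamp stays close to wall-clock time while still respecting happens-before.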
1 / 5
A junior engineer asks: "Why can't we just use the system timestamp (Date.now()) on each server to determine the order of events in a distributed system?"
Wall clock synchronisation limitations make timestamps unsuitable for strict event ordering in distributed systems.
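To make the junior engineer's question concrete, here is a small simulation with hypothetical numbers: node A's clock runs 50 ms fast, node B's is accurate, and B writes only after receiving A's write. Ordering by local timestamps then reverses cause and effect.

```python
SKEW_A_MS = 50          # hypothetical: node A's clock is 50 ms fast
NETWORK_DELAY_MS = 5    # hypothetical one-way delay from A to B


def wall_clock(true_time_ms: int, skew_ms: int) -> int:
    """What a node's Date.now()-style clock reads at a given true time."""
    return true_time_ms + skew_ms


true_send = 1_000
ts_a = wall_clock(true_send, SKEW_A_MS)   # A records "x=1" and sends it to B
true_recv = true_send + NETWORK_DELAY_MS
ts_b = wall_clock(true_recv, 0)           # B, having seen A's write, records "x=2"

print(ts_a, ts_b)  # 1050 1005
# Sorting by timestamp puts B's write first even though it causally followed A's,
# so last-writer-wins keeps x=1 and silently discards x=2.
events = sorted([(ts_a, "A: x=1"), (ts_b, "B: x=2")])
print(events[-1][1])  # A: x=1
```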
Sources of clock inaccuracy:
• NTP synchronisation error — typical LAN: ±1-5ms. WAN: ±50-100ms. GPS-disciplined: ±1μs but expensive
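• Clock drift — quartz oscillators typically drift on the order of 10-100 parts per million (roughly 1-9 s/day) between NTP syncs. At ~1 s/day (≈11.6 ppm), a 30-second sync interval accumulates ~350μs of drift
• Leap seconds — UTC occasionally inserts an extra second to keep pace with Earth's slowing rotation; OS handling varies (clock freeze or a repeated second on historical Linux, smearing at Google, an abrupt jump), and each can break code that assumes time never stalls or goes backwards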
• VM clock skew — VM clocks are emulated; during host overload or live migration, guest clocks may fall behind by seconds
• Network jitter — NTP corrections themselves are imprecise because the correction is based on round-trip time divided by 2, assuming symmetric latency
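The symmetric-latency assumption in the last bullet comes from NTP's on-wire calculation, which combines four timestamps (T1 to T4 in the NTPv4 specification, RFC 5905). The sketch below implements that offset and delay formula; the `simulate` helper and its millisecond values are hypothetical, chosen to show how asymmetric paths skew the estimate even when the clocks agree.

```python
from typing import NamedTuple, Tuple


class NtpSample(NamedTuple):
    t1: float  # client transmit time (client clock)
    t2: float  # server receive time (server clock)
    t3: float  # server transmit time (server clock)
    t4: float  # client receive time (client clock)


def offset_and_delay(s: NtpSample) -> Tuple[float, float]:
    # Offset estimate assumes the outbound and return legs take equal time.
    offset = ((s.t2 - s.t1) + (s.t3 - s.t4)) / 2
    # Round-trip delay, excluding the server's processing time.
    delay = (s.t4 - s.t1) - (s.t3 - s.t2)
    return offset, delay


def simulate(true_offset_ms: float, outbound_ms: float, return_ms: float,
             processing_ms: float = 1.0) -> NtpSample:
    """Hypothetical exchange where server clock = client clock + true_offset_ms."""
    t1 = 0.0
    t2 = t1 + outbound_ms + true_offset_ms
    t3 = t2 + processing_ms
    t4 = t1 + outbound_ms + processing_ms + return_ms
    return NtpSample(t1, t2, t3, t4)


print(offset_and_delay(simulate(0, 20, 20)))  # (0.0, 40.0): symmetric path, exact
print(offset_and_delay(simulate(0, 10, 30)))  # (-10.0, 40.0): clocks agree, estimate is 10 ms off
```

The error equals half the difference between the two legs, so it can never exceed half the measured round-trip delay; this is why NTP accuracy degrades over long, congested WAN paths.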
The Einstein insight for distributed systems:
In special relativity, there is no universal "now" for observers in different locations. Similarly, in a distributed system, there is no globally agreed "current time" — only message ordering relationships. Leslie Lamport formalised this insight into the happens-before relationship (1978), which forms the basis for logical clocks.
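As a sketch of how happens-before becomes a logical clock, here is a minimal Lamport clock implementing the rules from the quick reference (increment before each event; on receive, max(local, received)+1). The class and node names are illustrative.

```python
class LamportClock:
    """Minimal Lamport clock: one integer counter per node."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local or send event: increment before stamping the event."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Receive event: jump past the sender's timestamp, then count this event."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b, c = LamportClock(), LamportClock(), LamportClock()
send = a.tick()          # A sends a message stamped 1
recv = b.receive(send)   # B receives it: max(0, 1) + 1 = 2, so send -> recv gives 1 < 2
for _ in range(3):
    c_event = c.tick()   # C never talks to A or B; its third event is stamped 3
# recv (2) < c_event (3), yet neither event happened-before the other:
# a smaller Lamport timestamp does not imply causality.
print(send, recv, c_event)  # 1 2 3
```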
What you CAN use timestamps for (with caveats):
• Approximate ordering when events are seconds apart and precision doesn't matter
• Human-readable audit logs (not as ordering keys)
• TTL expiration (bounded accuracy acceptable)
• TrueTime (Spanner) — bounded uncertainty makes causal ordering possible, but requires GPS + atomic clocks
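The TrueTime caveat depends on commit wait from the quick reference. The sketch below simulates it with a hypothetical `tt_now()` that returns an [earliest, latest] interval and a fake clock so it runs instantly; it is not Spanner's API, and real Spanner applies further timestamp rules (for example, per-leader monotonicity) that are omitted here.

```python
from typing import Callable, NamedTuple


class TTInterval(NamedTuple):
    earliest: float
    latest: float


class FakeClock:
    """Deterministic stand-in for a node's local clock; sleep() just advances time."""

    def __init__(self, start: float) -> None:
        self.t = start

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def make_tt_now(clock: Callable[[], float], epsilon: float) -> Callable[[], TTInterval]:
    """Hypothetical TrueTime-like API: true time is assumed to lie in [t - epsilon, t + epsilon]."""
    def tt_now() -> TTInterval:
        t = clock()
        return TTInterval(t - epsilon, t + epsilon)
    return tt_now


def commit_with_wait(tt_now: Callable[[], TTInterval],
                     sleep: Callable[[float], None]) -> float:
    # Choose a commit timestamp no earlier than the latest possible current time.
    s = tt_now().latest
    # Commit wait: do not acknowledge until s is definitely in the past everywhere.
    while tt_now().earliest <= s:
        sleep(0.001)
    return s


EPSILON = 0.007  # 7 ms, the upper end of the uncertainty quoted in the quick reference
clock = FakeClock(start=100.0)
s = commit_with_wait(make_tt_now(clock.now, EPSILON), clock.sleep)
waited = clock.t - 100.0
print(f"commit ts={s:.3f}, waited ~{waited * 1000:.0f} ms")  # roughly 2 * epsilon
```

The wait is about 2ε, which is why Spanner invests in GPS and atomic clocks: shrinking ε directly shrinks commit latency. Swapping `FakeClock` for `time.monotonic` and `time.sleep` would run the same loop in real time.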
Key vocabulary:
• NTP (Network Time Protocol) — protocol for synchronising computer clocks over a network; typical accuracy ±1-100ms depending on network quality
• Clock drift — the gradual divergence of a clock from true time between synchronisation rounds; caused by oscillator imprecision
• Clock skew — the difference in clock readings between two nodes at any given moment
• Leap second — an occasional 1-second addition to UTC to account for Earth's variable rotation; can disrupt software expecting monotonic time
• Happens-before (→) — a partial order relationship between events in a distributed system: A→B means A causally precedes B
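Because happens-before is only a partial order, some pairs of events are neither before nor after each other. Vector clocks expose this directly. The sketch below uses dictionaries keyed by hypothetical node ids ("A", "B", "C"); a missing key counts as zero.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of that node's events seen so far


def tick(vc: VectorClock, node: str) -> VectorClock:
    """Local or send event on `node`: increment its own entry."""
    updated = dict(vc)
    updated[node] = updated.get(node, 0) + 1
    return updated


def receive(local: VectorClock, msg: VectorClock, node: str) -> VectorClock:
    """Element-wise max with the message's vector, then count the receive event."""
    merged = {k: max(local.get(k, 0), msg.get(k, 0)) for k in local.keys() | msg.keys()}
    return tick(merged, node)


def compare(a: VectorClock, b: VectorClock) -> str:
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "happens-before"  # a -> b
    if b_le_a:
        return "happens-after"   # b -> a
    return "concurrent"          # incomparable vectors


a1 = tick({}, "A")          # {'A': 1}
b1 = receive({}, a1, "B")   # {'A': 1, 'B': 1}
c1 = tick({}, "C")          # {'C': 1}
print(compare(a1, b1))      # happens-before
print(compare(b1, c1))      # concurrent
```

Unlike the Lamport case, the concurrency of b1 and c1 is detected rather than hidden behind an arbitrary ordering, at the cost of one counter per node in every timestamp.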