Distributed transaction comparison
Two-phase commit (2PC) vs Saga pattern
Both patterns answer the same question: how do you keep a multi-step, multi-service operation consistent when any step can fail? Two-phase commit answers it with locking and a coordinator; the Saga pattern answers it by letting each step commit independently and undoing the completed steps with compensations if a later step fails.
TL;DR
- 2PC holds locks on every participant until a coordinator confirms all of them are ready, then commits everywhere atomically — strongly consistent, but blocking and hard to scale across services.
- Saga lets each step commit independently and immediately, and rolls back completed steps with compensating transactions if a later step fails — no cross-service locks, but only eventually consistent.
- 2PC dominates within a controlled environment (one team, reliable network, e.g. XA transactions across databases); Saga dominates across independently owned microservices.
Side-by-side comparison
| Aspect | Two-Phase Commit (2PC) | Saga pattern |
|---|---|---|
| Consistency model | Strong — atomic across all participants | Eventual — intermediate states are observable |
| Coordination role | Central coordinator drives prepare/commit | Orchestrator (explicit) or choreography (event chain) |
| Resource locking | Yes — participants lock resources until final decision | No — each step commits and releases immediately |
| Failure recovery | Abort the whole transaction; risk of blocking on coordinator crash | Run compensating transactions for completed steps |
| Availability | Lower — participants can be stuck waiting | Higher — no cross-service blocking locks |
| Scales across many services? | Poorly — lock contention and coordinator dependency grow | Well — this is its primary design goal |
| Typical context | Multiple databases/resource managers in one trust domain (XA) | Independently deployed microservices |
| Rollback mechanism | Native abort — nothing was ever committed | Explicit compensating logic you must write yourself |
Code / protocol side-by-side
2PC: coordinator flow
// Phase 1: PREPARE -- collect a vote from every participant
decision = COMMIT
for each participant p:
    vote = p.prepare(txId)
    if vote != YES:            // a NO or a missing vote aborts everything
        decision = ABORT
        break
// Phase 2: COMMIT or ABORT -- record the decision, then deliver it
coordinator.decide(txId, decision)
for each participant p:
    p.finalize(txId, decision)
// Participants held their locks
// since PREPARE -- released only
// now, after the final decision
Saga (orchestration): with compensation
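// Each step pairs a forward action with the compensation that undoes it.
// bookFlight, cancelFlight, etc. are async functions supplied by the service clients.
const steps = [
  { do: bookFlight,   undo: cancelFlight },
  { do: reserveHotel, undo: cancelHotel },
  { do: chargeCard,   undo: refundCard },
];
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.do();        // commits immediately
      completed.push(step);   // remember for rollback
    }
  } catch (err) {
    // Run compensations in REVERSE order (copy first so `completed` is not mutated)
    for (const step of [...completed].reverse()) {
      await step.undo();
    }
    throw err;                // surface the original failure to the caller
  }
}
When to use 2PC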
- All participants are in one trust/operational domain. Coordinating multiple database resource managers under a single application server (classic XA transactions) is where 2PC still fits comfortably.
- Strong, immediate consistency is non-negotiable. If intermediate, partially-applied states are unacceptable for even a moment, 2PC's all-or-nothing atomicity is worth the availability cost.
- The transaction is short-lived and low-latency. 2PC's blocking window is much less risky when prepare-to-commit takes milliseconds, not the seconds or minutes a cross-service saga step might take.
- You control the coordinator's reliability tightly. If the coordinator itself is highly available (replicated, monitored), the risk of hitting the "blocking problem" is much smaller, though it never disappears entirely.
When to use the Saga pattern
- The operation spans independently deployed microservices. Each service owns its own database; you cannot (and should not) hold a lock inside another team's service for the duration of a multi-step business process.
- Availability matters more than instantaneous consistency. Order processing, travel booking, and e-commerce checkouts commonly accept a brief inconsistent window in exchange for never locking resources across services.
- Steps can take a long time or involve external systems. Waiting on a third-party payment gateway or a slow partner API inside a 2PC prepare phase would hold locks for far too long; sagas let each step complete and release independently.
- You can design meaningful compensating actions. If every step has a sensible "undo" (cancel, refund, release), sagas give you resilience without the coordination overhead of 2PC.
English phrases engineers use
2PC conversations
- "The coordinator is still waiting on one participant's vote."
- "That's the classic blocking problem — the coordinator died mid-transaction."
- "We're using XA transactions to coordinate these two databases."
- "A missing vote is treated as a no-vote — the whole transaction aborts."
Saga conversations
- "Step 3 failed, so we're running compensations for steps 1 and 2."
- "This is a choreography saga — no central orchestrator, just event listeners."
- "We need a compensating transaction to reverse the hotel reservation."
- "Between steps, the system is only eventually consistent — that's expected."
Quick decision tree
- All participants in one trust domain, short transaction → 2PC / XA transactions
- Multi-step process across independent microservices → Saga pattern
- A step may take seconds/minutes or hit external APIs → Saga (2PC locks too long)
- Strong, instant, all-or-nothing consistency is mandatory → 2PC (accept the availability cost)
- Many steps, need simpler failure reasoning → Orchestration-based Saga
- Many services, high autonomy, event-driven culture → Choreography-based Saga
Frequently asked questions
What problem do both 2PC and Saga solve?
Both coordinate a single business operation that must update data across multiple independent services or databases — for example, "book a flight and reserve a hotel" where each lives in a different service. Without coordination, one part could succeed while the other fails, leaving the system in an inconsistent state. 2PC and Saga are two very different answers to "how do we keep that from happening."
Why is 2PC called "blocking"?
If the coordinator crashes after sending "prepare" but before sending the final "commit" or "abort," participants that voted yes are stuck holding their locks until the coordinator recovers, however long that takes. They cannot safely proceed without knowing the coordinator's decision, because committing or aborting on their own could contradict what the coordinator eventually decides. This is called the "blocking problem," and it is the main reason 2PC is avoided in systems that need high availability.
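The blocking window can be made concrete with a minimal in-memory sketch. The `Participant` class, its state names, and the `crashBeforeDecision` flag are hypothetical and exist only for illustration; a real resource manager would persist its state and reach the coordinator over a network.

```javascript
// Hypothetical in-memory model of a 2PC participant (not a real XA API).
class Participant {
  constructor(name) {
    this.name = name;
    this.state = "IDLE"; // IDLE -> PREPARED -> COMMITTED | ABORTED
  }
  prepare() {
    this.state = "PREPARED"; // locks are taken here and held from now on
    return "YES";
  }
  finalize(decision) {
    // Either outcome ends the lock window.
    this.state = decision === "COMMIT" ? "COMMITTED" : "ABORTED";
  }
}

// Runs both phases. `crashBeforeDecision` simulates the coordinator dying
// after collecting votes but before telling anyone the outcome.
function runTwoPhaseCommit(participants, { crashBeforeDecision = false } = {}) {
  const votes = participants.map((p) => p.prepare());
  if (crashBeforeDecision) return null; // nobody ever hears a decision
  const decision = votes.every((v) => v === "YES") ? "COMMIT" : "ABORT";
  participants.forEach((p) => p.finalize(decision));
  return decision;
}

const healthy = [new Participant("orders"), new Participant("payments")];
const healthyDecision = runTwoPhaseCommit(healthy);
console.log(healthyDecision, healthy.map((p) => p.state));

const stuck = [new Participant("orders"), new Participant("payments")];
runTwoPhaseCommit(stuck, { crashBeforeDecision: true });
console.log(stuck.map((p) => p.state)); // both remain PREPARED, still holding locks
```

In the second run neither participant can move out of `PREPARED` on its own, which is exactly the in-doubt state described above.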
What is a compensating transaction in a Saga?
A compensating transaction is the "undo" step for a saga step that already succeeded. If step 3 of a 5-step saga fails, the saga must run compensations for the completed steps in reverse order: first step 2 (e.g. "cancel the hotel reservation"), then step 1 (e.g. "cancel the flight booking"). Because the underlying operations already committed independently, you cannot simply roll them back like a database transaction; you must explicitly reverse their business effect.
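The 5-step scenario can be traced with a small runnable sketch. The step names, the `makeStep` helper, and the forced failure on step 3 are all hypothetical; the point is only to show which compensations run and in what order.

```javascript
// Hypothetical 5-step saga; each step records what it did in `log`.
const log = [];
const makeStep = (name, { fail = false } = {}) => ({
  do: async () => {
    if (fail) throw new Error(`${name} failed`); // fails before committing anything
    log.push(`do:${name}`);
  },
  undo: async () => {
    log.push(`undo:${name}`);
  },
});

const saga = [
  makeStep("step1"),
  makeStep("step2"),
  makeStep("step3", { fail: true }),
  makeStep("step4"),
  makeStep("step5"),
];

async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.do();
      completed.push(step);
    }
    return "completed";
  } catch (err) {
    // Undo only what actually committed, newest first.
    for (const step of [...completed].reverse()) {
      await step.undo();
    }
    return "compensated";
  }
}

const outcome = runSaga(saga);
outcome.then((result) => console.log(result, log));
// log ends as: do:step1, do:step2, undo:step2, undo:step1
```

Steps 4 and 5 never run, and step 3 needs no compensation because it never committed.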
Are Sagas eventually consistent?
Yes. Between the first step committing and the last step completing (or all compensations finishing after a failure), the overall business operation is in an intermediate state that other parts of the system can observe. For example, a flight booking can exist before the hotel reservation step even attempts to run. This is a deliberate trade-off: sagas give up the "all-or-nothing, instantly" guarantee of 2PC in exchange for not locking resources across services.
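A rough sketch of that observable window, assuming the flight-then-hotel ordering of the saga example earlier. The two `Map` objects stand in for each service's own database, and `observe` is a hypothetical helper playing the role of any other reader of the system.

```javascript
// Hypothetical in-memory "databases", one per service.
const flights = new Map();
const hotels = new Map();
const snapshots = [];

// Records what an outside reader would see at this moment.
const observe = (label, tripId) =>
  snapshots.push({ label, flight: flights.has(tripId), hotel: hotels.has(tripId) });

function bookTrip(tripId) {
  observe("before saga", tripId);
  flights.set(tripId, "BOOKED");    // step 1 commits on its own
  observe("after step 1", tripId);  // flight exists, hotel does not yet
  hotels.set(tripId, "RESERVED");   // step 2 commits on its own
  observe("after step 2", tripId);
}

bookTrip("trip-1");
console.log(snapshots);
```

The middle snapshot is the state 2PC would never expose: one service has committed and the other has not.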
What is the difference between choreography-based and orchestration-based Sagas?
In a choreography saga, each service listens for events from the previous step and decides independently what to do next: there is no central coordinator, just a chain of event-driven reactions. In an orchestration saga, a central orchestrator explicitly calls each service in sequence and decides how to handle failures and trigger compensations. Choreography keeps services loosely coupled and works well while the flow is simple; orchestration is easier to reason about and debug as the number of steps grows.
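For contrast with the orchestrated example earlier, here is a minimal choreography sketch. The tiny synchronous event bus and the event names (`TripRequested`, `FlightBooked`, `HotelFailed`, and so on) are assumptions made for illustration; in practice the bus would be a message broker and each handler would live in a different service.

```javascript
// Minimal synchronous event bus; a stand-in for a real message broker.
const handlers = {};
const on = (event, fn) => {
  (handlers[event] = handlers[event] || []).push(fn);
};
const emit = (event, payload) => {
  (handlers[event] || []).forEach((fn) => fn(payload));
};
const trace = [];

// Flight service: reacts to the request, then announces its result.
on("TripRequested", (trip) => {
  trace.push("flight booked");
  emit("FlightBooked", trip);
});

// Hotel service: reacts to the flight booking.
on("FlightBooked", (trip) => {
  if (trip.hotelAvailable) {
    trace.push("hotel reserved");
    emit("HotelReserved", trip);
  } else {
    trace.push("hotel failed");
    emit("HotelFailed", trip);
  }
});

// Compensation is just another reaction: the flight service hears the failure.
on("HotelFailed", () => {
  trace.push("flight cancelled");
});

emit("TripRequested", { id: "trip-1", hotelAvailable: false });
console.log(trace);
```

No single function describes the whole flow here; it only emerges from the handlers, which is why choreography gets harder to follow as steps are added.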
Does 2PC guarantee zero downtime for participants?
No — quite the opposite. Participants that voted "yes" in the prepare phase must hold locks on their resources until the coordinator's final decision arrives, meaning those resources are unavailable to other transactions for that entire window. This resource-locking behaviour is the main reason 2PC does not scale well across many services or over unreliable networks.
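The lock window can be sketched as follows. `LockingParticipant` and the row key `sku-42` are hypothetical, and the sketch simplifies one thing: a real database would usually make the second transaction wait for the lock rather than vote "NO" immediately.

```javascript
// Hypothetical participant that locks one row per prepared transaction.
class LockingParticipant {
  constructor() {
    this.locks = new Map(); // rowKey -> txId that holds the lock
  }
  prepare(txId, rowKey) {
    if (this.locks.has(rowKey)) return "NO"; // row is held by another transaction
    this.locks.set(rowKey, txId);            // lock taken at PREPARE
    return "YES";
  }
  finalize(txId) {
    // COMMIT and ABORT both end the lock window for this transaction.
    for (const [rowKey, owner] of this.locks) {
      if (owner === txId) this.locks.delete(rowKey);
    }
  }
}

const inventory = new LockingParticipant();
const first = inventory.prepare("tx-1", "sku-42");        // tx-1 takes the row
const blocked = inventory.prepare("tx-2", "sku-42");      // tx-1 still holds it
inventory.finalize("tx-1");                               // decision arrives, lock released
const afterRelease = inventory.prepare("tx-2", "sku-42"); // row is free again
console.log(first, blocked, afterRelease);
```

Everything between `prepare` and `finalize` for `tx-1` is time during which the row is unavailable to anyone else.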
Which one is used inside a single database, and which across services?
Both can technically appear in either context. In practice, 2PC is the classic mechanism for coordinating a transaction across multiple database instances or resource managers within a controlled environment (X/Open XA is the standard API for this). The Saga pattern is the dominant approach for coordinating a business process across independently deployed microservices, where holding cross-service locks for the duration of 2PC is operationally unacceptable.