Distributed transaction comparison

Two-phase commit (2PC) vs Saga pattern

Both patterns answer the same question: how do you keep a multi-step, multi-service operation consistent when any step can fail? Two-Phase Commit answers it with locks and a central coordinator; the Saga pattern answers it by letting each step commit independently and undoing the completed steps with compensating transactions if a later step fails.

TL;DR

2PC holds locks on every participant until a coordinator confirms all of them are ready, then commits everywhere atomically — strongly consistent, but blocking and hard to scale across services.
Saga lets each step commit independently and immediately, and rolls back completed steps with compensating transactions if a later step fails — no cross-service locks, but only eventually consistent.
2PC dominates within a controlled environment (one team, reliable network, e.g. XA transactions across databases); Saga dominates across independently owned microservices.

Side-by-side comparison

Aspect	Two-Phase Commit (2PC)	Saga pattern
Consistency model	Strong — atomic across all participants	Eventual — intermediate states are observable
Coordination role	Central coordinator drives prepare/commit	Orchestrator (explicit) or choreography (event chain)
Resource locking	Yes — participants lock resources until final decision	No — each step commits and releases immediately
Failure recovery	Abort the whole transaction; risk of blocking on coordinator crash	Run compensating transactions for completed steps
Availability	Lower — participants can be stuck waiting	Higher — no cross-service blocking locks
Scales across many services?	Poorly — lock contention and coordinator dependency grow	Well — this is its primary design goal
Typical context	Multiple databases/resource managers in one trust domain (XA)	Independently deployed microservices
Rollback mechanism	Native abort — nothing was ever committed	Explicit compensating logic you must write yourself

Code / protocol side-by-side

2PC — coordinator flow

// Phase 1: PREPARE
decision = COMMIT
for each participant p:
  vote = p.prepare(txId)    // participant locks resources, then votes
  if vote != YES:           // a NO, or no reply before the timeout
    decision = ABORT
    break

// Record the decision durably before telling anyone
coordinator.decide(txId, decision)

// Phase 2: COMMIT or ABORT
for each participant p:
  p.finalize(txId, decision)
  // Participants held their locks
  // since PREPARE -- released only
  // now, after the final decision
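To make the coordinator flow above concrete, here is a minimal in-memory simulation. It assumes synchronous, always-reachable participants with no network, timeouts, or durable log, and the names `Participant`, `canCommit`, and `twoPhaseCommit` are hypothetical, chosen only for illustration.

```javascript
// Minimal in-memory 2PC sketch: no network, no timeouts, no durable log.
// Participant, canCommit, and twoPhaseCommit are hypothetical names.
class Participant {
  constructor(name, canCommit) {
    this.name = name;
    this.canCommit = canCommit; // simulates whether local checks succeed
    this.state = "INIT";        // INIT -> PREPARED -> COMMITTED or ABORTED
    this.locked = false;
  }

  prepare(txId) {
    if (!this.canCommit) {
      this.state = "ABORTED";   // a NO voter may abort locally at once
      return "NO";
    }
    this.locked = true;         // keep locks until the final decision
    this.state = "PREPARED";
    return "YES";
  }

  finalize(txId, decision) {
    this.state = decision === "COMMIT" ? "COMMITTED" : "ABORTED";
    this.locked = false;        // locks are released only now
  }
}

function twoPhaseCommit(txId, participants) {
  // Phase 1: PREPARE, stopping at the first NO vote
  let decision = "COMMIT";
  for (const p of participants) {
    if (p.prepare(txId) !== "YES") {
      decision = "ABORT";
      break;
    }
  }
  // Phase 2: send the same decision to every participant
  for (const p of participants) {
    p.finalize(txId, decision);
  }
  return decision;
}

const ok = [new Participant("db1", true), new Participant("db2", true)];
console.log(twoPhaseCommit("tx-1", ok)); // COMMIT (both end COMMITTED)

const bad = [
  new Participant("db1", true),
  new Participant("db2", false), // this one votes NO
  new Participant("db3", true),  // never asked to prepare
];
console.log(twoPhaseCommit("tx-2", bad)); // ABORT (all three end ABORTED)
```

In the second run, db1 had already locked its resources and voted YES, yet it still ends ABORTED: the all-or-nothing outcome comes from every participant obeying the one decision.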

Saga (orchestration) — with compensation

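// Each step pairs a service call with the call that undoes it
const steps = [
  { do: bookFlight, undo: cancelFlight },
  { do: reserveHotel, undo: cancelHotel },
  { do: chargeCard, undo: refundCard },
];

async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.do();       // commits immediately
      completed.push(step);  // remember for rollback
    }
  } catch (err) {
    // Run compensations in REVERSE order (on a copy, so `completed` is untouched)
    for (const step of [...completed].reverse()) {
      await step.undo();
    }
    throw err;               // report the failure once compensation is done
  }
}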
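To see the compensation order concretely, here is a self-contained version with stub services. It assumes each stub simply records its name in a shared log, and the hypothetical `stub()` helper's `shouldFail` flag makes the payment step throw; the orchestration loop is the same idea as the one above.

```javascript
// Self-contained saga demo; stub() and the failing chargeCard are hypothetical.
const log = [];

// Build a stub service call that records its name, or throws if told to fail.
function stub(name, shouldFail = false) {
  return async () => {
    if (shouldFail) throw new Error(`${name} failed`);
    log.push(name);
  };
}

async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.do();
      completed.push(step);
    }
  } catch (err) {
    for (const step of [...completed].reverse()) {
      await step.undo(); // most recent step first
    }
    throw err; // the caller learns the saga was compensated
  }
}

const steps = [
  { do: stub("bookFlight"), undo: stub("cancelFlight") },
  { do: stub("reserveHotel"), undo: stub("cancelHotel") },
  { do: stub("chargeCard", true), undo: stub("refundCard") }, // payment fails
];

const demo = runSaga(steps)
  .catch((err) => console.log("saga aborted:", err.message))
  .then(() => console.log(log.join(" -> ")));
// saga aborted: chargeCard failed
// bookFlight -> reserveHotel -> cancelHotel -> cancelFlight
```

refundCard never runs: the charge never succeeded, so there is nothing to compensate.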

When to use 2PC

All participants are in one trust/operational domain. Coordinating multiple database resource managers under a single application server (classic XA transactions) is where 2PC still fits comfortably.
Strong, immediate consistency is non-negotiable. If intermediate, partially-applied states are unacceptable for even a moment, 2PC's all-or-nothing atomicity is worth the availability cost.
The transaction is short-lived and low-latency. 2PC's blocking window is much less risky when prepare-to-commit takes milliseconds, not the seconds or minutes a cross-service saga step might take.
You tightly control the coordinator's reliability. If the coordinator itself is highly available (replicated, monitored), the risk of hitting the "blocking problem" is greatly reduced.
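Part of making the coordinator reliable is recording its decision durably before Phase 2, so a restarted coordinator can finish what it started instead of leaving participants blocked. A minimal sketch, assuming a hypothetical `DecisionLog` (a `Map` standing in for durable storage) and the common "presumed abort" rule: a transaction with no logged decision is treated as aborted.

```javascript
// Sketch: coordinator recovery from a decision log (hypothetical names).
// A Map stands in for durable storage that would survive a crash.
class DecisionLog {
  constructor() {
    this.records = new Map(); // txId -> "COMMIT" or "ABORT"
  }
  write(txId, decision) {
    this.records.set(txId, decision); // must be durable BEFORE Phase 2 starts
  }
  read(txId) {
    return this.records.get(txId);
  }
}

// After a restart, answer a prepared participant asking about txId.
// Presumed abort: with no logged decision, COMMIT was never sent to anyone.
function recoverDecision(log, txId) {
  return log.read(txId) ?? "ABORT";
}

const decisionLog = new DecisionLog();
decisionLog.write("tx-1", "COMMIT");
// ...coordinator crashes; tx-2 was still in Phase 1, so it has no record.

console.log(recoverDecision(decisionLog, "tx-1")); // COMMIT: re-send it
console.log(recoverDecision(decisionLog, "tx-2")); // ABORT: presumed
```

The sooner a restarted or replicated coordinator can answer these queries, the shorter the window in which prepared participants sit on their locks.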

When to use the Saga pattern

The operation spans independently deployed microservices. Each service owns its own database; you cannot (and should not) hold a lock inside another team's service for the duration of a multi-step business process.
Availability matters more than instantaneous consistency. Order processing, travel booking, and e-commerce checkouts commonly accept a brief inconsistent window in exchange for never locking resources across services.
Steps can take a long time or involve external systems. Waiting on a third-party payment gateway or a slow partner API inside a 2PC prepare phase would hold locks for far too long; sagas let each step complete and release independently.
You can design meaningful compensating actions. If every step has a sensible "undo" (cancel, refund, release), sagas give you resilience without the coordination overhead of 2PC.
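Designing a sensible "undo" usually also means making it idempotent, because an orchestrator that times out and retries may deliver the same compensation twice. A minimal sketch, assuming a hypothetical in-memory `PaymentService` that remembers which charges it has already refunded:

```javascript
// Sketch: an idempotent refund compensation (hypothetical in-memory service).
class PaymentService {
  constructor() {
    this.balanceCents = 0;
    this.charges = new Map();  // paymentId -> amount charged, in cents
    this.refunded = new Set(); // paymentIds already refunded
  }

  charge(paymentId, cents) {
    this.charges.set(paymentId, cents);
    this.balanceCents += cents;
  }

  // Idempotent: only the first refund of a known charge changes the balance.
  refund(paymentId) {
    if (!this.charges.has(paymentId) || this.refunded.has(paymentId)) {
      return false; // unknown charge or already refunded: nothing to undo
    }
    this.balanceCents -= this.charges.get(paymentId);
    this.refunded.add(paymentId);
    return true;
  }
}

const payments = new PaymentService();
payments.charge("pay-42", 15000);
console.log(payments.refund("pay-42")); // true: money returned
console.log(payments.refund("pay-42")); // false: the retry is a no-op
console.log(payments.balanceCents);     // 0
```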

English phrases engineers use

2PC conversations

"The coordinator is still waiting on one participant's vote."
"That's the classic blocking problem — the coordinator died mid-transaction."
"We're using XA transactions to coordinate these two databases."
"A missing vote counts as a NO, so the whole transaction aborts."

Saga conversations

"Step 3 failed, so we're running compensations for steps 1 and 2."
"This is a choreography saga — no central orchestrator, just event listeners."
"We need a compensating transaction to reverse the hotel reservation."
"Between steps, the system is only eventually consistent — that's expected."

Quick decision tree

All participants in one trust domain, short transaction → 2PC / XA transactions
Multi-step process across independent microservices → Saga pattern
A step may take seconds/minutes or hit external APIs → Saga (2PC locks too long)
Strong, instant, all-or-nothing consistency is mandatory → 2PC (accept the availability cost)
Many steps, need simpler failure reasoning and debugging → Orchestration-based Saga
Few services, high team autonomy, event-driven culture → Choreography-based Saga

Frequently asked questions

What problem do both 2PC and Saga solve?

Both coordinate a single business operation that must update data across multiple independent services or databases — for example, "book a flight and reserve a hotel" where each lives in a different service. Without coordination, one part could succeed while the other fails, leaving the system in an inconsistent state. 2PC and Saga are two very different answers to "how do we keep that from happening."
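The failure mode both patterns exist to prevent fits in a few lines. A minimal sketch, assuming two hypothetical in-memory services where the hotel is fully booked: with no coordination, the flight stays booked even though the trip as a whole failed.

```javascript
// Sketch: two independent updates with no coordination (hypothetical services).
const flights = new Set();
const hotels = new Set();
let hotelRoomsLeft = 0; // simulate a fully booked hotel

function bookFlight(tripId) {
  flights.add(tripId);
}

function reserveHotel(tripId) {
  if (hotelRoomsLeft === 0) throw new Error(`no rooms for ${tripId}`);
  hotelRoomsLeft -= 1;
  hotels.add(tripId);
}

function bookTripUncoordinated(tripId) {
  bookFlight(tripId);   // commits in the flight service
  reserveHotel(tripId); // throws, and nothing undoes the flight booking
}

try {
  bookTripUncoordinated("trip-7");
} catch (err) {
  console.log(err.message); // no rooms for trip-7
}
console.log(flights.has("trip-7"), hotels.has("trip-7")); // true false
```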

Why is 2PC called "blocking"?

If the coordinator crashes after sending "prepare" but before sending the final "commit" or "abort," participants that voted yes are stuck holding their locks indefinitely — they cannot safely proceed without knowing the coordinator's decision, because prematurely committing or aborting could contradict what the coordinator eventually decides. This is called the "blocking problem," and it is the main reason 2PC is avoided in systems that need high availability.
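The participant's side of the blocking problem can be sketched as a small state machine. This is a minimal sketch with a hypothetical class and method names: after voting YES, the participant refuses to decide on its own when the coordinator goes quiet, and its lock stays held until a decision arrives.

```javascript
// Sketch: a 2PC participant's view of the blocking problem (hypothetical names).
class Participant {
  constructor() {
    this.state = "INIT"; // INIT -> PREPARED -> COMMITTED or ABORTED
    this.locked = false;
  }

  voteYes() {
    this.locked = true;      // resources stay locked from here on
    this.state = "PREPARED"; // promised to obey the coordinator, outcome unknown
  }

  // Called when the coordinator has been silent for too long.
  onCoordinatorTimeout() {
    if (this.state === "INIT") {
      this.state = "ABORTED"; // not yet voted: a unilateral abort is safe
      return "ABORTED";
    }
    if (this.state === "PREPARED") {
      // Committing could contradict an ABORT decision and aborting could
      // contradict a COMMIT decision, so the only safe move is to wait.
      return "BLOCKED";
    }
    return this.state; // already decided: nothing to do
  }

  onDecision(decision) {
    this.state = decision === "COMMIT" ? "COMMITTED" : "ABORTED";
    this.locked = false; // locks are finally released
  }
}

const p = new Participant();
p.voteYes();
console.log(p.onCoordinatorTimeout(), p.locked); // BLOCKED true
p.onDecision("COMMIT"); // e.g. a recovered coordinator re-sends its decision
console.log(p.state, p.locked); // COMMITTED false
```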

What is a compensating transaction in a Saga?

A compensating transaction is the "undo" step for a saga step that already succeeded. If step 3 of a 5-step saga fails, the saga must run compensations for steps 1 and 2 in reverse order: first undo step 2, then step 1 (e.g. "cancel the hotel reservation," then "cancel the flight booking"). Because the underlying operations already committed independently, you cannot simply roll them back like a database transaction; you must explicitly reverse their business effect.
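The difference between rolling back and compensating shows up clearly in an append-only ledger. A minimal sketch, assuming a hypothetical ledger whose committed entries are never deleted: the refund compensation adds an offsetting entry, so the history keeps both the charge and its reversal while the net business effect returns to zero.

```javascript
// Sketch: compensation as a new, offsetting entry (hypothetical ledger).
const ledger = []; // append-only: committed entries are never removed

function chargeCard(orderId, cents) {
  ledger.push({ orderId, type: "CHARGE", cents });
}

// Compensation: record a reversing entry rather than deleting the charge.
function refundCard(orderId) {
  const charge = ledger.find((e) => e.orderId === orderId && e.type === "CHARGE");
  if (charge) {
    ledger.push({ orderId, type: "REFUND", cents: -charge.cents });
  }
}

chargeCard("order-1", 15000);
refundCard("order-1"); // a later saga step failed
const net = ledger.reduce((sum, e) => sum + e.cents, 0);
console.log(ledger.length, net); // 2 0
```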

Are Sagas eventually consistent?

Yes. Between the first step committing and the last step completing (or all compensations finishing after a failure), the overall business operation is in an intermediate state that other parts of the system can observe: for example, a confirmed flight booking with no hotel reservation yet, because the hotel step has not run. This is a deliberate trade-off: sagas give up the "all-or-nothing, instantly" guarantee of 2PC in exchange for not locking resources across services.
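The observable intermediate state can be simulated by pausing between steps. A minimal sketch, assuming hypothetical in-memory stores and a generator that yields after each committed step so that another reader can look at the data mid-saga:

```javascript
// Sketch: another reader observing a saga between steps (hypothetical stores).
const flights = new Set();
const hotels = new Set();

// The generator pauses after each committed step, standing in for the time
// that passes between one service call and the next.
function* bookTripSaga(tripId) {
  flights.add(tripId); // step 1 commits in the flight service
  yield "flight booked";
  hotels.add(tripId);  // step 2 commits in the hotel service
  yield "hotel reserved";
}

// What any other part of the system would see at this moment.
const snapshot = (tripId) => ({
  flight: flights.has(tripId),
  hotel: hotels.has(tripId),
});

const saga = bookTripSaga("trip-9");
saga.next();
console.log(snapshot("trip-9")); // { flight: true, hotel: false }
saga.next();
console.log(snapshot("trip-9")); // { flight: true, hotel: true }
```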

What is the difference between choreography-based and orchestration-based Sagas?

In a choreography saga, each service listens for events from the previous step and decides independently what to do next: no central coordinator, just a chain of event-driven reactions. In an orchestration saga, a central orchestrator explicitly calls each service in sequence and decides how to handle failures and trigger compensations. Choreography works well when only a few services are involved; orchestration is easier to reason about and debug as the number of steps grows.
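The orchestration example earlier calls every step from one place; a choreographed version has no such place. A minimal sketch, using Node's built-in `EventEmitter` as a stand-in for a real message broker; the event names and the `hotelFull` flag are hypothetical. Each "service" only reacts to an event and emits its own, and a failure event triggers the upstream compensation.

```javascript
// Sketch: a choreography saga over an in-process event bus (hypothetical events).
// EventEmitter stands in for a real message broker.
const { EventEmitter } = require("events");

const bus = new EventEmitter();
const trace = [];

// Flight service: books on request, cancels when the hotel step fails.
bus.on("TripRequested", (trip) => {
  trace.push("flight booked");
  bus.emit("FlightBooked", trip);
});
bus.on("HotelFailed", () => {
  trace.push("flight cancelled"); // compensation triggered by an event
});

// Hotel service: reacts to FlightBooked and reports success or failure.
bus.on("FlightBooked", (trip) => {
  if (trip.hotelFull) {
    trace.push("hotel failed");
    bus.emit("HotelFailed", trip);
  } else {
    trace.push("hotel reserved");
    bus.emit("HotelReserved", trip);
  }
});

bus.emit("TripRequested", { id: "trip-3", hotelFull: true });
console.log(trace.join(" -> "));
// flight booked -> hotel failed -> flight cancelled
```

No single function lists the steps; the sequence exists only in which service subscribes to which event, which is why choreography gets harder to follow as steps are added.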

Does 2PC guarantee zero downtime for participants?

No — quite the opposite. Participants that voted "yes" in the prepare phase must hold locks on their resources until the coordinator's final decision arrives, meaning those resources are unavailable to other transactions for that entire window. This resource-locking behaviour is the main reason 2PC does not scale well across many services or over unreliable networks.
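The availability cost of holding locks can be shown with a tiny lock table. A minimal sketch, assuming a hypothetical `LockTable` whose lock requests fail immediately instead of waiting: while tx-1 is prepared, tx-2 cannot touch the same seat until tx-1's final decision releases it.

```javascript
// Sketch: a prepared transaction's locks blocking others (hypothetical lock table).
class LockTable {
  constructor() {
    this.owners = new Map(); // resource -> txId holding the lock
  }

  tryLock(resource, txId) {
    const owner = this.owners.get(resource);
    if (owner !== undefined && owner !== txId) return false; // held by another tx
    this.owners.set(resource, txId);
    return true;
  }

  releaseAll(txId) {
    for (const [resource, owner] of this.owners) {
      if (owner === txId) this.owners.delete(resource);
    }
  }
}

const locks = new LockTable();
locks.tryLock("seat-12A", "tx-1"); // tx-1 votes YES while holding this lock
console.log(locks.tryLock("seat-12A", "tx-2")); // false: unavailable while tx-1 waits
locks.releaseAll("tx-1"); // the coordinator's decision finally arrives
console.log(locks.tryLock("seat-12A", "tx-2")); // true
```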

Which one is used across databases, and which across services?

Both can technically appear in either context, but in practice 2PC is the classic mechanism for coordinating a transaction across multiple database instances or resource managers within a controlled environment (X/Open XA is the standard API for this), while the Saga pattern is the dominant approach for coordinating a business process across independently deployed microservices, where holding cross-service locks for the duration of 2PC is operationally unacceptable.
