Writing Advanced System Design Documents in English
Master the language of advanced system design docs: trade-off framing, back-of-envelope estimation, and design decision vocabulary for senior engineers.
A system design document (also called a design doc, technical spec, or RFC) is one of the most consequential written artefacts in software engineering. At the advanced level, the quality of the document depends not only on the design itself but also on how precisely you articulate trade-offs, constraints, and assumptions.
Document Structure
A strong design document follows a predictable structure that readers can scan quickly:
- Context and motivation — why this design is needed now
- Goals and non-goals — what the system will and will not do
- High-level design — the architecture overview
- Detailed design — component-level decisions
- Trade-offs and alternatives considered — what you rejected and why
- Back-of-envelope estimates — capacity and cost reasoning
- Open questions — unresolved decisions
Writing the Context Section
The context section answers: Why does this matter? Use these patterns:
- “The existing approach does not scale beyond X requests per second.”
- “The current system lacks support for multi-region failover.”
- “As the team grows, manual provisioning has become a bottleneck.”
- “This design supersedes the proposal from Q3 and addresses its limitations.”
Avoid vague phrases like “we need to improve performance.” Be specific: “p99 latency exceeds 200 ms under peak load, breaching our SLA.”
Goals and Non-Goals
This section prevents scope creep and aligns reviewers. Use clear, parallel structure:
Goals:
- Support 10,000 concurrent connections per node.
- Provide exactly-once delivery guarantees for payment events.
- Allow zero-downtime deployments using a blue/green strategy.
Non-goals:
- This design does not address client-side caching.
- Out of scope: multi-tenancy support — that is tracked separately.
- We are not optimising for read-heavy workloads in this iteration.
Framing Trade-offs
The trade-off section is where senior engineers demonstrate their thinking. The key phrases fall into three groups:
Contrast connectors
- “While Option A offers lower latency, it introduces operational complexity.”
- “Option B simplifies the deployment model at the cost of vendor lock-in.”
- “Although eventual consistency reduces write latency, it complicates the client logic.”
Weighing factors
- “We prioritised operational simplicity over raw throughput.”
- “Given our team’s expertise in Postgres, we favoured a relational approach.”
- “The marginal performance gain does not justify the added complexity.”
- “This design trades storage cost for query flexibility.”
Rejecting alternatives
- “We evaluated a message-queue-based approach but rejected it because…”
- “Option C was ruled out due to its lack of official multi-region support.”
- “The consensus was that the additional operational burden outweighed the benefits.”
Back-of-Envelope Estimation
Back-of-envelope (BOE) calculations show that your design can handle the expected load. Write them clearly:
Template
“Assuming 50,000 daily active users, each with one session per day that generates 10 events, we estimate 500,000 events per day, or roughly 6 events per second on average (500,000 ÷ 86,400 s ≈ 5.8). At peak (3× average), the system must handle ~18 events per second.”
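The arithmetic in this template can be checked with a few lines of Python. The following is a minimal sketch, not a prescribed tool: the function name `estimate_event_rate` and its parameters are hypothetical, and it assumes one session per user per day, which is what the 500,000 events-per-day figure implies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_event_rate(daily_active_users, events_per_session,
                        sessions_per_user_per_day=1, peak_multiplier=3):
    """Return (events_per_day, average_eps, peak_eps).

    sessions_per_user_per_day=1 is an assumption made explicit here;
    the template text does not state it.
    """
    events_per_day = (daily_active_users
                      * events_per_session
                      * sessions_per_user_per_day)
    average_eps = events_per_day / SECONDS_PER_DAY
    peak_eps = average_eps * peak_multiplier
    return events_per_day, average_eps, peak_eps


events_per_day, average_eps, peak_eps = estimate_event_rate(50_000, 10)
print(f"{events_per_day:,} events/day")        # 500,000 events/day
print(f"~{average_eps:.1f} events/s average")  # ~5.8 events/s average
print(f"~{peak_eps:.1f} events/s at peak")     # ~17.4 events/s at peak
```

The unrounded peak is about 17.4 events per second. The template's ~18 comes from rounding the average up to 6 before multiplying by 3, which errs on the safe side.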
Key phrases
- “Assuming a 10:1 read-to-write ratio…”
- “At a replication factor of 3, storage requirements triple.”
- “A single node can handle X RPS while staying within our p95 latency target.”
- “This gives us ~Y GB/day of raw event data before compression.”
- “We budget 20% headroom above peak estimates.”
Use ~ for approximations and state your assumptions explicitly. Reviewers will challenge unsubstantiated numbers.
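The key phrases above map onto simple calculations. The sketch below shows one way to lay out the unit trail for storage (replication factor of 3) and node count (20% headroom above peak). All function names are hypothetical, and the inputs (1,000 bytes per event, 10 RPS per node, a peak of 18 RPS) are placeholder assumptions chosen only to make the example run.

```python
import math


def raw_storage_gb_per_day(events_per_day, avg_event_bytes):
    """Raw event data per day in GB (decimal, 1 GB = 1e9 bytes), before compression."""
    return events_per_day * avg_event_bytes / 1e9


def replicated_storage_gb_per_day(raw_gb, replication_factor=3):
    """Storage after replication: a factor of 3 triples the raw requirement."""
    return raw_gb * replication_factor


def nodes_needed(peak_rps, node_capacity_rps, headroom=0.20):
    """Nodes required to serve peak load plus a headroom margin, rounded up."""
    return math.ceil(peak_rps * (1 + headroom) / node_capacity_rps)


# Placeholder inputs (assumptions, not measurements):
raw_gb = raw_storage_gb_per_day(events_per_day=500_000, avg_event_bytes=1_000)
replicated_gb = replicated_storage_gb_per_day(raw_gb)
nodes = nodes_needed(peak_rps=18, node_capacity_rps=10)

print(f"~{raw_gb:.1f} GB/day raw")                     # ~0.5 GB/day raw
print(f"~{replicated_gb:.1f} GB/day at 3x replication")  # ~1.5 GB/day at 3x replication
print(f"{nodes} nodes with 20% headroom")               # 3 nodes with 20% headroom
```

Keeping each step in its own function mirrors how the estimate should read in the document: one assumption per sentence, with units stated at every step so a reviewer can challenge any single number.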
Describing Architectural Decisions
Use active, specific verbs:
| Instead of | Write |
|---|---|
| "We will use Kafka" | "We route events through Kafka to decouple producers from consumers." |
| "The API is stateless" | "The API stores no session state, enabling horizontal scaling." |
| "We use caching" | "We cache responses at the edge with a 60-second TTL." |
| "The DB is sharded" | "We shard by user_id to distribute write load evenly." |
Open Questions
Close your document by listing unresolved decisions explicitly rather than leaving them unstated:
- “TBD: Whether to use a managed Kafka service or self-host. Depends on cost analysis (see ticket #1234).”
- “Under discussion: The retention policy for audit logs — legal review pending.”
- “Decision needed before implementation: SLA tier for the async write path.”
Tone and Register
Design docs are formal but not stiff. Follow these conventions:
- Use “we” (not “the author” or “I”): design is a team decision.
- Use the present tense for the chosen design: “The service exposes a REST API.”
- Use the future tense for planned work: “The v2 implementation will add streaming support.”
- Avoid hedging phrases like “it might be possible that”; be direct.
Key Takeaways
- Context must be specific, not generic. State metrics, not feelings.
- Goals/Non-goals align reviewers and prevent scope creep.
- Trade-offs use contrast connectors: “while,” “at the cost of,” “although.”
- BOE estimates require explicit assumptions and a clear unit trail.
- Open questions are a sign of maturity, not incompleteness.
- Use active, precise verbs: route, shard, cache, expose, decouple — not “use” or “do.”