5 exercises — Practice TDD and BDD vocabulary in English: red-green-refactor, Gherkin syntax, example mapping, mutation testing, and test architecture.
Core TDD & BDD vocabulary clusters
TDD cycle: red (failing test), green (minimal implementation), refactor (improve design), repeat
A senior developer explains TDD to a developer who has never used it: "TDD (Test-Driven Development) flips the order: you write the failing test first, then the minimum code to make it pass, then refactor. That is the red-green-refactor cycle. Red: write a test that fails (it must fail; if it passes immediately, you're not testing anything new). Green: write the simplest code that makes it pass, not the cleanest, just the minimum. Refactor: now clean up the code with the safety net of the passing test. The cycle takes 2-5 minutes per iteration." Why must the test in the red phase fail before you write any implementation?
Red phase requirement: the test must fail before implementation because a failing run proves the test is actually testing something. If you write a test and it passes immediately, one of three things happened:
1) the test has a logic error (it always passes regardless of implementation),
2) the feature already exists, or
3) you're not testing the right thing.
A test that can't fail is not a test; it's a no-op. This is the TDD safety check.

TDD cycle vocabulary:
Red: write a test. Run it. It must fail. Read the failure message: it should be a meaningful assertion failure, not a compilation error.
Green: write the simplest code that makes the test pass. No elegance required. Even hardcoding the expected return value is valid at this stage.
Refactor: improve design with the confidence of passing tests. Extract methods, rename variables, remove duplication. Tests must still pass after refactoring.
YAGNI (You Aren't Gonna Need It): only implement what a current test requires. TDD enforces this naturally.
Triangulation: write multiple test cases that force a more general implementation. If the first test passes with return 42, write a second test that breaks that shortcut.

Test vocabulary:
Test oracle: how the test knows if the output is correct (the expected value in an assert).
Test fixture: the setup state for a test (objects, database state).
Arrange-Act-Assert (AAA): structure for a unit test.
Test double: generic term for fake objects (stub, mock, spy, fake, dummy).

In conversation: 'I call the failing test a "specification." It's not a test yet; it's a specification of what the code should do. The red phase is when I write the spec; green is when I implement it.'
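The triangulation step and the Arrange-Act-Assert layout can be sketched in a few lines of Python. This is only an illustration: the names add_hardcoded, add, first_test, second_test and passes are hypothetical and not taken from any framework. The first test passes even against a hardcoded return 42, and the second test is what forces the general implementation.

```python
from typing import Callable

AddFn = Callable[[int, int], int]


def add_hardcoded(a: int, b: int) -> int:
    # Green, first pass: the minimum code that satisfies the first test.
    return 42


def add(a: int, b: int) -> int:
    # After triangulation: the second test forces the general rule.
    return a + b


def first_test(add_fn: AddFn) -> None:
    # Arrange: set up the inputs (the fixture is trivial here).
    a, b = 40, 2
    # Act: call the code under test.
    result = add_fn(a, b)
    # Assert: the expected value 42 is the test oracle.
    assert result == 42


def second_test(add_fn: AddFn) -> None:
    # A second example that the hardcoded shortcut cannot satisfy.
    assert add_fn(1, 2) == 3


def passes(test: Callable[[AddFn], None], add_fn: AddFn) -> bool:
    # Tiny hand-rolled runner: True if the test raises no AssertionError.
    try:
        test(add_fn)
    except AssertionError:
        return False
    return True
```

Running passes(second_test, add_hardcoded) returns False, which is the "red" that justifies replacing the shortcut with add. In a real project the tests would normally live in a test runner such as pytest or unittest rather than a hand-written passes helper.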
2 / 5
A BDD practitioner explains Gherkin to a developer team: "Gherkin is the structured language for writing BDD scenarios. Feature describes the overall capability. Scenario is a specific example. Given establishes the context (preconditions). When describes the action. Then describes the expected outcome. And/But chain multiple steps of the same type. Scenario Outline with an Examples table lets you run the same scenario with different data. The key: Gherkin is not a testing tool — it's a communication tool. Business, product, and engineering should all understand it." What is the purpose of Scenario Outline with an Examples table in Gherkin?
Scenario Outline: a template scenario with placeholders in angle brackets. The Examples table provides concrete values for each placeholder. Each row in the Examples table is a separate test run. Example:

    Given a user with <role>
    When they access <page>
    Then they <see_or_not> the admin panel

    Examples:
      | role   | page   | see_or_not |
      | admin  | /admin | see        |
      | viewer | /admin | do not see |

This replaces writing two almost-identical scenarios.

Gherkin vocabulary:
Feature: the capability being described. One .feature file per feature. Includes a title and narrative (As a... I want... So that...).
Scenario: a concrete example. Executable specification.
Given: precondition / starting state. Set up the world.
When: the action / trigger. The thing being tested.
Then: the expected outcome. Observable result.
And / But: continuation of the previous step type. The step inherits the type of the step before it: an And after a Given is another Given; an And after a Then is another Then.
Background: shared Given steps for all scenarios in a file.
Tags: @slow, @wip, @smoke, used for filtering and documentation.
Step definitions: the code that maps Gherkin steps to automation. Written once, reused across scenarios.
BDD frameworks: Cucumber (Java/Ruby/JavaScript), Behave (Python), SpecFlow (.NET).

In conversation: 'The test names in Gherkin are the requirements. If a business person can't read a scenario and tell you whether the product behaves that way, the scenario is written wrong.'
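To make "each row is a separate test run" concrete, here is a minimal Python sketch of how an outline could be expanded into concrete scenarios. It is not how Cucumber or Behave implement this internally; expand_outline and the variable names are hypothetical, and the steps and rows are the ones from the admin panel example.

```python
from __future__ import annotations

import re


def expand_outline(
    steps: list[str], header: list[str], rows: list[list[str]]
) -> list[list[str]]:
    """Return one list of concrete steps per Examples row."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        # Replace every <placeholder> with the value from this row.
        concrete = [
            re.sub(r"<(\w+)>", lambda m: values[m.group(1)], step)
            for step in steps
        ]
        scenarios.append(concrete)
    return scenarios


STEPS = [
    "Given a user with <role>",
    "When they access <page>",
    "Then they <see_or_not> the admin panel",
]
HEADER = ["role", "page", "see_or_not"]
ROWS = [
    ["admin", "/admin", "see"],
    ["viewer", "/admin", "do not see"],
]

SCENARIOS = expand_outline(STEPS, HEADER, ROWS)
```

SCENARIOS holds two scenarios, one per table row, so a failure in the viewer row is reported separately from the admin row.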
3 / 5
A delivery lead explains Example Mapping before sprint planning: "Example Mapping is a BDD discovery workshop. Before writing code, product, tech, and QA sit down for 25 minutes. Yellow card: the story under discussion. Blue cards: rules (business rules). Green cards: examples (concrete scenarios that illustrate a rule). Red cards: questions (open questions we can't answer yet). If we have more red cards than green cards, the story isn't ready. Example Mapping replaces long requirement documents with a structured conversation." What are the Three Amigos in BDD, and why is their collaboration important during discovery?
Three Amigos: the three perspectives required for shared understanding before development:
Business/Product: what problem are we solving? What is the desired outcome? What rules govern the behavior?
Developer: how will this be implemented? What technical constraints exist? What questions need clarification?
Tester/QA: what could go wrong? What edge cases exist? What scenarios would break this?
Together these three roles identify misunderstandings and ambiguities before code is written, which is much cheaper to resolve in a 25-minute conversation than in a bug report after deployment.

Example Mapping vocabulary:
Story (yellow card): the user story under discussion, placed at the top of the map.
Rule (blue card): a business rule that governs the feature. Example: "Users can't withdraw more than their balance."
Example (green card): a concrete scenario illustrating a rule. One example per card, as a short description or a rough Given/When/Then sketch.
Question (red card): an open question blocking understanding. Captured without extended discussion and answered after the workshop.
Story readiness: more green cards than red cards means the story has more known than unknown. Many red cards = story not ready for sprint.

BDD vocabulary:
Specification by example: using concrete examples to specify behavior, not abstract rules.
Living documentation: tests that are also documentation, kept up-to-date because they're executed.
Outside-in development: start from acceptance tests (outside), work inward to unit tests.
Discovery, formulation, automation: the three BDD phases. Discovery = Three Amigos. Formulation = write Gherkin. Automation = implement step definitions.

In conversation: 'We moved Example Mapping to the week before the sprint. Stories that survive the session with fewer than 5 red cards go into the sprint. Stories drowning in red cards go back to product for clarification. It halved our mid-sprint surprises.'
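As a rough illustration of specification by example, the rule "Users can't withdraw more than their balance" could be turned into three example cards and then into executable checks. The Account class, the InsufficientFunds exception and the amounts are all hypothetical and chosen only for this sketch.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the balance (hypothetical name)."""


class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance

    def withdraw(self, amount: int) -> None:
        # The business rule from the rule card.
        if amount > self.balance:
            raise InsufficientFunds(f"cannot withdraw {amount} from {self.balance}")
        self.balance -= amount


def example_withdraw_less_than_balance() -> None:
    account = Account(balance=100)  # Given a balance of 100
    account.withdraw(30)            # When the user withdraws 30
    assert account.balance == 70    # Then the balance is 70


def example_withdraw_exact_balance() -> None:
    account = Account(balance=100)  # Given a balance of 100
    account.withdraw(100)           # When the user withdraws 100
    assert account.balance == 0     # Then the balance is 0


def example_withdraw_more_than_balance() -> None:
    account = Account(balance=100)  # Given a balance of 100
    try:
        account.withdraw(101)       # When the user withdraws 101
    except InsufficientFunds:
        pass                        # Then the withdrawal is refused
    else:
        raise AssertionError("withdrawal should have been refused")
    assert account.balance == 100   # And the balance is unchanged
```

The boundary example (withdrawing exactly the balance) is the kind of case a tester typically raises during the workshop; if nobody can say whether it is allowed, it becomes a question card instead of an example.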
4 / 5
A senior engineer explains mutation testing to the team: "Mutation testing answers the question: are our tests actually good? It automatically modifies your code (changing a + to -, flipping a true to false, removing a condition) and runs your test suite against each modified copy, called a mutant. If your tests catch the change (a test fails), the mutant is killed. If your tests pass despite the change, the mutant survived: your tests didn't catch it. Mutation score = killed mutants / total mutants. A mutation score of 95% means your tests killed 95% of the generated mutants, which suggests they would catch most small faults of those kinds." What is a survived mutant in mutation testing, and what does it reveal about test quality?
Mutation testing: automated fault injection to evaluate test quality.

Types of mutations (mutants):
Arithmetic operator mutation: + becomes -, * becomes /.
Logical operator mutation: AND becomes OR.
Relational operator mutation: > becomes <, == becomes !=.
Conditional boundary mutation: if (x > 0) becomes if (x >= 0).
Statement deletion: removes a line of code.

Killed mutant: at least one test failed after the mutation, so the tests detect this type of fault. Good.
Survived mutant: all tests passed despite the mutation, so the tests would not catch a real bug of this type. Reveals: missing test cases, tests that assert the wrong thing, tests that are too permissive in their assertions.
Equivalent mutant: a mutation that doesn't change observable behavior (e.g., i++ vs ++i in a standalone statement). It cannot be killed by any test. Tools can filter out some equivalent mutants automatically, but not all; the rest need manual review.

Mutation testing tools: PIT (Java), Stryker (JavaScript/TypeScript, .NET), Cosmic Ray (Python).

Mutation score interpretation (a rough guide, not a standard):
60-70%: some testing, significant gaps.
80-85%: solid test suite.
90%+: excellent coverage.
100%: rarely reachable in practice, because equivalent mutants cannot be killed and lower the score unless they are excluded.

Vocabulary:
Test coverage vs. mutation score: 100% line coverage does not mean tests will catch bugs. A line can be executed and the test still pass if the assertion is weak. Mutation score is a deeper quality measure.

In conversation: 'Line coverage tells you which lines were touched. Mutation score tells you which lines were verified. A test that asserts nothing can reach 100% line coverage with a mutation score near 0%.'
5 / 5
An architect explains the test pyramid and its antipatterns: "The test pyramid: many unit tests at the bottom (fast, isolated, cheap), fewer integration tests in the middle, even fewer E2E tests at the top (slow, flaky, expensive). The antipattern is the ice cream cone — everything inverted: lots of manual and E2E tests, few unit tests. Another antipattern: the hourglass — lots of unit tests and E2E, no integration tests. The result: unit tests pass, E2E tests pass, but failures happen at the integration points between components." What is the test ice cream cone antipattern and what problems does it create?
Test ice cream cone: inverted test pyramid. Manual testing + E2E tests dominate; few unit tests. Common in teams that:
adopt test automation late (add E2E tests to an untested codebase),
rely on QA as the primary testing function,
fear breaking existing E2E tests and don't refactor to unit tests.

Problems:
Slow feedback: E2E tests typically take seconds to minutes each; unit tests take milliseconds. A suite dominated by E2E tests can push a CI pipeline to 1-2 hours.
Flakiness: E2E tests are brittle because of network timeouts, UI rendering, and test data state.
Poor failure localisation: an E2E failure doesn't tell you which unit caused it.
Expensive maintenance: UI changes break many E2E tests.

Test pyramid vocabulary:
Unit test: tests a single function/class in isolation. Fast (milliseconds). No external dependencies.
Integration test: tests the interaction between components (service + database, API + client). Slower but catches interface bugs.
E2E (end-to-end) test: tests the full system from the user's perspective. Tools: Cypress, Playwright, Selenium.
Contract test: tests that a service fulfills the contract expected by its consumers. Catches API breaking changes without full E2E. Tools: Pact.
Acceptance test: verifies the system satisfies business requirements. Often used synonymously with E2E, depending on context.
Smoke test: minimal subset run after deployment to verify critical paths work.
Flaky test: passes and fails non-deterministically. Especially dangerous because teams start ignoring failures.

In conversation: 'Every E2E test you write should ask: what's the unit test that tests this same logic? The E2E tests should be the tip of the iceberg: the visible confirmation that the invisible unit tests are working.'
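The difference between the bottom two pyramid layers can be sketched with one small function tested two ways. This is an illustrative Python example using only the standard library; register, InMemoryUserRepo and SqliteUserRepo are hypothetical names, and an in-memory SQLite database stands in for a real one.

```python
import sqlite3


class InMemoryUserRepo:
    """Fake (a kind of test double) used by the unit test."""

    def __init__(self) -> None:
        self._emails = set()

    def add(self, email: str) -> None:
        self._emails.add(email)

    def exists(self, email: str) -> bool:
        return email in self._emails


class SqliteUserRepo:
    """Real persistence code, exercised by the integration test."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS users (email TEXT PRIMARY KEY)")

    def add(self, email: str) -> None:
        self._conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

    def exists(self, email: str) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row is not None


def register(repo, email: str) -> bool:
    """Return True if the email was registered, False if already taken."""
    normalized = email.strip().lower()
    if repo.exists(normalized):
        return False
    repo.add(normalized)
    return True


def unit_test_rejects_duplicate_email() -> None:
    # Unit level: pure logic, no external dependency.
    repo = InMemoryUserRepo()
    assert register(repo, "Ada@example.com") is True
    assert register(repo, " ada@example.com ") is False


def integration_test_register_persists_to_sqlite() -> None:
    # Integration level: the same logic wired to a real SQL engine.
    conn = sqlite3.connect(":memory:")
    repo = SqliteUserRepo(conn)
    assert register(repo, "ada@example.com") is True
    assert register(repo, "ADA@example.com") is False
    count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    assert count == 1
    conn.close()
```

The unit test localises a failure to register itself, while the integration test would also catch a wrong column name or a broken query in SqliteUserRepo. An hourglass-shaped suite would be missing exactly that second kind of check.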