5 exercises — practice structured answers for Prompt Engineer interviews covering few-shot vs. zero-shot explanation, temperature choice vocabulary, prompt injection risk, prompt versioning, and chain-of-thought reasoning.
How to structure Prompt Engineer interview answers
Few-shot vs. zero-shot: few-shot is about format alignment, not teaching skills — examples cost context window tokens
Temperature: "I set temperature to 0.2 for deterministic outputs — higher temperature introduces variance that looks like a data quality issue"
Prompt injection: direct (user writes the injection) vs. indirect (embedded in processed content) — model alone cannot be trusted to resist
Prompt versioning: store as named versioned artifacts in source control; evaluate against a frozen dataset before deploying; track token cost delta
Chain-of-thought: intermediate steps become tokens in context, conditioning subsequent predictions — converts implicit reasoning into explicit computation
0 / 5 completed
1 / 5
The interviewer asks: "How do you explain the difference between few-shot and zero-shot prompting to a product manager?" Which answer is most accessible?
Option B is strongest: it uses a concrete business analogy (briefing a new contractor), distinguishes the two approaches with the same task (support ticket summary), provides the key insight that few-shot is about format alignment not skill teaching, gives specific examples of when each approach is appropriate (common tasks vs. company voice/JSON schema/taxonomy), and introduces the context window cost trade-off with a concrete token count. The 'format alignment, not teaching' insight is what separates a PM-accessible explanation from a technical definition. Prompt engineering vocabulary:Zero-shot prompting — asking a model to complete a task without providing examples of the desired output. Few-shot prompting — providing N examples of input-output pairs in the prompt to align the model's output with a specific format or style. Context window — the maximum number of tokens a model can process in a single prompt and response. Format alignment — using examples to steer the model toward a specific output structure, tone, or vocabulary. Options C and D are accurate but lack the contractor analogy and the format-alignment insight.
2 / 5
The interviewer asks: "How do you decide what temperature setting to use for a task?" Which answer demonstrates the most principled approach?
Option B is strongest: it explains the mechanism of temperature (probability distribution shape — a technically correct description accessible to non-ML readers), provides three named tiers with specific ranges and use cases, gives concrete task examples for each tier, and provides the exact stakeholder communication sentence — 'variance that would look like a data quality issue' — which translates a model parameter into a business risk. That translation is the key communication skill being tested. The 'data quality issue' framing is particularly effective because it connects the technical setting to a concern that PMs and analysts already have. Temperature vocabulary:Temperature — a parameter that controls the randomness of a language model's output by scaling the probability distribution over token choices. Deterministic output — output that is identical on every run given the same input; achieved at temperature 0. Stochasticity — the degree of randomness in a system's output. Token probability distribution — the probability assigned to each possible next token in the model's output. Options C and D are accurate but lack the mechanism explanation and the 'data quality issue' stakeholder framing.
3 / 5
The interviewer asks: "Can you explain what prompt injection is and give an example of the risk?" Which answer best communicates the threat?
Option B is strongest: it names the SQL injection analogy (which immediately communicates the severity class to any developer), provides a concrete direct injection example with the exact malicious text, introduces indirect prompt injection as a separate and more dangerous category with a specific realistic scenario (email assistant), and provides the communication sentence that frames the risk as a system design problem ('the model alone cannot be trusted to ignore injected instructions'). The indirect injection example — where the attacker never interacts with the system directly — is the insight that separates a practitioner from someone who only knows the basic definition. Prompt injection vocabulary:Prompt injection — an attack where user-controlled input contains instructions that override or bypass the developer's system prompt. Direct prompt injection — the attacker writes the injection directly in their input to the model. Indirect prompt injection — the attacker embeds instructions in content the model is asked to process (a document, email, or web page). Input sanitisation — the process of detecting and removing or neutralising potentially malicious content in user input before it reaches the model. Output validation — checking the model's output for signs that injected instructions may have been executed. Options C and D are accurate but lack the SQL injection analogy and the indirect injection example.
4 / 5
The interviewer asks: "How do you manage prompt versioning in a production system?" Which answer reflects the most mature practice?
Option B is strongest: it opens with the principle that justifies the whole system ('a prompt change can break production as silently as a code change'), provides the naming convention for stored prompts (not just 'store in version control'), explains what the evaluation dataset records (accuracy, regressions, token cost), introduces the specific cost impact calculation (200 tokens × 1M calls/day = significant cost), explains shadow mode evaluation, and provides the exact changelog entry format. The token cost calculation is the detail that shows production experience — a prompt engineer who has not managed large-scale deployments would not think to calculate the cost impact of a token increase. Prompt versioning vocabulary:Prompt artifact — a named, versioned file containing a prompt, stored and managed like code. Evaluation dataset — a frozen set of representative inputs with expected outputs used to test prompt changes before deployment. Shadow mode — running a new system version in parallel with the current version on live inputs, without serving its outputs to users. Prompt regression — a case where a new prompt version produces worse output than the previous version on a known input. Token cost delta — the change in token consumption per call between prompt versions. Options C and D are accurate but lack the token cost calculation and the principle justification.
5 / 5
The interviewer asks: "Why does chain-of-thought prompting improve model performance on complex tasks?" Which answer is most precise?
Option B is strongest: it explains the mechanism in terms of how language models actually work (sequential token prediction conditioning on previous tokens), explains why direct answering is harder (no intermediate tokens to build on), provides a concrete arithmetic example showing exactly which tokens serve as intermediate anchors, distinguishes the task types where chain-of-thought helps most vs. least, and provides the communication sentence that makes the choice of chain-of-thought feel like an engineering decision rather than a guess. The 'implicit mental shortcut vs. explicit computation' framing is the key insight. Chain-of-thought vocabulary:Chain-of-thought (CoT) prompting — a prompting technique that asks the model to generate intermediate reasoning steps before producing a final answer. Token conditioning — the process by which each generated token is influenced by all previous tokens in the sequence. Intermediate reasoning step — an explicitly generated step in the reasoning chain that serves as context for subsequent steps. Zero-shot CoT — triggering chain-of-thought without examples by appending 'Let's think step by step' to the prompt. Few-shot CoT — providing examples of step-by-step reasoning in the prompt to demonstrate the desired reasoning pattern. Options C and D are accurate but lack the token conditioning mechanism explanation and the concrete arithmetic example.