Compiler & Language Design Engineer Interview Questions
5 exercises — practice structuring strong English answers for compiler and language design engineering interviews: pipeline, SSA, type systems, interpretation, and LLVM.
How to structure Compiler Engineering interview answers
Type system questions: static vs. dynamic → type inference → algebraic data types → null safety mechanism
Compilation vs. interpretation: JIT compilation → bytecode VM → ahead-of-time vs. just-in-time trade-offs
LLVM questions: LLVM IR → passes → target-independence → linking to LLVM from custom frontend
0 / 5 completed
1 / 5
The interviewer asks: "Walk me through the stages of a compiler from source to binary." Which answer is most complete?
Option B is strongest: it names all six stages with explicit input/output contracts, names the specific error class produced by each stage, names the parsing algorithm families (LL vs. LR) with real tools (yacc/bison), explains why IR is introduced (target-independent optimisation), names five specific optimisation passes, and explains register allocation as the hardest backend problem (NP-complete, graph colouring heuristic). Compiler pipeline vocabulary:Lexical analysis — tokenises raw source text using regular expressions. Abstract Syntax Tree (AST) — a tree representation of the syntactic structure of source code. Context-Free Grammar (CFG) — the formal grammar used by parsers to recognise valid programs. Intermediate Representation (IR) — a target-independent code representation between AST and machine code. SSA form — Static Single Assignment form (see next exercise). Register allocation — assigning virtual registers to physical hardware registers; solved with graph colouring. Options C and D are accurate but lack the input/output contracts and the register allocation NP-completeness detail.
2 / 5
The interviewer asks: "What is SSA form and why is it useful for optimisation?" Which answer is most accurate?
Option B is strongest: it explains the "static" qualifier of SSA (program text, not runtime), gives a concrete code example for phi functions, explains why phi functions appear specifically at CFG join points, provides four concrete optimisation enablements with the mechanism for each, introduces the chordal graph property of SSA interference graphs (an advanced insight), and names the phi elimination pass as the SSA exit mechanism. SSA vocabulary:Static Single Assignment (SSA) — an IR property: each variable is defined exactly once in the program text. Phi function (φ) — a special SSA instruction at control flow join points that selects between values from different predecessor blocks. Control Flow Graph (CFG) — a directed graph where nodes are basic blocks and edges represent control flow. Constant propagation — replacing uses of a variable with its known constant value. Chordal graph — a graph where every cycle of length ≥ 4 has a chord; SSA interference graphs are chordal, enabling optimal register allocation in polynomial time. Options C and D are accurate but lack the phi function code example and the chordal graph property.
3 / 5
The interviewer asks: "How would you design a type system that prevents null pointer dereferences?" Which answer is most thorough?
Option B is strongest: it names the fundamental design principle (split reference type into T and T?), compares two design approaches with named language examples and the trade-off for each, explains flow-sensitive typing as a third mechanism with a concrete TypeScript example, explains the migration cost for nullable wrappers, and introduces the monad composition requirement for sum types. Type system vocabulary:Non-nullable type — a type whose values are guaranteed to be non-null by the type checker. Nullable type (T?) — a type that may hold null; requires null check before use. Sum type (Option/Maybe) — an algebraic data type that is either Some(value) or None; requires exhaustive pattern matching. Smart cast / type narrowing — flow-sensitive typing where a variable's type is refined within a branch. Flow-sensitive typing — a type system feature where variable types can change based on runtime checks. Monadic composition — functional programming patterns for chaining operations on Option/Maybe values. Options C and D are accurate but lack the flow-sensitive typing explanation and the migration cost analysis.
4 / 5
The interviewer asks: "What's the difference between a compiled language and an interpreted one?" Which answer is most nuanced?
Option B is strongest: it immediately frames the question as a spectrum (not binary), defines four distinct points on the spectrum with language examples for each, explains HOW JIT achieves comparable or better performance than AOT (profile-guided optimisation with runtime data), explains the JIT overhead mechanism (startup latency, dual memory), and names modern examples that blur the classification (PyPy, V8, Rust macros). Compilation vocabulary:AOT (Ahead-of-Time) compilation — fully compiling to machine code before execution. Bytecode — a portable instruction set for a virtual machine, not native machine code. JVM (Java Virtual Machine) — the bytecode interpreter and JIT compiler for Java. JIT (Just-In-Time) compilation — compiling bytecode to native code at runtime when code becomes "hot." Warm-up period — the time before JIT compilation occurs; the program runs slower during this phase. Profile-guided optimisation — using runtime execution data to make better optimisation decisions. Options C and D are accurate but lack the profile-guided optimisation advantage explanation and the modern language examples that blur the classification.
5 / 5
The interviewer asks: "How does LLVM IR enable targeting multiple architectures?" Which answer is most technical?
Option B is strongest: it names all three layers with the specific responsibility of each, explains WHY frontends benefit from middle-end optimisations for free (key insight: they target the same IR), names the TableGen system for instruction selection, explains the infinite virtual register property that enables target-independent IR, names MLIR as the path for domain-specific IRs, and closes with the key LLVM value proposition for compiler authors. LLVM vocabulary:LLVM IR — a strongly-typed, SSA-form intermediate representation used as LLVM's common language. Middle-end — the target-independent optimisation layer between frontend and backend. Pass manager — LLVM's framework for applying optimisation passes to IR. Instruction selection — mapping IR operations to target machine instructions. TableGen — a domain-specific language for declaring LLVM backend rules. MLIR (Multi-Level IR) — a framework for building domain-specific IRs that lower to LLVM IR. Options C and D are accurate but lack the TableGen system and the "free optimisation" insight for new frontends.