Writing LLVM Pass Documentation: Vocabulary and Patterns for Compiler Engineers
A guide to writing clear LLVM pass documentation — pass description vocabulary, intent statements, transforms vs analyses, requirements, and documentation patterns for compiler engineers.
Why LLVM Pass Documentation Is Hard to Write
LLVM pass documentation has a reputation for being either terse to the point of incomprehensibility or verbose in the wrong places. Engineers writing a new optimisation pass often document the implementation thoroughly and the intent hardly at all. Readers, who may be trying to decide whether the pass is safe to run on their IR or where to schedule it in the pipeline, are left to reverse-engineer the intent from the implementation.
Good LLVM pass documentation answers four questions before the reader reaches the implementation:
- What does this pass do? (A one-sentence description of the transformation or analysis)
- When should it run? (Prerequisites and pass ordering requirements)
- What does it preserve? (Which analyses remain valid after the pass)
- Are there any important limitations or assumptions? (Edge cases and known issues)
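
The four questions can double as a checklist for a draft. The following is a minimal sketch, not part of any LLVM tooling: the `PassDoc` container, its field names, and the example sentences are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, fields


@dataclass
class PassDoc:
    """Hypothetical container for the four answers a pass description should give."""
    what_it_does: str = ""       # one-sentence description of the transform or analysis
    when_to_run: str = ""        # prerequisites and pass ordering requirements
    what_it_preserves: str = ""  # analyses that remain valid after the pass
    limitations: str = ""        # edge cases and known issues


def unanswered(doc: PassDoc) -> list:
    """Return the names of the questions that are still blank, in checklist order."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# An illustrative draft that answers only two of the four questions.
draft = PassDoc(
    what_it_does="This pass eliminates stores that are never read.",
    what_it_preserves="This pass preserves the dominator tree.",
)
print(unanswered(draft))
```

A draft is ready for review once `unanswered` returns an empty list; what counts as a good answer is covered in the sections that follow.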
The Vocabulary of Pass Intent
Describing Transformations
A transform pass modifies the IR. Documentation for a transform pass uses this vocabulary:
| Verb | Usage |
|---|---|
| transforms | The pass changes the structure of the IR in a significant way |
| replaces | One construct is substituted for another |
| eliminates | A construct is removed (dead code elimination, redundancy elimination) |
| hoists | A computation is moved to an earlier point (earlier in the function, or out of a loop) |
| sinks | A computation is moved to a later point |
| folds | Constant expressions are evaluated at compile time |
| inlines | A call site is replaced with the callee’s body |
| canonicalises | The IR is brought into a standard normalised form |
| lowers | A high-level construct is replaced with a lower-level equivalent |
| decomposes | A complex instruction or pattern is split into simpler parts |
| merges | Multiple constructs are combined into a single one |
| vectorises | Scalar operations are transformed into vector operations |
Example intent statement using this vocabulary: “This pass hoists loop-invariant loads out of inner loops and into the loop preheader, eliminating redundant memory accesses in cases where the loaded address and the loaded value are both provably unchanged across loop iterations.”
Describing Analyses
An analysis pass computes information without modifying the IR. Documentation for an analysis pass uses this vocabulary:
| Verb | Usage |
|---|---|
| computes | The pass calculates a property of the IR |
| determines | The pass resolves a question about the IR |
| collects | The pass gathers a set of facts |
| identifies | The pass finds instances of a pattern |
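| annotates | The pass attaches derived facts to IR elements; if it writes metadata into the IR itself, it is strictly a transform and should be documented as one |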
| approximates | The pass produces a conservative estimate of a property |
Example intent statement: “This analysis computes the alias sets for all pointer-valued instructions in a function, producing a conservative approximation of the memory access relationships that subsequent transform passes can query.”
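One way to check that an intent statement uses the right vocabulary is to look for the verbs from the two tables above. The sketch below is a rough heuristic under stated assumptions: the function name `classify_intent` is hypothetical, only the exact verb forms from the tables are matched, and a statement that mixes both vocabularies is merely reported as such.

```python
import re

# Verbs copied from the two vocabulary tables above.
TRANSFORM_VERBS = {
    "transforms", "replaces", "eliminates", "hoists", "sinks", "folds",
    "inlines", "canonicalises", "lowers", "decomposes", "merges", "vectorises",
}
ANALYSIS_VERBS = {
    "computes", "determines", "collects", "identifies", "annotates", "approximates",
}


def classify_intent(statement: str) -> str:
    """Guess the pass kind from the vocabulary verbs used in an intent statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    uses_transform = bool(words & TRANSFORM_VERBS)
    uses_analysis = bool(words & ANALYSIS_VERBS)
    if uses_transform and uses_analysis:
        return "mixed"
    if uses_transform:
        return "transform"
    if uses_analysis:
        return "analysis"
    return "unknown"


print(classify_intent("This pass hoists loop-invariant loads into the loop preheader."))
print(classify_intent("This analysis computes the alias sets for a function."))
```

A result of "mixed" or "unknown" is a prompt to reread the statement, not a verdict: a transform may legitimately mention what it computes along the way.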
Writing Pass Descriptions in LLVM Style
LLVM pass descriptions in source code headers and documentation follow a consistent style. Studying existing LLVM passes is the best way to calibrate your own writing.
The One-Sentence Summary
The one-sentence summary appears in the pass registry, the --help output, and at the top of the class documentation. It must be:
- A complete sentence
- Accurate and specific
- Free of implementation detail
| Weak summary | Strong summary |
|---|---|
| “Does mem2reg stuff.” | “Promotes memory references to register references, eliminating alloca/load/store patterns that are amenable to SSA construction.” |
| “Optimises loops.” | “Performs loop-invariant code motion, moving computations whose operands do not change across loop iterations into the loop preheader.” |
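
The difference between the weak and strong summaries above can be partly caught by a mechanical check. This is a heuristic sketch only: `summary_problems`, the vague-word list, and the six-word threshold are assumptions chosen for illustration, and a summary that passes may still be inaccurate.

```python
VAGUE_WORDS = {"stuff", "things", "various", "etc"}  # assumed list; extend as needed
MIN_WORDS = 6  # assumed threshold for "specific enough"


def summary_problems(summary: str) -> list:
    """Return heuristic complaints about a one-sentence pass summary."""
    problems = []
    text = summary.strip()
    words = [w.strip(".,;:").lower() for w in text.split()]
    # Crude completeness test: starts with a capital letter and ends with a full stop.
    if not text[:1].isupper() or not text.endswith("."):
        problems.append("not a complete sentence")
    if len(words) < MIN_WORDS:
        problems.append("too short to be specific")
    if VAGUE_WORDS & set(words):
        problems.append("contains vague wording")
    return problems


print(summary_problems("Does mem2reg stuff."))
print(summary_problems("Optimises loops."))
```

Both weak summaries from the table are flagged, while the two strong summaries produce no complaints. The check cannot judge accuracy or detect implementation detail; those still need a human reviewer.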
Prerequisites and Requirements
Document prerequisites using “requires” and “assumes”:
- “This pass requires that promotable stack allocations have already been rewritten as SSA values. Run the mem2reg pass or equivalent before scheduling this pass.”
- “This pass assumes that function arguments do not alias any global variables. If this assumption may be violated, disable the pass for affected functions using the ‘noalias-args’ attribute.”
Preservation Statements
After a transform pass runs, some analyses remain valid and some are invalidated. Document this using “preserves” and “invalidates”:
- “This pass preserves the dominator tree, as it does not modify the control flow graph.”
- “This pass invalidates alias analysis results, as it may introduce new pointer-producing instructions.”
- “This pass does not modify the IR if no eligible patterns are found; in that case, all analyses are preserved.”
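
Each of the first two statements above pairs an analysis with the reason it is preserved or invalidated. A small helper can enforce that pairing when generating a header comment. This is a sketch under assumptions: `preservation_lines` is a hypothetical name, and the `///` prefix simply imitates a Doxygen-style comment rather than reproducing any LLVM convention exactly.

```python
def preservation_lines(preserved: dict, invalidated: dict) -> list:
    """Build one 'preserves' or 'invalidates' sentence per analysis, with its reason."""
    lines = [f"This pass preserves {name}, as {why}." for name, why in preserved.items()]
    lines += [f"This pass invalidates {name}, as {why}." for name, why in invalidated.items()]
    return lines


# Emit the sentences as a Doxygen-style comment for a hypothetical pass.
for line in preservation_lines(
    preserved={"the dominator tree": "it does not modify the control flow graph"},
    invalidated={"alias analysis results": "it may introduce new pointer-producing instructions"},
):
    print("/// " + line)
```

Because the reason is a required part of each entry, a bare claim such as “preserves the dominator tree” with no justification cannot be written by accident.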
Documenting Limitations and Edge Cases
Limitations are among the most important sections of pass documentation, and among the most often neglected. Common patterns:
| Situation | Documentation phrase |
|---|---|
| The pass is conservative | “This pass uses a conservative alias analysis and may fail to eliminate some provably safe patterns.” |
| The pass does not handle a case | “This pass does not handle indirect calls. Function pointer calls are left unchanged.” |
| A known interaction with another pass | “Running this pass after [X] may produce suboptimal results; schedule it before [X] for best effect.” |
| A performance cliff | “The analysis has quadratic worst-case complexity in the number of pointer-producing instructions. For functions with more than 10,000 instructions, consider using the interprocedural alias analysis instead.” |
Documentation Patterns From the LLVM Codebase
Study these existing pass descriptions from the LLVM source as models:
- InstCombine: Combines instructions into more efficient forms; the description carefully distinguishes what it canonicalises versus what it optimises.
- LICM (Loop Invariant Code Motion): Documents the specific conditions under which a computation qualifies for hoisting.
- GVN (Global Value Numbering): Documents the relationship between value numbering and load elimination.
Each of these passes has a clear intent statement, explicit prerequisites, and documented limitations. Aim for the same.
Example LLVM Pass Documentation Sentences
- “This pass transforms switch instructions with contiguous integer case ranges into lookup tables, replacing an O(n) branch sequence with an O(1) memory access.”
- “The pass requires LoopAnalysis and ScalarEvolution to be available; it will not run on loops for which ScalarEvolution cannot compute a trip count.”
- “Induction variable simplification canonicalises all loop induction variables to start at zero and increment by one, simplifying subsequent vectorisation and loop unrolling passes.”
- “This analysis conservatively marks a pointer as ‘may alias’ when its alias relationship cannot be determined statically; pass authors who need a more precise result should request the AliasAnalysis with per-call-site context.”
- “This pass invalidates the MemorySSA analysis on any function it modifies; downstream passes that depend on MemorySSA must request a fresh analysis after this pass runs.”
Style and Register Notes
LLVM documentation uses a neutral, impersonal technical register. Avoid first person (“I wrote this pass to…”) and conversational asides (“basically, what this does is…”).
Prefer:
- Present tense for describing what the pass does: “This pass eliminates…”
- Conditional for edge cases: “If the trip count is unknown, the pass skips the loop.”
- Imperative for instructions to users: “Run this pass after mem2reg.”
The documentation will be read by engineers from many backgrounds and English proficiencies. Plain, precise, unambiguous English serves all of them better than idiomatic or colloquial prose.
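The register rules above lend themselves to a simple automated check. The sketch below is a heuristic under assumptions: `style_problems` and both pattern lists are hypothetical, the conversational phrases are taken from the examples in this section, and the first-person test only looks for a pronoun followed by a lowercase word.

```python
import re

# Assumed patterns; the conversational phrases come from the examples above.
FIRST_PERSON = re.compile(r"\b(?:I|[Ww]e|[Mm]y|[Oo]ur)\s+[a-z]")
CONVERSATIONAL = ("basically", "what this does is")


def style_problems(text: str) -> list:
    """Flag first-person wording and conversational asides in documentation prose."""
    problems = []
    if FIRST_PERSON.search(text):
        problems.append("first person")
    lowered = text.lower()
    if any(phrase in lowered for phrase in CONVERSATIONAL):
        problems.append("conversational aside")
    return problems


print(style_problems("I wrote this pass to speed up loops."))
print(style_problems("This pass eliminates unreachable blocks."))
```

The first sentence is flagged for first person and the second passes. A check like this catches only the most obvious lapses in register; clarity for readers of varying English proficiency still depends on review by a person.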