Regex in Code Review
Anchoring issues, flags, catastrophic backtracking, explaining patterns, and when not to use regex
Regex code review phrases
- "Not anchored" — add ^ and $ to match the full string, not just a substring
- /i flag — case-insensitive; /g flag — global (all matches, needed for replace-all)
- Catastrophic backtracking / ReDoS — nested quantifiers on overlapping classes can cause exponential time
- "Use the URL constructor / JSON.parse / DOM parser" — regex can't reliably parse nested structures
- Break down complex regex left-to-right; explain each component in the review comment
A code review has the comment: "This regex is not anchored — it will match the pattern anywhere in the string, not just as a whole value." Which regex has this problem?
/\d{5}/ has no anchors — it matches 5 digits anywhere in the string, so "12345abc" or "hello12345" would pass validation. Anchoring vocabulary:
- ^ and $ — anchor to start and end; ensures the entire string matches the pattern
- Without anchors: "abc12345xyz" passes a ZIP regex that should only accept "12345"
- Security risk: unanchored validators can be bypassed by placing valid content within a longer malicious string
- "This regex needs ^ and $ anchors to validate the full string."
- "Without anchoring, this regex matches the pattern anywhere — use /^pattern$/ for strict validation."
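A minimal sketch of the difference, assuming a five-digit ZIP check. The names `unanchored`, `anchored`, and `isZip` are made up for illustration and do not come from any library:

```javascript
// Hypothetical ZIP validators, for illustration only.
const unanchored = /\d{5}/;  // matches five digits anywhere in the input
const anchored = /^\d{5}$/;  // the whole input must be exactly five digits

function isZip(value) {
  return anchored.test(value);
}

console.log(unanchored.test("abc12345xyz")); // true: the substring match slips through
console.log(isZip("abc12345xyz"));           // false
console.log(isZip("12345"));                 // true
```

In a review, the unanchored version is the one to flag whenever the regex is used to validate a whole value rather than to search within text.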
During a code review, you see: const re = /user/ig;. What do the i and g flags mean?
i = case-insensitive; g = global (all matches). Regex flags vocabulary:
- /i — case-insensitive: "User", "USER", "user" all match /user/i
- /g — global: find all matches, not just the first; needed for replace-all and matchAll()
- /m — multiline: ^ and $ match start/end of each line, not just the whole string
- /s — dotAll: . also matches newlines
- /u — Unicode mode: handles Unicode characters and code points correctly
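The flags above can be seen side by side in a short sketch. The sample string and variable names are invented for illustration:

```javascript
const text = "User one\nuser two\nUSER three";

// i + g: every occurrence, regardless of case
const all = text.match(/user/gi);                 // ["User", "user", "USER"]

// Without g, replace() only touches the first match
const first = "a-b-c".replace(/-/, "+");          // "a+b-c"
const every = "a-b-c".replace(/-/g, "+");         // "a+b+c"

// m: ^ matches at the start of each line, not just the start of the string
const lineStarts = text.match(/^user/gim).length; // 3

// s: . also matches newline characters
const withoutS = /one.user/.test(text);           // false
const withS = /one.user/s.test(text);             // true

// u: . consumes a whole code point instead of a single UTF-16 code unit
const emoji = "\u{1F600}";
const withoutU = /^.$/.test(emoji);               // false
const withU = /^.$/u.test(emoji);                 // true
```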
A code review comment says: "This regex has catastrophic backtracking vulnerability." What does this mean?
Exponential backtracking on non-matching input — potential ReDoS attack. Catastrophic backtracking vocabulary:
- ReDoS — Regular Expression Denial of Service; crafted input causes near-infinite backtracking
- Vulnerable pattern: /(a+)+/ or /(\w+\s?)+$/ — nested quantifiers on the same character class
- Why it happens: the engine tries all possible ways to split the input across groups before giving up
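One common fix is to remove the nesting so there is only one way to consume a run of characters. The sketch below assumes the intent of the nested pattern was simply "one or more a's". The names `nested` and `flat` are illustrative, and the near-miss input is kept short on purpose so the example finishes quickly:

```javascript
const nested = /^(a+)+$/; // nested quantifiers: many ways to split a run of "a"
const flat = /^a+$/;      // accepts the same strings with a single quantifier

// Both patterns agree on these small inputs.
const samples = ["a", "aaaa", "aaab", "", "baaa"];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// Near-miss input: a run of "a" followed by a character that forces failure.
// In a backtracking engine, each extra "a" can roughly double the work
// the nested pattern does before it gives up.
const nearMiss = "a".repeat(12) + "!";
console.log(nested.test(nearMiss), flat.test(nearMiss)); // false false
```

When a rewrite like this is not possible, limiting input length before matching is a frequently suggested mitigation.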
How would you explain this regex in a code review comment? /^(?:https?:\/\/)?(www\.)?[\w-]+\.[a-z]{2,}/i
Optional scheme + optional www + domain + TLD — a URL pattern. Breaking it down for a review comment:
- ^ — anchored at start
- (?:https?:\/\/)? — optional non-capturing group: http:// or https:// (? makes s optional)
- (www\.)? — optional capturing group: literal "www."
- [\w-]+ — domain name: word chars and hyphens
- \.[a-z]{2,} — literal dot + TLD of 2+ letters
- /i — case-insensitive
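To check a breakdown like this before posting it, the pattern can be run against a few inputs. The sample strings and the name `urlish` below are invented for illustration:

```javascript
const urlish = /^(?:https?:\/\/)?(www\.)?[\w-]+\.[a-z]{2,}/i;

// Scheme and www present: group 1 captures "www."
const full = urlish.exec("https://www.example.com/path");
console.log(full[0]); // "https://www.example.com"
console.log(full[1]); // "www."

// Both optional parts absent: group 1 is undefined
const bare = urlish.exec("example.org");
console.log(bare[0]); // "example.org"
console.log(bare[1]); // undefined

// No dot followed by letters right after the first word: no match
console.log(urlish.exec("not a url")); // null

// There is no $ anchor, so trailing text after the TLD is not rejected
console.log(urlish.test("example.com and trailing junk")); // true
```

The last case is worth calling out in the review comment: the pattern is anchored at the start only, so it checks a prefix rather than the whole string.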
When would you recommend NOT using regex in a code review?
Don't use regex for nested structures or when a library already exists. When to recommend alternatives in code reviews:
- HTML parsing: use a DOM parser (DOMParser, BeautifulSoup, htmlparser2) — regex cannot correctly parse nested HTML
- JSON parsing: use JSON.parse() — never regex for JSON extraction
- Date/time parsing: use Temporal, date-fns, or moment — handles edge cases regex misses
- Email validation: use a library or simply check for @ and . — full RFC 5321 regex is 6KB+
- URL parsing: use the URL constructor — new URL(input) throws on invalid input
- "Replace this regex with try { new URL(input) } catch { ... }. It handles all edge cases and will never have backtracking issues."
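A minimal sketch of the parser-based alternatives for URLs and JSON. The helper names `parseUrl` and `parseJson` are hypothetical wrappers written for this example. Only `new URL()` and `JSON.parse()` are built-in:

```javascript
// Hypothetical helpers; the names are illustrative, not from any library.
function parseUrl(input) {
  try {
    return new URL(input); // throws TypeError on input it cannot parse
  } catch {
    return null;
  }
}

function parseJson(input) {
  try {
    return JSON.parse(input); // throws SyntaxError on malformed JSON
  } catch {
    return null;
  }
}

console.log(parseUrl("https://example.com/a?b=1").hostname); // "example.com"
console.log(parseUrl("not a url"));                          // null
console.log(parseJson('{"user": {"id": 7}}').user.id);       // 7
console.log(parseJson("{user: 7}"));                         // null
```

Returning null on failure is a simplification for this sketch. Because `JSON.parse("null")` also yields null, real code may prefer to rethrow or return a result object.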