Reading Advanced Regex
Capturing groups, alternation, lookaheads, word boundaries, and greedy vs lazy quantifiers
Advanced regex vocabulary
- Capturing group (): captures a submatch; (?:...) = non-capturing; (?<name>...) = named
- | alternation: matches one full alternative (cat|dog); [...] = single character from a set
- (?=...) positive lookahead; (?!...) negative lookahead; both are zero-width (they don't consume characters)
- \b: word boundary; ensures the match is a whole word, not part of a larger word
- Greedy vs lazy: + grabs as much as possible; +? grabs as little as possible
What do the capturing groups do in this regex? /^(\d{4})-(\d{2})-(\d{2})$/
Capturing groups capture submatches accessible by index. Group vocabulary:
- Capturing group (): captures the matched text; accessible as match[1], match[2], etc.
- Non-capturing group (?:...): groups without capturing, e.g. (?:\d{2})
- Named group (?<name>...): access by name, e.g. (?<year>\d{4}) → match.groups.year
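The three group types above can be compared side by side. This is a minimal sketch, assuming a JavaScript engine with named-group support; the sample date and the variable names are illustrative only.

```javascript
// Indexed capturing groups: read the parts as m[1], m[2], m[3].
const indexedDate = /^(\d{4})-(\d{2})-(\d{2})$/;
const m = "2024-03-15".match(indexedDate);
const year = m[1];  // "2024"
const month = m[2]; // "03"
const day = m[3];   // "15"

// Named groups: same pattern, read by name through match.groups.
const namedDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;
const parts = "2024-03-15".match(namedDate).groups;
// parts.year is "2024", parts.month is "03", parts.day is "15"

// Non-capturing group: (?:...) groups "-NN" for the {2} quantifier
// but does not add an index, so only the year is captured.
const yearOnly = /^(\d{4})(?:-\d{2}){2}$/;
const n = "2024-03-15".match(yearOnly);
// n.length is 2: n[0] is the full match, n[1] is "2024"

console.log(year, month, day, parts.year, n.length);
```

A string that does not fit the pattern makes match return null, so real code would check the result before indexing into it.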
"(?<year>\d{4}) is more readable than match[1] when you're extracting date parts."
What is the difference between /cat|dog/ and /[cd]at/?
| alternates between whole subpatterns; [] matches a single character from a set. So /cat|dog/ matches "cat" or "dog", while /[cd]at/ matches "cat" or "dat". Alternation vs character class:
- | (alternation): matches one whole alternative; cat|dog matches "cat" or "dog"; foo|bar|baz for three options
- [] (character class): matches exactly one character from the set; [cd] matches "c" or "d" only; [aeiou] = any vowel
- [^...] (negated class): any character NOT in the set
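A short sketch of the difference, assuming JavaScript. The ^ and $ anchors and the (?:...) wrapper are added here only so each test checks the whole word; the word list is made up for the demonstration.

```javascript
// Alternation: one whole alternative, "cat" or "dog".
const alternation = /^(?:cat|dog)$/;
// Character class: one character from the set, then "at".
const charClass = /^[cd]at$/;
// Negated class: any single character except c or d, then "at".
const negated = /^[^cd]at$/;

const words = ["cat", "dog", "dat", "bat"];

const byAlternation = words.filter((w) => alternation.test(w)); // ["cat", "dog"]
const byClass = words.filter((w) => charClass.test(w));         // ["cat", "dat"]
const byNegated = words.filter((w) => negated.test(w));         // ["bat"]

console.log(byAlternation, byClass, byNegated);
```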
gr(a|e)y matches "gray" or "grey"; gr[ae]y does the same (equivalent when each alternative is one character). But cat|dog ≠ [catdog] (the class matches one letter from c/a/t/d/o/g).
A regex uses a lookahead: /\d+(?= dollars)/. What does this match?
Digits followed by " dollars" — match includes only the digits (lookahead is zero-width). Lookahead/lookbehind vocabulary:
- (?=...) — positive lookahead: asserts what follows, does not consume characters
- (?!...) — negative lookahead: asserts what does NOT follow
- (?<=...) — positive lookbehind: asserts what precedes (not available in all engines)
- (?<!...) — negative lookbehind: asserts what does NOT precede
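The assertions above can be tried on a sample sentence. This is a sketch assuming a JavaScript engine that supports lookbehind (current Node.js and browsers do, older engines may not); the sample strings are invented. The \b anchors in the negative-lookahead pattern are an addition: without them, \d+ can backtrack from "25" to "2" and still satisfy the assertion.

```javascript
const text = "It costs 25 dollars or 30 euros";

// Positive lookahead: digits followed by " dollars"; only the digits are matched.
const dollars = text.match(/\d+(?= dollars)/);
// dollars[0] is "25"; " dollars" is checked but not consumed

// Negative lookahead: a whole number NOT followed by " dollars".
const notDollars = text.match(/\b\d+\b(?! dollars)/);
// notDollars[0] is "30"

// Positive lookbehind: digits preceded by a literal "$".
const afterSign = "Total: $40".match(/(?<=\$)\d+/);
// afterSign[0] is "40"

console.log(dollars[0], notDollars[0], afterSign[0]);
```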
What does \b (word boundary) do in this regex? /\btest\b/
\b matches a word boundary — ensures "test" appears as a complete word. Word boundary vocabulary:
- \b: zero-width assertion at the boundary between a word character (\w) and a non-word character (\W or start/end of string)
  /\btest\b/ matches: "test", "test.", "the test here" → "test" in each
  /\btest\b/ does NOT match: "testing", "contest", "retest" (because "test" touches another word character on at least one side)
- \B: non-word boundary (opposite of \b)
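The word-boundary cases can be checked directly. This is a minimal JavaScript sketch using the same sample strings; the extra word "attested" is an illustration chosen for the \B case.

```javascript
const wholeWord = /\btest\b/;
const samples = ["test", "test.", "the test here", "testing", "contest", "retest"];

// Keep only the strings where "test" stands alone as a word.
const matched = samples.filter((s) => wholeWord.test(s));
// ["test", "test.", "the test here"]

// \B is the opposite: "test" must have word characters on both sides.
const insideWord = /\Btest\B/;
const inAttested = insideWord.test("attested"); // true
const inTest = insideWord.test("test");         // false

console.log(matched, inAttested, inTest);
```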
A regex is described as "greedy". What does greedy vs lazy mean?
Greedy = match as much as possible; lazy (add ?) = match as little as possible. Greedy vs lazy vocabulary:
- Greedy: <.+> on <div>hello</div> matches the entire string <div>hello</div>
- Lazy: <.+?> matches only <div>; it stops at the first >
".+ here is greedy and will match across multiple tags. Change to .+? for lazy matching, or better, use a specific character class like [^>]+." The character-class solution is often more explicit than relying on lazy quantifiers.
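The three variants can be compared on the same input. This is a sketch in JavaScript for illustration only; matching tags with a regex is a simplification and is not a substitute for an HTML parser.

```javascript
const html = "<div>hello</div>";

// Greedy: .+ runs to the end and backs up only as far as the last ">".
const greedy = html.match(/<.+>/)[0];      // "<div>hello</div>"

// Lazy: .+? stops at the first ">" that lets the match succeed.
const lazy = html.match(/<.+?>/)[0];       // "<div>"

// Character class: [^>]+ cannot cross a ">" at all.
const explicit = html.match(/<[^>]+>/)[0]; // "<div>"

// With the g flag, the character-class version finds each tag separately.
const allTags = html.match(/<[^>]+>/g);    // ["<div>", "</div>"]

console.log(greedy, lazy, explicit, allTags);
```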