Build fluency in the vocabulary of finding the longest shared, in-order sequence of characters between two strings.
0 / 5 completed
1 / 5
At standup, a dev mentions finding the longest sequence of characters that appears, in order but not necessarily contiguously, in both of two given strings, by building up a table of best answers for every pair of prefixes. What is this problem called?
Longest common subsequence (LCS) is exactly this: the longest common subsequence problem finds the longest sequence of characters that appears, in the same relative order but not necessarily contiguously, in both of two given strings, and its standard dynamic-programming solution builds up a table of the best answer for every pair of prefixes of the two strings. A hash collision is an unrelated hash-table concept about two keys sharing a bucket. This build-up-over-every-prefix-pair approach is exactly why the longest common subsequence can be found in time proportional to the product of the two strings' lengths, instead of checking every possible subsequence.
2 / 5
During a design review, the team computes the longest common subsequence between two versions of a document to generate a diff, specifically because building up a table over every pair of prefixes lets the algorithm reuse already-solved smaller sub-problems instead of checking every possible subsequence of either string. Which capability does this provide?
Longest common subsequence (LCS) here provides Efficient diff computation by reusing overlapping prefix-pair sub-problems, since the best answer for a given pair of prefixes only depends on best answers already computed for slightly shorter prefixes, so dynamic programming solves each distinct sub-problem once and reuses it, instead of checking every possible subsequence of either string. Checking every possible subsequence of either string costs exponential time as the strings get longer. This reuse-of-overlapping-prefix-sub-problems behavior is exactly why the longest common subsequence underlies practical, fast diff tools for text and version control.
3 / 5
In a code review, a dev notices a document-diffing feature checks every possible subsequence of both documents individually to find their longest shared subsequence, instead of building up a table of best answers over every pair of prefixes. What does this represent?
This is a missed longest-common-subsequence opportunity, since building up a table over every pair of prefixes, the way the longest common subsequence's dynamic-programming solution does, would reuse overlapping sub-problems instead of checking every possible subsequence. A cache eviction policy is an unrelated concept about discarded cache entries. This brute-force-enumeration pattern is exactly the kind of avoidable exponential cost a reviewer flags once the documents are long enough for it to matter.
4 / 5
An incident report shows a document-diffing feature's running time exploded on moderately long documents, because it checked every possible subsequence individually instead of building up a table of best answers over prefix pairs. What practice would prevent this?
Rewriting the diff logic using the longest common subsequence's dynamic-programming approach eliminates the exponential enumeration cost. Continuing to check every possible subsequence of two documents individually regardless of how long the documents being diffed are is exactly what caused the issue described in this incident. This dynamic-programming approach is the standard fix once brute-force subsequence enumeration is shown to be the actual bottleneck.
5 / 5
During a PR review, a teammate asks why the team reaches for the longest common subsequence instead of the edit-distance algorithm, given that edit distance also compares two strings with a similar dynamic-programming table. What is the reasoning?
The longest common subsequence finds the longest shared, in-order sequence of characters between two strings, which is exactly what a line-based diff tool needs to highlight unchanged content, while edit distance instead counts the minimum number of insertions, deletions, and substitutions needed to transform one string into the other, a related but different question better suited to a typo-tolerance score. This is exactly why the two dynamic-programming techniques, despite similar tables, serve different practical purposes.