Learn the vocabulary of a structure that quickly answers connectivity and merge queries between groups.
0 / 5 completed
1 / 5
At standup, a dev mentions a data structure that quickly tells whether two elements belong to the same connected group and can merge two groups together, without ever needing to walk every element in either group to answer either question. What is this structure called?
A union-find, also called a disjoint-set structure, quickly answers whether two elements belong to the same group and can merge two groups together, both in close to constant time on average, without ever needing to walk through every element of either group to answer either kind of query. A trie is an unrelated structure for sharing prefix paths among strings, not for tracking group membership. This near-constant-time connectivity and merge capability is exactly why union-find is the standard structure behind problems like detecting a cycle while building a graph, or finding connected components in a network.
2 / 5
During a design review, the team adds path compression, so that every element visited while answering a query gets its pointer updated to point directly at the group's representative, flattening the structure over time. Which capability does path compression provide?
Path compression provides keeping later queries on those same elements close to constant time, since flattening each visited element's pointer to point almost directly at its group's representative means a later query on that same element no longer has to walk back up through a long chain of intermediate pointers. Leaving every pointer unchanged after each query would let a chain of merges grow into a long, slow-to-traverse structure over time, defeating the near-constant-time goal of union-find. This automatic flattening as a side effect of each query is exactly why path compression is considered essential to a practical union-find implementation, not just a nice-to-have optimization.
3 / 5
In a code review, a dev notices a union-find implementation always attaches the second group's representative underneath the first group's representative, regardless of which group is actually larger. What does this represent?
This is a union-find missing union-by-size or union-by-rank, since always attaching the second group underneath the first regardless of size risks repeatedly attaching a larger group underneath a smaller one, building a long, unbalanced chain that makes later queries slower than they need to be. A cache eviction policy is an unrelated concept about discarded cache entries. This is exactly why a well-implemented union-find always attaches the smaller group underneath the larger one's representative, which keeps the structure's depth small regardless of the order merges happen to occur in.
4 / 5
An incident report shows a network-connectivity service that used union-find to track connected components saw query times degrade sharply after a long sequence of merges, because the implementation always attached the second group underneath the first regardless of size, building a long chain. What practice would prevent this?
Adding union-by-size, so the smaller group's representative always gets attached underneath the larger group's, keeps the resulting structure's depth small no matter what order the merges happen to occur in, which is exactly the fix for the degradation described in this incident. Continuing to always attach the second group underneath the first regardless of actual size is exactly what let a long, unbalanced chain build up and slow down later queries. This union-by-size (or the closely related union-by-rank) technique, combined with path compression, is the standard pairing that keeps a union-find's queries close to constant time in practice.
5 / 5
During a PR review, a teammate asks why the team pairs path compression with union-by-size instead of relying on just one of the two optimizations alone. What is the reasoning?
Path compression flattens chains only as a side effect of a query actually walking through them, while union-by-size prevents those chains from growing long in the first place by always attaching the smaller group underneath the larger one during a merge. Used together, the structure both avoids building deep chains during merges and actively flattens whatever depth remains during queries, which is why the combination gives a provably better bound than either technique achieves on its own. The tradeoff is a small amount of extra bookkeeping, tracking each group's size, in exchange for query and merge operations that stay close to constant time even after a very long sequence of operations.