📖 Chapters

Data Structures and Algorithms in C (DSA)

Category: IT Fundamentals & Programming

This course is designed to help learners master the core concepts of Data Structures and Algorithms (DSA) using the C programming language. Starting from the basics and advancing to complex topics like graphs, dynamic programming, and memory optimization, this course …

Disjoint Set Union (Union-Find, Path Compression)

📘 Chapter: Disjoint Set Union (Union-Find, Path Compression)

Imagine you're managing a social network, and you need to quickly figure out if two people are in the same friend group, or if two friend groups merge. Or, perhaps you're building a game and need to detect if two objects are connected. This is where the Disjoint Set Union (DSU) data structure, often called Union-Find, comes to the rescue! It's a surprisingly simple yet incredibly powerful data structure for keeping track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets.

🤔 What is DSU? The Core Idea

At its heart, DSU manages a collection of elements grouped into various disjoint sets. It provides two highly efficient operations to manipulate these sets:

Find (or `findSet`): Determines which set an element belongs to. It returns a "representative" (or "root") of that set. If two elements have the same representative, they are in the same set.
Union (or `unionSets`): Merges two sets into a single set. If two elements are in different sets, their sets are combined.

Think of it like managing countries: each person belongs to one country (`Find`). If two countries decide to merge, they become one larger country (`Union`).

🏗️ Representation: The Parent Array

A DSU structure is typically implemented using an array, let's call it `parent[]`. Each element `i` in the array stores the parent of element `i`. If `parent[i] == i`, then `i` is the root (representative) of its own set.

// C++ conceptual implementation
int parent[MAX_ELEMENTS];
int set_size[MAX_ELEMENTS]; // Optional: to store size/rank for optimization

// 💡 makeSet(i): Initializes element 'i' into its own set
void makeSet(int i) {
    parent[i] = i;      // 'i' is its own parent (root of its set)
    set_size[i] = 1;    // Size of this new set is 1
}

Initially, every element is in its own set, so `parent[i]` is initialized to `i` for all elements.

⚡ Optimizations: The Keys to Speed!

Without optimizations, `Find` and `Union` operations could take $O(N)$ time in the worst case (imagine a skewed tree, like a linked list). To achieve near-constant time performance, two powerful optimizations are used:

1. Path Compression (for `Find`)

When `find(i)` is called, it traverses up the parent pointers until it reaches the root. Path compression optimizes this by making every node visited during the `Find` operation directly point to the root. This "flattens" the tree, dramatically reducing the height of the elements in the set for future `Find` operations.

// Find operation with Path Compression
int findSet(int i) {
    if (parent[i] == i) // If 'i' is the root of its set
        return i;
    // Otherwise, recursively find the root and make 'i' point directly to it
    return parent[i] = findSet(parent[i]);
}

2. Union by Rank or Union by Size (for `Union`)

When merging two sets, we want to keep the resulting tree as flat as possible.

Union by Rank: Attaches the root of the "shorter" tree (tree with smaller rank/height) to the root of the "taller" tree. Rank is an upper bound on height.
Union by Size: Attaches the root of the "smaller" tree (tree with fewer elements) to the root of the "larger" tree. This helps balance the trees by ensuring the larger tree remains the parent, keeping overall tree depth minimal.

Both strategies aim to minimize the maximum depth of any node, ensuring `Find` operations remain fast. Union by size is often slightly easier to implement.

// Union operation with Union by Size
void unionSets(int a, int b) {
    a = findSet(a); // Find root of 'a'
    b = findSet(b); // Find root of 'b'

    if (a != b) { // If they are in different sets
        // Attach smaller set to larger set's root
        if (set_size[a] < set_size[b])
            swap(a, b); // Ensure 'a' is the larger set's root

        parent[b] = a;       // 'b' (smaller) now points to 'a' (larger)
        set_size[a] += set_size[b]; // Update size of the new combined set
    }
}

Amortized Analysis: Nearly O(1)!

When both Path Compression and Union by Rank/Size are used, the amortized time complexity for a sequence of $M$ `Find` and `Union` operations on $N$ elements is incredibly efficient: $O(M \alpha(N))$, where $\alpha(N)$ is the inverse Ackermann function. This function grows extremely slowly, so for all practical purposes, $\alpha(N)$ is less than 5. This means DSU operations are effectively constant time on average!

🌳 DSU Visualization: Union & Find with Path Compression

Disjoint Set Union (Union by Size & Path Compression)

Initial State: Each element is its own set

After Union(D, E), Union(B, C), Union(A, D):

After find(E) with Path Compression:

Initial: Each element is its own set. After Unions: Elements D & E are in one set (rooted at D), B & C in another (rooted at B), and A joins the set of D (rooted at A, assuming Union by Size). After Find(E) with Path Compression: Elements D and E now directly point to the root A, flattening the tree for faster future queries.

🚀 Advantages and Disadvantages of DSU

Advantages:

Extremely Fast: Near O(1) amortized time for `Find` and `Union` operations with optimizations. This is almost as fast as it gets!
Efficient Connectivity: Ideal for quickly determining connected components in a graph or grouping elements.
Simple Implementation: Despite its power, the core logic is quite concise and elegant.
Space Efficient: O(N) space to store the parent array and optionally size/rank.

Disadvantages:

Limited Operations: Primarily designed for `Find` and `Union`. Does not support arbitrary queries like finding elements in a specific range or removing elements from a set.
Cannot Undoing Operations: Once sets are unioned, it's not straightforward to split them back into their original disjoint sets.

📌 Real-Life Example (UdaanPath)

DSU might seem abstract, but it has powerful applications, especially in competitive programming and scenarios where connectivity is key. On a platform like UdaanPath, here's how it could be creatively applied:

Social Learning Groups: Imagine students forming study groups. DSU could quickly tell if two students are part of the same extended study network (even if not directly connected), and efficiently merge groups if they decide to collaborate.
Course Prerequisites / Module Connectivity: If courses or modules have complex interdependencies, DSU could help determine if a student has completed all necessary prerequisites for a advanced topic by checking if all prerequisite modules are "connected" in their completed list.
Community Moderation: Identifying "clusters" of problematic users who are frequently reported together, allowing moderators to quickly see if new reports are linked to existing problematic groups.
Network Connectivity Checks: In an internal system modeling user devices or microservices, DSU could quickly check if two components are connected or if adding a new connection creates a cycle. This is foundational for algorithms like Kruskal's for Minimum Spanning Tree.

DSU's efficiency in managing dynamic sets makes it a hidden gem for problems that revolve around group membership and connectivity.

✅ Summary: Master of Disjoint Sets

Disjoint Set Union (DSU) manages non-overlapping sets and performs `Find` (find representative) and `Union` (merge sets) operations.
It's typically implemented using a parent array.
Key optimizations are Path Compression (for `Find`) and Union by Rank/Size (for `Union`), leading to near O(1) amortized time complexity.
DSU is invaluable for problems involving connected components in graphs and grouping elements efficiently.

📤 Coming Next: Sparse Tables

Our next adventure takes us into the realm of Sparse Tables. This powerful data structure is designed for answering Range Query problems (like Range Minimum Query or Range Maximum Query) on a static array incredibly fast – in just O(1) time after an O(N log N) preprocessing step. We'll explore its structure and how it leverages dynamic programming for lightning-fast queries.

← Hashing and Hash Tables Sparse Tables →