📖 Chapters

Data Structures and Algorithms in C (DSA)

Category: IT Fundamentals & Programming

This course is designed to help learners master the core concepts of Data Structures and Algorithms (DSA) using the C programming language. Starting from the basics and advancing to complex topics like graphs, dynamic programming, and memory optimization, this course …

Minimum Spanning Tree (Kruskal, Prim)

📘 Chapter: Minimum Spanning Tree (Kruskal, Prim)

Imagine you need to connect several cities with a network of roads, lay down electrical cables, or design a communication network, and you want to do so using the least amount of material or cost. This is where the concept of a Minimum Spanning Tree (MST) comes into play. An MST is a subgraph of an undirected, connected, and weighted graph that connects all the vertices together with the minimum possible total edge weight and without any cycles.

🌳 Key Concepts: Spanning Tree & MST

Connected Graph: A graph where there is a path between every pair of vertices.
Spanning Tree: A subgraph of a connected, undirected graph that is a tree and includes all the vertices of the original graph.
- It must contain all $V$ vertices.
- It must be a tree (no cycles).
- It must be connected.
- It will always have exactly $V-1$ edges.
Minimum Spanning Tree (MST): A spanning tree whose sum of edge weights is the minimum possible among all possible spanning trees for a given graph.

Two classic algorithms solve the MST problem: **Kruskal's Algorithm** and **Prim's Algorithm**. Both are greedy algorithms, meaning they make locally optimal choices in the hope of finding a global optimum.

🌱 1. Kruskal's Algorithm

Kruskal's algorithm builds the MST by adding edges in increasing order of their weights, as long as adding the edge does not form a cycle. It uses a **Disjoint Set Union (DSU)** data structure (also known as Union-Find) to efficiently detect cycles.

Kruskal's Algorithm Steps:

Create a list of all edges in the graph, each with its weight.
Sort all edges in non-decreasing order of their weights.
Initialize a Disjoint Set Union (DSU) data structure where each vertex is initially in its own separate set.
Initialize an empty list/set to store the edges of the MST.
Iterate through the sorted edges:
- For each edge `(u, v)` with weight `w`:
  - Check if vertices `u` and `v` are already in the same set using `find()` operation of DSU.
  - If they are **not** in the same set (meaning adding this edge won't form a cycle):
    - Add the edge `(u, v)` to your MST list.
    - Union the sets containing `u` and `v` using `union_sets()` operation of DSU.
Stop when $V-1$ edges have been added to the MST (where $V$ is the number of vertices), or when all edges have been processed.

// Conceptual C++ for Kruskal's Algorithm with DSU
struct Edge {
    int u, v, weight;
    bool operator<(const Edge& other) const {
        return weight < other.weight;
    }
};

// DSU Structure (simplified for brevity)
struct DSU {
    vector<int> parent;
    DSU(int n) {
        parent.resize(n);
        iota(parent.begin(), parent.end(), 0); // parent[i] = i
    }
    int find(int i) {
        if (parent[i] == i) return i;
        return parent[i] = find(parent[i]);
    }
    void unite(int i, int j) {
        int root_i = find(i);
        int root_j = find(j);
        if (root_i != root_j) {
            parent[root_i] = root_j;
        }
    }
};

vector<Edge> kruskal_mst(int V, vector<Edge>& all_edges) {
    sort(all_edges.begin(), all_edges.end()); // Sort edges by weight
    DSU dsu(V);
    vector<Edge> mst_edges;
    int edges_count = 0;

    for (const auto& edge : all_edges) {
        if (dsu.find(edge.u) != dsu.find(edge.v)) {
            mst_edges.push_back(edge);
            dsu.unite(edge.u, edge.v);
            edges_count++;
            if (edges_count == V - 1) break; // MST found
        }
    }
    return mst_edges;
}

Kruskal's Time & Space Complexity:

Time:
- Sorting edges: $O(E \log E)$ (since $E \le V^2$, $E \log E$ is roughly $E \log V$ for dense graphs, but $E \log E$ is typically more precise).
- DSU operations: $O(E \alpha(V))$, where $\alpha$ is the inverse Ackermann function, which is practically a very small constant.
- Total: $O(E \log E)$ or $O(E \log V)$ (since $E \log E \approx E \log V^2 = 2E \log V$).
Space: $O(V + E)$ for storing edges and DSU structures.

🌲 2. Prim's Algorithm

Prim's algorithm builds the MST by starting from an arbitrary vertex and growing the tree edge by edge. At each step, it adds the cheapest edge that connects a vertex already in the MST to a vertex outside the MST. It is similar in concept to Dijkstra's algorithm and uses a **Min-Priority Queue**.

Prim's Algorithm Steps:

Choose an arbitrary starting vertex (e.g., vertex 0).
Initialize `key[v] = infinity` for all vertices `v`, and `key[start_node] = 0`. `key[v]` stores the minimum weight of an edge connecting `v` to the current MST.
Initialize `parent[v] = -1` for all `v` (to reconstruct the MST).
Create a min-priority queue and insert `(0, start_node)` (key weight, vertex).
Maintain a boolean array `inMST[V]` to track vertices already included in MST.
While the priority queue is not empty:
- Extract the vertex `u` with the smallest `key[u]` from the priority queue.
- Mark `u` as `inMST[u] = true`.
- If `parent[u]` is not -1 (i.e., `u` is not the starting node), add the edge `(parent[u], u)` to your MST list.
- For each neighbor `v` of `u` connected by an edge `(u, v)` with weight `w`:
  - If `v` is **not** in `inMST` AND `w < key[v]`:
    - Update `key[v] = w`.
    - Set `parent[v] = u`.
    - Insert/update `(key[v], v)` in the priority queue.

// Conceptual C++ for Prim's Algorithm using std::priority_queue
// Adjacency list: vector> adj[V]; // {neighbor, weight}

vector<pair<int, int>> prim_mst(int start_node, int V, const vector<vector<pair<int, int>>>& adj) {
    vector<int> key(V, INT_MAX); // Stores min weight to connect to MST
    vector<int> parent(V, -1); // Stores parent in MST
    vector<bool> inMST(V, false); // True if vertex is in MST

    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<pair<int, int>>> pq; // {weight, vertex}

    key[start_node] = 0;
    pq.push({0, start_node});

    vector<pair<int, int>> mst_edges; // Store edges of the MST

    while (!pq.empty()) {
        int u_weight = pq.top().first; // current edge weight
        int u = pq.top().second;      // current vertex
        pq.pop();

        if (inMST[u]) continue; // Already processed
        inMST[u] = true;

        if (parent[u] != -1) { // If not the starting node
            mst_edges.push_back({parent[u], u}); // Add edge to MST
        }

        for (auto& edge : adj[u]) {
            int v = edge.first;
            int weight = edge.second;

            if (!inMST[v] && weight < key[v]) {
                key[v] = weight;
                parent[v] = u;
                pq.push({key[v], v});
            }
        }
    }
    return mst_edges;
}

Prim's Time & Space Complexity:

Time: $O(E \log V)$ or $O(E + V \log V)$ with `std::priority_queue` (binary heap). If a Fibonacci heap is used, it can be $O(E + V \log V)$.
Space: $O(V + E)$ for adjacency list, key/parent arrays, and priority queue.

⚖️ Kruskal vs. Prim: A Comparison

Feature	Kruskal's Algorithm	Prim's Algorithm
Approach	Edge-based (adds cheapest non-cyclic edge)	Vertex-based (grows tree from a start vertex)
Data Structure	Disjoint Set Union (DSU)	Min-Priority Queue
Graph Type	Works on disconnected components, useful for finding MST of each component.	Assumes connected graph; if disconnected, finds MST of the component containing the starting vertex.
Best For	Sparse graphs (fewer edges, $E \approx V$)	Dense graphs (many edges, $E \approx V^2$)

Both algorithms correctly find an MST. The choice often depends on the graph's density: Kruskal's is faster for sparse graphs, while Prim's (with an efficient priority queue) is better for dense graphs.

📌 Real-Life Example (UdaanPath)

MST algorithms are fundamental to problems where you need to connect entities with minimal cost or length:

Network Design: Imagine UdaanPath having multiple data centers or server clusters around the world. To establish communication links between them with the minimum total cable length (cost), an MST algorithm could determine the optimal cabling layout.
Sensor Network Deployment: If UdaanPath were deploying a network of IoT sensors across a campus, and each possible connection between sensors has an installation cost (distance), an MST would help connect all sensors with the least total wiring.
Clustering Analysis (Simplified): In data science, MST concepts can be used in some forms of clustering. For example, if data points are vertices and the distance between them is edge weight, an MST could help identify natural clusters by removing the most expensive edges, splitting the graph into components.
Minimum Cost Telecommunications: Designing the backbone of a telecommunication network where cities are vertices and potential cable routes are edges with costs, MST algorithms ensure all cities are connected with minimal infrastructure investment.
Circuit Design: In designing integrated circuits, MST can be used to lay out connections (wires) between components on a chip in a way that minimizes wire length, reducing material cost and signal delay.

MST algorithms ensure efficient and cost-effective connectivity across various domains.

✅ Summary: Connecting with Minimum Cost

A Minimum Spanning Tree (MST) connects all vertices in a weighted, connected, undirected graph with minimum total edge weight and no cycles.
Kruskal's Algorithm:
- Edge-based: Sorts all edges by weight and adds non-cyclic edges using DSU.
- Time: $O(E \log E)$. Space: $O(V+E)$.
- Good for sparse graphs.
Prim's Algorithm:
- Vertex-based: Grows tree from a starting vertex, using a min-priority queue to find the cheapest connecting edge.
- Time: $O(E \log V)$. Space: $O(V+E)$.
- Good for dense graphs.
Both are greedy algorithms that guarantee finding an MST.

📤 Coming Next: Cycle Detection in Graphs

Having explored ways to connect graphs with minimum cost, our next crucial topic delves into identifying fundamental structures within them: Cycles. We'll learn how to detect cycles in both undirected and directed graphs, a skill vital for validating graph properties, ensuring data integrity, and solving various algorithmic problems.

← Shortest Path Algorithms (Dijkstra, Bellman-Ford) Cycle Detection in Graphs →