Data Structures and Algorithms in C (DSA)

Category: IT Fundamentals & Programming

This course is designed to help learners master the core concepts of Data Structures and Algorithms (DSA) using the C programming language. Starting from the basics and advancing to complex topics like graphs, dynamic programming, and memory optimization, this course …

Greedy Algorithms (Activity Selection, Huffman Coding)

📘 Chapter: Greedy Algorithms (Activity Selection, Huffman Coding)

Imagine you're faced with a complex problem, and at each step, you need to make a decision. A Greedy Algorithm is an algorithmic paradigm that makes the locally optimal choice at each stage with the hope of finding a globally optimal solution. It builds a solution piece by piece, always choosing the next piece that offers the most obvious, immediate benefit. While this approach doesn't always yield the best solution for every problem, it works remarkably well for a specific class of problems and often provides a much simpler and faster solution than other paradigms like Dynamic Programming.

✨ Core Concepts of the Greedy Approach

For a problem to be solvable by a greedy algorithm, it typically needs to exhibit two key properties:

  1. Greedy Choice Property: A globally optimal solution can be obtained by making a locally optimal (greedy) choice. Once a greedy choice is made, it never needs to be reconsidered or undone; the optimal solution can be built as a sequence of greedy choices.
  2. Optimal Substructure: An optimal solution to the problem contains optimal solutions to its subproblems. Dynamic Programming shares this property, but in a greedy algorithm the greedy choice, combined with an optimal solution to the remaining subproblem, always yields a global optimum.

A common technique to prove the correctness of a greedy algorithm is the **"Exchange Argument"**, where you show that if there exists an optimal solution that doesn't make the greedy choice, you can "exchange" some parts of it to incorporate the greedy choice without making the solution worse.

🗓️ 1. Activity Selection Problem

This is a classic example where a greedy approach works perfectly.

Problem Statement:

Given a set of $N$ activities, each with a start time $S_i$ and a finish time $F_i$, select the maximum number of non-overlapping activities that can be performed by a single person or resource. An activity `i` can be performed if $S_i \ge F_j$ (where $F_j$ is the finish time of the previously selected activity `j`).

Greedy Choice:

Always select the activity that **finishes earliest** among the remaining compatible activities.

Algorithm Steps:

  1. Sort all activities in non-decreasing order of their **finish times**.
  2. Select the first activity from the sorted list (as it finishes earliest, leaving maximum time for subsequent activities). Add it to your result set.
  3. Iterate through the remaining activities:
    • For each activity `i`: If its start time $S_i$ is greater than or equal to the finish time $F_j$ of the *last selected activity* `j`, then select activity `i` and update the "last selected activity" to `i`.
// Conceptual C++ for the Activity Selection Problem
#include <algorithm>
#include <vector>
using namespace std;

struct Activity {
    int start, finish;
};

// Custom comparator for sorting by finish time
bool compareActivities(const Activity& a, const Activity& b) {
    return a.finish < b.finish;
}

vector<Activity> select_activities(vector<Activity>& activities) {
    // Sort activities in non-decreasing order of finish time
    sort(activities.begin(), activities.end(), compareActivities);

    vector<Activity> result;
    if (activities.empty()) return result;

    // The earliest-finishing activity is always selected first
    result.push_back(activities[0]);
    int last_finish_time = activities[0].finish;

    // Greedily pick every activity compatible with the last selection
    for (size_t i = 1; i < activities.size(); ++i) {
        if (activities[i].start >= last_finish_time) {
            result.push_back(activities[i]);
            last_finish_time = activities[i].finish;
        }
    }
    return result;
}

Time Complexity: $O(N \log N)$ due to sorting. The iteration takes $O(N)$.
Space Complexity: $O(N)$ for storing activities and the result.

✉️ 2. Huffman Coding

Huffman Coding is a widely used data compression technique that leverages a greedy approach to build an optimal prefix code. A prefix code ensures that no code word is a prefix of another code word, allowing for unambiguous decoding.

Problem Statement:

Given a set of characters and their frequencies of occurrence, construct a binary tree (Huffman tree) to assign binary codes to each character such that the total encoded length (sum of `frequency * code_length` for all characters) is minimized.

Greedy Choice:

At each step, combine the two nodes (characters or internal nodes) with the **smallest frequencies**. This pushes low-frequency characters deeper into the tree, so high-frequency characters end up with shorter codes.

Algorithm Steps:

  1. For each unique character, create a leaf node with its frequency.
  2. Insert all these leaf nodes into a **min-priority queue**, ordered by frequency.
  3. While the priority queue contains more than one node:
    • Extract the two nodes with the smallest frequencies (let's call them `node1` and `node2`).
    • Create a new internal node `newNode`. Its frequency is `node1.frequency + node2.frequency`. Make `node1` and `node2` its left and right children (order doesn't strictly matter for correctness, but conventionally smaller frequency as left child).
    • Insert `newNode` back into the priority queue.
  4. The single remaining node in the priority queue is the root of the Huffman Tree.
  5. Traverse the Huffman Tree (e.g., DFS) to assign binary codes to each character. Assign '0' for a left branch and '1' for a right branch.
// Conceptual C++ for Huffman Coding
#include <iostream>
#include <queue>
#include <string>
#include <vector>
using namespace std;

struct MinHeapNode {
    char data;
    int freq;
    MinHeapNode *left, *right;

    MinHeapNode(char data, int freq) {
        this->data = data;
        this->freq = freq;
        left = right = nullptr;
    }
};

// Custom comparator so the priority queue behaves as a min-heap on frequency
struct CompareNodes {
    bool operator()(MinHeapNode* a, MinHeapNode* b) {
        return a->freq > b->freq;
    }
};

// Walk the tree: '0' for a left branch, '1' for a right branch
void printCodes(MinHeapNode* root, string str) {
    if (!root) return;

    if (root->data != '$') { // '$' marks internal (non-leaf) nodes
        cout << root->data << ": " << str << endl;
    }

    printCodes(root->left, str + "0");
    printCodes(root->right, str + "1");
}

void huffman_coding(const vector<char>& chars, const vector<int>& freqs) {
    priority_queue<MinHeapNode*, vector<MinHeapNode*>, CompareNodes> minHeap;

    for (size_t i = 0; i < chars.size(); ++i) {
        minHeap.push(new MinHeapNode(chars[i], freqs[i]));
    }
    if (minHeap.empty()) return;

    // Repeatedly merge the two lowest-frequency nodes
    while (minHeap.size() > 1) {
        MinHeapNode* left = minHeap.top(); minHeap.pop();
        MinHeapNode* right = minHeap.top(); minHeap.pop();

        MinHeapNode* top = new MinHeapNode('$', left->freq + right->freq);
        top->left = left;
        top->right = right;
        minHeap.push(top);
    }

    printCodes(minHeap.top(), ""); // The remaining node is the root of the Huffman tree
}

Time Complexity: $O(N \log N)$, where $N$ is the number of unique characters. Building the initial heap and performing the $N-1$ merges involve $O(N)$ priority-queue operations, each taking $O(\log N)$ time.
Space Complexity: $O(N)$ for storing nodes in the priority queue and the tree structure.

🚫 When Not to Use Greedy (A Word of Caution)

While powerful, greedy algorithms are not universally applicable. A problem that looks like it could be solved greedily might actually require Dynamic Programming or other techniques if the "locally optimal" choice doesn't always lead to a "globally optimal" solution.

  • Example: 0/1 Knapsack Problem: If you're picking items for a knapsack with a weight limit, and you can only take an item entirely or not at all (0/1), a greedy choice (e.g., always picking the item with the highest value-to-weight ratio) doesn't guarantee an optimal solution. This problem requires Dynamic Programming. However, for the **Fractional Knapsack Problem** (where you can take parts of items), a greedy approach *does* work.

Always carefully verify if the greedy choice property and optimal substructure hold for a problem before committing to a greedy solution.

📌 Real-Life Example (UdaanPath)

Greedy algorithms are surprisingly prevalent in systems, especially for optimization tasks:

  • Resource Scheduling (Activity Selection): On UdaanPath, if there are limited instructors or virtual classrooms, and multiple classes/webinars are scheduled with start and end times, an activity selection algorithm could be used to maximize the number of parallel sessions or optimize resource utilization.
  • Content Compression (Huffman Coding): Any multimedia content (images, videos, text) on UdaanPath could use Huffman coding (often as part of larger compression schemes like JPEG, MP3, ZIP) to reduce file sizes, leading to faster loading times and reduced storage costs.
  • Caching Policies (Simplified): While often more complex, some simple caching strategies (e.g., "least recently used" or "most frequently used" with immediate eviction) can have greedy elements.
  • Shortest Path (Dijkstra's as Greedy): As we saw, Dijkstra's algorithm is itself a greedy algorithm! At each step, it greedily picks the unvisited vertex with the smallest known distance from the source.
  • Change-Making Problem: If you're designing a system to calculate change for a transaction, a greedy approach (always pick the largest denomination possible) minimizes the number of coins/notes for canonical currency systems (e.g., in INR: 2000, 500, 200, 100, 50, 20, 10, 5, 2, 1). Beware that for arbitrary denomination sets (e.g., {1, 3, 4} with amount 6), the greedy choice can be suboptimal.

Greedy algorithms offer efficient solutions to problems where a series of locally optimal decisions lead to a global optimum.

✅ Summary: Local Optima for Global Solutions

  • Greedy Algorithms make locally optimal choices hoping to achieve a globally optimal solution.
  • They work well for problems exhibiting Greedy Choice Property and Optimal Substructure.
  • Activity Selection: Maximize non-overlapping activities by sorting by earliest finish time and picking compatible ones. Time: $O(N \log N)$.
  • Huffman Coding: Minimize encoded length by repeatedly combining two lowest-frequency nodes using a min-priority queue. Time: $O(N \log N)$.
  • Crucially, not all optimization problems can be solved greedily; careful analysis or proof is required.

📤 Coming Next: Backtracking (N-Queens, Sudoku Solver)

From making locally optimal choices, we now move to a more exhaustive search strategy. Our next chapter will introduce Backtracking, a general algorithmic technique for finding all (or some) solutions to computational problems, typically constraint satisfaction problems. We'll explore classic examples like the N-Queens problem and Sudoku Solver to understand how backtracking systematically tries to build a solution and "undoes" (backtracks) choices that lead to a dead end.
