📖 Chapters

Data Structures and Algorithms in C (DSA)

Category: IT Fundamentals & Programming

This course is designed to help learners master the core concepts of Data Structures and Algorithms (DSA) using the C programming language. Starting from the basics and advancing to complex topics like graphs, dynamic programming, and memory optimization, this course …

Counting Sort, Radix Sort, Bucket Sort

📘 Chapter 15: Beyond Comparisons: Counting Sort, Radix Sort, Bucket Sort

So far, we've explored comparison-based sorting algorithms like Bubble, Selection, Insertion, Merge, and Quick Sort. These algorithms sort elements by comparing them to each other. However, there's a powerful class of sorting algorithms known as non-comparison sorts that can achieve linear time complexity, O(n), under specific conditions. They do this by leveraging information about the distribution or structure of the input numbers themselves, rather than just their relative order. In this chapter, we'll dive into three prominent non-comparison sorts: Counting Sort, Radix Sort, and Bucket Sort.

📊 Counting Sort: Ideal for Small Ranges

Counting Sort is an efficient sorting algorithm for integers when the range of input numbers (the difference between the maximum and minimum values) is not significantly larger than the number of items to be sorted. It works by counting the frequency of each distinct element in the input array, and then using that information to place the elements in their correct sorted positions.

// Counting Sort in C
void countingSort(int arr[], int n) {
    // Find the maximum element to determine the range
    int max = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] > max)
            max = arr[i];
    }

    int count[max + 1]; // Create count array (size max + 1)
    int output[n];      // Create output array

    // Initialize count array with all zeros
    for (int i = 0; i <= max; ++i) {
        count[i] = 0;
    }

    // Store the count of each element
    for (int i = 0; i < n; i++) {
        count[arr[i]]++;
    }

    // Store cumulative count of each array (prefix sum)
    // This gives the actual position of elements in the output array
    for (int i = 1; i <= max; i++) {
        count[i] += count[i - 1];
    }

    // Place the elements in output array
    // Iterate from right to left to ensure stability
    for (int i = n - 1; i >= 0; i--) {
        output[count[arr[i]] - 1] = arr[i];
        count[arr[i]]--;
    }

    // Copy the output array to arr[], so that arr[] now contains sorted elements
    for (int i = 0; i < n; i++) {
        arr[i] = output[i];
    }
}

Time Complexity: O(n + k), where 'n' is the number of elements and 'k' is the range of input (max_value - min_value + 1). If 'k' is proportional to 'n' (i.e., k = O(n)), then the time complexity becomes O(n), which is linear.

Space Complexity: O(n + k) for the `count` and `output` arrays.

Counting Sort Example Visualization: `[1, 4, 1, 2, 7, 5, 2]` (Max value = 7)

Input Array:

1. Count Array:

Count occurrences of each number. E.g., '1' appears twice, '2' appears twice.

2. Cumulative Count:

Each cell now holds count of elements <= its index. This gives end positions.

3. Build Output Array:

Iterate input array from right. For `2` (last element): `count[2]` is 4. Place `2` at `output[4-1]=output[3]`. Decrement `count[2]` to 3.

Next element `5`: `count[5]` is 6. Place `5` at `output[5]`. Decrement `count[5]` to 5.

Continue this process...

Final Output:

The sorted array.

🔢 Radix Sort: Digit by Digit Efficiency

Radix Sort is a non-comparison sorting algorithm that sorts numbers by processing individual digits (or bits) from least significant to most significant (LSD Radix Sort) or vice versa (MSD Radix Sort). It requires a stable sorting algorithm (like Counting Sort) as a subroutine for sorting digits. This makes it particularly effective for integers or strings with a fixed length.

// Radix Sort in C (using Counting Sort as a subroutine)

// A utility function to get maximum value in arr[]
int getMax(int arr[], int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

// A function to do counting sort of arr[] according to
// the digit represented by 'exp'.
void countSortForRadix(int arr[], int n, int exp) {
    int output[n]; // output array
    int i, count[10] = {0}; // 0-9 for digits

    // Store count of occurrences in count[]
    for (i = 0; i < n; i++)
        count[(arr[i] / exp) % 10]++;

    // Change count[i] so that count[i] now contains actual
    // position of this digit in output[]
    for (i = 1; i < 10; i++)
        count[i] += count[i - 1];

    // Build the output array
    for (i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / exp) % 10] - 1] = arr[i];
        count[(arr[i] / exp) % 10]--;
    }

    // Copy the output array to arr[], so that arr[] now
    // contains sorted numbers according to current digit
    for (i = 0; i < n; i++)
        arr[i] = output[i];
}

// The main function to that sorts arr[] of size n using Radix Sort
void radixSort(int arr[], int n) {
    // Find the maximum number to know number of digits
    int m = getMax(arr, n);

    // Do counting sort for every digit. Note that instead of passing digit number,
    // exp is passed. exp is 10^i where i is current digit number
    for (int exp = 1; m / exp > 0; exp *= 10)
        countSortForRadix(arr, n, exp);
}

Time Complexity: O(d * (n + b)), where 'd' is the number of digits (or passes), 'n' is the number of elements, and 'b' is the base (radix) of the numbers (usually 10 for decimal numbers). If 'd' and 'b' are constants, this simplifies to O(n), making it linear.

Space Complexity: O(n + b) due to the auxiliary space used by Counting Sort (output array and count array).

Radix Sort Example Visualization: `[170, 45, 75, 90, 802, 24, 2, 66]`

Initial Array:

170

802

1. Sort by Units Digit (LSD):

Bucket 0

170

Bucket 1

Bucket 2

802

Bucket 3

Bucket 4

Bucket 5

Bucket 6

Bucket 7

Bucket 8

Bucket 9

170

802

Collect elements in order from buckets (0 to 9).

2. Sort by Tens Digit:

Bucket 0

802

Bucket 1

Bucket 2

Bucket 3

Bucket 4

Bucket 5

Bucket 6

Bucket 7

170

Bucket 8

Bucket 9

802

170

Collect elements based on their tens digit.

3. Sort by Hundreds Digit:

Bucket 0

Bucket 1

170

Bucket 2

...

Bucket 8

802

...

170

802

Collect elements based on their hundreds digit.

Final Sorted Array:

170

802

🗑️ Bucket Sort: Distributing into Bins

Bucket Sort, also known as bin sort, is a distribution sort algorithm that distributes elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm (like Insertion Sort, which is efficient for small inputs) or by recursively applying the bucket sort algorithm. Finally, the elements are gathered from the buckets in order. Bucket Sort works best when the input is uniformly distributed over a range.

// Bucket Sort in C (simplified for integers, real-world often uses floating points)
// This example assumes numbers are in range [0, 100] for simplicity.

#include <stdio.h>
#include <stdlib.h>
#include <string.h> // For memset

// Function to sort an array using insertion sort (for individual buckets)
void insertionSortForBucket(int arr[], int n) {
    int i, key, j;
    for (i = 1; i < n; i++) {
        key = arr[i];
        j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j = j - 1;
        }
        arr[j + 1] = key;
    }
}

void bucketSort(int arr[], int n) {
    // 1. Create n buckets. For simplicity, let's assume max value is 100
    // and elements are distributed across these buckets.
    // In a real scenario, you'd determine the number of buckets and range per bucket
    // based on max_val and min_val of the array.
    int max_val = arr[0];
    for(int i=1; i max_val) max_val = arr[i];
    }

    // Let's create 'num_buckets' buckets.
    // A common approach is num_buckets = n or sqrt(n)
    // For integers, usually elements are mapped to buckets using (value * num_buckets) / (max_val + 1)
    int num_buckets = 10; // Example: 10 buckets
    if (n > 0) num_buckets = (max_val / n) + 1; // A more dynamic way to determine buckets

    // Create an array of vectors (or linked lists) for buckets
    // This is a common way to represent dynamic arrays in C, but typically
    // would be done with dynamic memory or standard library containers in C++
    // For simplicity, let's use a 2D array and track sizes. This is not ideal for real-world.
    // In a production C environment, you'd use dynamic arrays (e.g., `struct Vector { int* data; int size; int capacity; }`).
    // For demonstration, let's use a simplified approach assuming max items per bucket.
    // THIS IS A SIMPLIFIED, ILLUSTRATIVE C IMPLEMENTATION FOR THE CONCEPT.
    // A robust C implementation would be more complex (e.g., using linked lists for buckets).

    // Simulate buckets using a simple array of arrays (fixed size for illustration)
    // In real C, you'd typically use linked lists or dynamically resizing arrays for buckets.
    int buckets[num_buckets][n]; // Max n elements per bucket
    int bucket_sizes[num_buckets];
    memset(bucket_sizes, 0, sizeof(bucket_sizes)); // Initialize all sizes to 0

    // 2. Put array elements into appropriate buckets
    for (int i = 0; i < n; i++) {
        int bucket_idx = (arr[i] * num_buckets) / (max_val + 1); // Map to bucket index
        buckets[bucket_idx][bucket_sizes[bucket_idx]++] = arr[i];
    }

    // 3. Sort individual buckets
    for (int i = 0; i < num_buckets; i++) {
        insertionSortForBucket(buckets[i], bucket_sizes[i]);
    }

    // 4. Concatenate all buckets into arr[]
    int index = 0;
    for (int i = 0; i < num_buckets; i++) {
        for (int j = 0; j < bucket_sizes[i]; j++) {
            arr[index++] = buckets[i][j];
        }
    }
}

Time Complexity: O(n + k), where 'n' is the number of elements and 'k' is the number of buckets. If elements are uniformly distributed, making the bucket sorting efficient (e.g., O(1) or small constant), then it approaches O(n).

Space Complexity: O(n + k) for the buckets.

Bucket Sort Example Visualization: `[0.89, 0.56, 0.65, 0.12, 0.69, 0.45]` (Numbers between 0.0 and 1.0, 10 buckets)

Initial Array (Floating point numbers 0.0-1.0):

0.89

0.56

0.65

0.12

0.69

0.45

1. Distribute into 10 Buckets (index = floor(value * 10)):

Bucket 0

Bucket 1

0.12

Bucket 2

Bucket 3

Bucket 4

0.45

Bucket 5

0.56

Bucket 6

0.65

0.69

Bucket 7

Bucket 8

0.89

Bucket 9

Elements are placed into buckets based on their first digit after the decimal (e.g., 0.89 goes to bucket 8).

2. Sort Each Bucket (using Insertion Sort):

Bucket 0

Bucket 1

0.12

Bucket 2

Bucket 3

Bucket 4

0.45

Bucket 5

0.56

Bucket 6

0.65

0.69

Bucket 7

Bucket 8

0.89

Bucket 9

Individual buckets are sorted. For bucket 6, 0.65 and 0.69 are already in order.

3. Concatenate Sorted Buckets:

0.12

0.45

0.56

0.65

0.69

0.89

Elements collected sequentially from bucket 0 to bucket 9.

⚡ Advantages and Limitations of Non-Comparison Sorts

These algorithms offer a compelling alternative to comparison sorts:

Potential for O(n) Complexity: Their biggest advantage is the ability to sort in linear time under specific conditions, outperforming the O(n log n) lower bound of comparison sorts.
Counting Sort: Excellent for integers within a small, well-defined range. It's stable.
Radix Sort: Efficient for integers, especially when they have a fixed number of digits or a predictable maximum number of digits. It is also stable if the subroutine sorting (like Counting Sort) is stable.
Bucket Sort: Works well for uniformly distributed data, especially floating-point numbers. Its performance degrades if data is clustered. It can be stable depending on the sorting algorithm used for buckets.

However, they come with crucial limitations:

Data Type Restrictions: Primarily designed for integers or data that can be mapped to integers (like characters in Radix sort). They are not general-purpose for arbitrary data types.
Range/Distribution Dependence:
- Counting Sort's efficiency degrades if the range 'k' is very large (e.g., sorting 3 numbers between 1 and 1 Billion).
- Radix Sort's efficiency depends on the number of digits/passes.
- Bucket Sort's efficiency relies heavily on uniform data distribution. If all elements fall into one bucket, it degenerates to the performance of the bucket's internal sorting algorithm.
Higher Space Complexity: They often require significant auxiliary space for count arrays, buckets, or output arrays (O(n+k) or O(n+b)).

📌 Real-Life Example (UdaanPath)

These non-comparison sorts, given their specialized nature, find practical applications in specific niches within a platform like UdaanPath:

Counting Sort for Quiz Scores: If UdaanPath hosts quizzes where scores are integers within a fixed, relatively small range (e.g., 0-100 points), Counting Sort could be incredibly fast for ranking students or tallying score distributions. For example, if 10,000 students take a quiz scored out of 100, Counting Sort would sort their scores in O(10,000 + 100) = O(10,100) time, which is practically instantaneous.
Radix Sort for User IDs/Course Codes: If UdaanPath assigns fixed-length alphanumeric user IDs or course codes (e.g., "UD12345", "DS98765"), Radix Sort could be used for extremely fast sorting of these identifiers, especially when dealing with millions of records. It's also efficient for IP addresses, which are essentially numbers.
Bucket Sort for Performance Metrics: For analyzing student performance metrics that are normalized to a scale (e.g., overall skill level from 0.0 to 1.0, or percentile ranks), Bucket Sort could efficiently group students into performance tiers. If student skill levels are evenly distributed, this provides a quick way to segment the user base for targeted educational content or interventions.

These algorithms are powerful tools when their specific input constraints are met, offering performance that comparison sorts simply cannot match.

✅ Summary: Specialized Sorting Powerhouses

Non-comparison sorts leverage specific properties of data to sort, potentially achieving O(n) time complexity.
Counting Sort is excellent for integers within a small, known range.
Radix Sort sorts by processing digits/bits, ideal for integers and fixed-length strings, often using Counting Sort as a stable subroutine.
Bucket Sort distributes elements into bins and sorts each bin, performing best with uniformly distributed data.
While highly efficient under their specific conditions, their applicability is limited by data type, range, or distribution constraints, and they generally require more auxiliary space.

📤 Coming Next: Heap Sort - An In-Place O(n log n) Solution

We've seen the power of both comparison-based and non-comparison based sorts. In the next chapter, we'll explore Heap Sort, another efficient comparison-based sorting algorithm that leverages a specialized data structure called a "Heap" to achieve an O(n log n) time complexity with the significant advantage of being an in-place sort.

← Quick Sort Heap Sort →