📖 Chapters

Data Structures and Algorithms in C (DSA)

Category: IT Fundamentals & Programming

This course is designed to help learners master the core concepts of Data Structures and Algorithms (DSA) using the C programming language. Starting from the basics and advancing to complex topics like graphs, dynamic programming, and memory optimization, this course …

Trie (Prefix Tree)

📘 Chapter: Trie (Prefix Tree)

Moving beyond numerical data, we now delve into a specialized tree-like data structure optimized for storing and querying collections of strings: the Trie, pronounced "try" (from retrieval), also known as a Prefix Tree. Unlike binary search trees which store keys at nodes, a Trie stores characters at nodes, and paths from the root to a node represent prefixes of words or complete words. This structure makes Tries exceptionally efficient for operations involving prefixes, such as autocomplete, spell checkers, and IP routing.

🌳 What is a Trie? Character by Character

A Trie is a tree data structure that efficiently stores and retrieves keys in a dynamic set or associative array, where the keys are typically strings.

Each node in a Trie represents a single character or a part of a key.
The root node typically represents an empty string.
Each child pointer of a node corresponds to a character.
Nodes can optionally have a boolean flag (e.g., `isEndOfWord`) to indicate if the path to that node forms a complete word.
All children of a node share a common prefix (the string represented by the path from the root to their parent node).

This structure minimizes redundant storage of prefixes and enables very fast prefix-based searches.

🔧 Core Operations on a Trie

1. Insertion (Insert)

To insert a word into a Trie, we traverse the tree character by character.

Start from the root.
For each character in the word:
- If the character's corresponding child node exists, move to that child.
- If not, create a new node for that character and link it, then move to the new node.
Once all characters are processed, mark the last node as `isEndOfWord = true`.

// Trie Node structure (Conceptual)
#define ALPHABET_SIZE 26 // For lowercase English letters

struct TrieNode {
    struct TrieNode* children[ALPHABET_SIZE];
    int isEndOfWord; // 1 if this node represents the end of a word, 0 otherwise
};

// Function to create a new Trie node
struct TrieNode* createTrieNode() {
    struct TrieNode* pNode = (struct TrieNode*)malloc(sizeof(struct TrieNode));
    if (pNode) {
        pNode->isEndOfWord = 0;
        for (int i = 0; i < ALPHABET_SIZE; i++)
            pNode->children[i] = NULL;
    }
    return pNode;
}

// Insert operation for Trie
void insertTrie(struct TrieNode* root, const char* key) {
    struct TrieNode* pCrawl = root;
    for (int i = 0; i < strlen(key); i++) {
        int index = key[i] - 'a'; // Assuming lowercase 'a'-'z'
        if (!pCrawl->children[index])
            pCrawl->children[index] = createTrieNode();
        pCrawl = pCrawl->children[index];
    }
    // Mark last node as leaf
    pCrawl->isEndOfWord = 1;
}

Time Complexity: O(L), where L is the length of the key. This is very efficient as it does not depend on the number of keys 'n' in the Trie, only on the length of the word being inserted.

2. Search (Search)

To search for a word or a prefix in a Trie, we traverse it character by character, similar to insertion.

Start from the root.
For each character in the query string:
- If the character's corresponding child node exists, move to that child.
- If not, the string (or prefix) is not in the Trie.
If we successfully traverse all characters, then:
- For a full word search: Check if the `isEndOfWord` flag of the last node is `true`.
- For a prefix search: The prefix exists if traversal completes.

// Search operation for Trie
int searchTrie(struct TrieNode* root, const char* key) {
    struct TrieNode* pCrawl = root;
    for (int i = 0; i < strlen(key); i++) {
        int index = key[i] - 'a';
        if (!pCrawl->children[index])
            return 0; // Not found
        pCrawl = pCrawl->children[index];
    }
    // Return true if pCrawl is not NULL and it's the end of a word
    return (pCrawl != NULL && pCrawl->isEndOfWord);
}

// Function to check if a prefix exists
int startsWith(struct TrieNode* root, const char* prefix) {
    struct TrieNode* pCrawl = root;
    for (int i = 0; i < strlen(prefix); i++) {
        int index = prefix[i] - 'a';
        if (!pCrawl->children[index])
            return 0; // Prefix not found
        pCrawl = pCrawl->children[index];
    }
    return (pCrawl != NULL); // Prefix found
}

Time Complexity: O(L), where L is the length of the key/prefix. Again, highly efficient.

3. Deletion (Delete)

Deletion in a Trie is more complex than insertion or search. A node can only be deleted if it's not a prefix of any other word and is not itself the end of another word. This usually involves recursive backtracking.

The `isEndOfWord` flag of the node corresponding to the end of the deleted word is set to `false`.
Then, we may recursively delete ancestor nodes if they are no longer part of any other word or prefix.

Time Complexity: O(L).

🚀 Advantages and Disadvantages of Tries

Advantages:

Extremely Fast Prefix Operations: Search, insert, and delete operations are O(L) (where L is key length), regardless of the number of keys. This is faster than hash tables in worst case and often faster than BSTs for string operations.
Space Efficiency for Common Prefixes: Shares common prefixes among words, potentially saving space compared to storing full strings individually.
Alphabetical Ordering: Can easily print all words in alphabetical order (similar to Inorder traversal).
Excellent for Autocomplete and Spell Checkers: Its inherent prefix-matching capability is perfect for these applications.

Disadvantages:

High Space Overhead for Sparse Data: If keys do not share many prefixes, or if the alphabet size is large (e.g., Unicode characters), a Trie can consume a lot of memory due to many empty child pointers.
Complexity: Implementation can be more complex than hash tables for basic operations.

Trie Example Visualization: Storing "CAT", "CAR", "DOG"

Trie for "CAT", "CAR", "DOG"

Root

'C'

'D'

'A'

'O'

'T'

'R'

'G'

Nodes marked with darker blue are the end of a word. 'CA' is a prefix to 'CAT' and 'CAR'.

📌 Real-Life Example (UdaanPath)

Tries are perfectly suited for several core functionalities on a platform like UdaanPath:

Autocomplete Search for Courses/Content: When a user starts typing in the search bar ("Python Pro..."), a Trie can quickly suggest "Python Programming", "Python Projects", etc., by traversing to the prefix node and then finding all words below it. This provides a fast and intuitive search experience.
Spell Checker/Typo Correction: If a user types "algorihm", a Trie containing correctly spelled words can quickly identify that "algorihm" is not a valid word. It can then suggest "algorithm" by looking for words with small edit distances or common prefixes.
Prefix-based Filtering: If UdaanPath offers a vast library of questions or code snippets, a Trie could be used to efficiently filter content based on common prefixes, like "Data Structure" questions or "Dynamic Programming" problems.
Dictionary/Glossary Implementation: For a quick lookup of definitions or terms, a Trie can store all terms, allowing for O(L) lookup speed, significantly faster than traditional database queries for this specific use case.

The efficiency of Tries for string-based operations makes them an indispensable tool in applications that heavily rely on text processing and predictive functionalities.

✅ Summary: Efficient String Storage

A Trie (Prefix Tree) is a tree-like data structure optimized for storing and retrieving strings based on their prefixes.
Each node typically represents a character, and paths from the root form prefixes or complete words.
Operations like insertion, search, and deletion are highly efficient, with a time complexity of O(L), where L is the length of the string, making them independent of the total number of strings.
Tries excel in applications requiring autocomplete, spell checking, and prefix-based searching.
While efficient in time, Tries can have a high space overhead if the dataset has few common prefixes or a large alphabet.

📤 Coming Next: Segment Tree (Range Queries)

Having explored tree structures for hierarchical data and strings, we now turn our attention to problems involving efficient querying over ranges of data. The next chapter will introduce the Segment Tree, a powerful tree data structure designed to perform range queries (like sum, min, max, etc.) and range updates on an array or list in logarithmic time.

← AVL Trees (Self-balancing BST) Segment Tree (Range Queries) →