1. What are Sets? The Fundamentals
A set is a mutable, unordered collection of unique, hashable items. Key characteristics:
- Unique Elements: Sets automatically discard duplicate entries. If you add an element that already exists, the set remains unchanged.
- Unordered: Items in a set have no defined order, and you cannot access them by index. The order you see when printing or iterating is an implementation detail and may differ between runs.
- Mutable: You can add and remove elements from a set after it's created.
- Elements Must Be Hashable (Immutable): Only hashable objects can be elements of a set. Built-in immutable types such as numbers and strings are hashable, and so are tuples whose items are all hashable. Mutable built-in types (like lists, dictionaries, or other sets) cannot be set elements because they are not "hashable": their contents can change, so they have no stable hash value.
Syntax: Sets are created using curly braces `{}`, with items separated by commas. However, to create an empty set, you must use `set()`, as `{}` creates an empty dictionary.
# Creating sets
my_set = {1, 2, 3, 4, 1, 2} # Duplicates are automatically removed
print(f"My set (duplicates removed): {my_set}")
# Creating a set from a list (useful for uniqueness)
my_list = [10, 20, 10, 30, 40, 20]
unique_numbers = set(my_list)
print(f"Unique numbers from list: {unique_numbers}")
# Creating an empty set (IMPORTANT: Use set()!)
empty_set = set()
print(f"Empty set: {empty_set}, Type: {type(empty_set)}")
# What happens if you try to add a mutable element?
try:
    bad_set = {1, [2, 3]}  # This will raise a TypeError
except TypeError as e:
    print(f"Error trying to add mutable element: {e}")
My set (duplicates removed): {1, 2, 3, 4}
Unique numbers from list: {40, 10, 20, 30}
Empty set: set(), Type: <class 'set'>
Error trying to add mutable element: unhashable type: 'list'
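The hashability rule has a subtlety that is easier to see in code: a tuple is hashable only if everything inside it is hashable, and `frozenset` is the built-in immutable counterpart of `set` that can be nested inside another set. A minimal sketch, with illustrative variable names that are not part of the original examples:

```python
# A tuple of hashable items can be a set element.
points = {(0, 0), (1, 2), (0, 0)}  # the duplicate (0, 0) is dropped

# A tuple that contains a list is NOT hashable, so it is rejected.
try:
    bad = {(1, [2, 3])}
    tuple_with_list_ok = True
except TypeError:
    tuple_with_list_ok = False

# A set cannot contain a plain set, but it can contain a frozenset.
groups = {frozenset({"a", "b"}), frozenset({"b", "a"}), frozenset({"c"})}

print(len(points))         # 2
print(tuple_with_list_ok)  # False
print(len(groups))         # 2, because the first two frozensets are equal
```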
2. Adding and Removing Elements
Sets are mutable, so you can change their contents.
- `.add(item)`: Adds a single element to the set. If the element already exists, the set remains unchanged.
- `.update(iterable)`: Adds every element of an iterable (list, tuple, string, another set) to the set.
- `.remove(item)`: Removes a specific element. If the element is not found, it raises a `KeyError`.
- `.discard(item)`: Removes a specific element. If the element is not found, it does nothing and does NOT raise an error (safer when you don't know whether the element is present).
- `.pop()`: Removes and returns an arbitrary element from the set. Since sets are unordered, you cannot predict which element will be removed. Raises `KeyError` if the set is empty.
- `.clear()`: Removes all elements from the set, making it empty.
skills = {"Python", "SQL", "Git"}
print(f"Initial skills: {skills}")
skills.add("JavaScript") # Add a new skill
skills.add("Python") # Attempt to add existing skill (no change)
print(f"After adding: {skills}")
skills.update(["Django", "Flask", "SQL"]) # Add multiple from a list
print(f"After update: {skills}")
skills.remove("SQL") # Remove an existing skill
print(f"After remove 'SQL': {skills}")
try:
    skills.remove("Java")  # Attempt to remove non-existent skill
except KeyError as e:
    print(f"Error removing non-existent skill: {e}")
skills.discard("Java") # Using discard - no error
print(f"After discard 'Java' (no change, no error): {skills}")
removed_skill = skills.pop() # Remove an arbitrary skill
print(f"After pop: {skills}, Removed: {removed_skill}")
my_tech_stack = {"Frontend", "Backend"}
my_tech_stack.clear()
print(f"After clear: {my_tech_stack}")
Initial skills: {'SQL', 'Python', 'Git'}
After adding: {'SQL', 'Python', 'Git', 'JavaScript'}
After update: {'SQL', 'Flask', 'Python', 'Git', 'JavaScript', 'Django'}
After remove 'SQL': {'Flask', 'Python', 'Git', 'JavaScript', 'Django'}
Error removing non-existent skill: 'Java'
After discard 'Java' (no change, no error): {'Flask', 'Python', 'Git', 'JavaScript', 'Django'}
After pop: {'Python', 'Git', 'JavaScript', 'Django'}, Removed: Flask
After clear: set()
Interview Tip: Differentiate between `.remove()` and `.discard()`. `discard()` is generally safer if you're not sure an element exists.
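Two behaviors of these methods tend to surprise people: `.update()` iterates over its argument, so a bare string is split into characters, and `.pop()` fails on an empty set. The sketch below illustrates both; the `tags`, `letters`, and `empty` names are made up for this example.

```python
tags = {"python"}

# .add() treats the string as ONE element.
tags.add("sql")

# .update() iterates over its argument, so a bare string is split into characters.
letters = set()
letters.update("sql")

# Wrap the string in a list (or tuple) to add it as a whole.
tags.update(["git"])

# .pop() on an empty set raises KeyError, so guard it when the set may be empty.
empty = set()
try:
    empty.pop()
    pop_failed = False
except KeyError:
    pop_failed = True

print(sorted(tags))     # ['git', 'python', 'sql']
print(sorted(letters))  # ['l', 'q', 's']
print(pop_failed)       # True
```

Sorting before printing gives a stable display, since the iteration order of a set of strings can change from one run to the next.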
3. Set Operations: Powerful Data Comparisons
Sets shine when performing operations based on mathematical set theory. These are incredibly useful for comparing collections of unique items.
UdaanPath Context: Imagine managing user roles, permissions, or common interests. Set operations are perfect for these tasks.
Setup:
# UdaanPath course enrollments
students_python = {"Alice", "Bob", "Charlie", "David"}
students_web_dev = {"Charlie", "Eve", "Frank", "Bob"}
students_data_science = {"David", "Grace", "Eve"}
print(f"Python Students: {students_python}")
print(f"Web Dev Students: {students_web_dev}")
print(f"Data Science Students: {students_data_science}")
Python Students: {'David', 'Charlie', 'Bob', 'Alice'}
Web Dev Students: {'Frank', 'Charlie', 'Bob', 'Eve'}
Data Science Students: {'Grace', 'David', 'Eve'}
Union (`|` or `.union()`): All unique elements from both sets.
Combines all unique elements from two or more sets.
all_enrolled_students = students_python.union(students_web_dev)
# or using operator: all_enrolled_students = students_python | students_web_dev
print(f"Students in Python OR Web Dev: {all_enrolled_students}")
Students in Python OR Web Dev: {'Frank', 'David', 'Charlie', 'Bob', 'Alice', 'Eve'}
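As a sketch of the "two or more sets" case, `.union()` can take several arguments in one call. The method form also accepts any iterable, while the `|` operator requires a set on both sides. The name "Zed" is invented for this illustration.

```python
students_python = {"Alice", "Bob", "Charlie", "David"}
students_web_dev = {"Charlie", "Eve", "Frank", "Bob"}
students_data_science = {"David", "Grace", "Eve"}

# .union() accepts several sets at once.
everyone = students_python.union(students_web_dev, students_data_science)

# The method also accepts any iterable, such as a list.
with_waitlist = students_python.union(["Zed", "Alice"])  # "Zed" is a made-up name

# The | operator requires sets on both sides.
try:
    students_python | ["Zed"]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(sorted(everyone))
print(len(with_waitlist))     # 5
print(operator_accepts_list)  # False
```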
Intersection (`&` or `.intersection()`): Common elements in both sets.
Finds elements that are present in both sets.
common_students_py_web = students_python.intersection(students_web_dev)
# or using operator: common_students_py_web = students_python & students_web_dev
print(f"Students in BOTH Python AND Web Dev: {common_students_py_web}")
Students in BOTH Python AND Web Dev: {'Charlie', 'Bob'}
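Intersection also works across more than two sets, and an empty result is simply an empty set. A short sketch reusing the enrollment data from the setup (repeated here so the snippet runs on its own):

```python
students_python = {"Alice", "Bob", "Charlie", "David"}
students_web_dev = {"Charlie", "Eve", "Frank", "Bob"}
students_data_science = {"David", "Grace", "Eve"}

# Students enrolled in both Web Dev and Data Science.
web_and_ds = students_web_dev & students_data_science

# .intersection() accepts several sets; nobody here takes all three courses.
in_all_three = students_python.intersection(students_web_dev, students_data_science)

print(web_and_ds)    # {'Eve'}
print(in_all_three)  # set()
```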
Difference (`-` or `.difference()`): Elements in the first set but not the second.
Finds elements unique to the first set compared to the second.
only_python_students = students_python.difference(students_web_dev)
# or using operator: only_python_students = students_python - students_web_dev
print(f"Students only in Python (not Web Dev): {only_python_students}")
only_web_dev_students = students_web_dev - students_python
print(f"Students only in Web Dev (not Python): {only_web_dev_students}")
Students only in Python (not Web Dev): {'David', 'Alice'}
Students only in Web Dev (not Python): {'Frank', 'Eve'}
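The same idea extends to several sets: `.difference()` can subtract more than one set in a single call, which is equivalent to chaining the `-` operator. A minimal sketch with the setup data repeated so it is self-contained:

```python
students_python = {"Alice", "Bob", "Charlie", "David"}
students_web_dev = {"Charlie", "Eve", "Frank", "Bob"}
students_data_science = {"David", "Grace", "Eve"}

# Students who take Python and neither of the other two courses.
python_only = students_python.difference(students_web_dev, students_data_science)

# Chaining the operator gives the same result.
python_only_chained = students_python - students_web_dev - students_data_science

print(python_only)                         # {'Alice'}
print(python_only == python_only_chained)  # True
```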
Symmetric Difference (`^` or `.symmetric_difference()`): Elements unique to each set.
Finds elements that are in exactly one of the two sets: everything in the union that is not in the intersection.
unique_to_either = students_python.symmetric_difference(students_web_dev)
# or using operator: unique_to_either = students_python ^ students_web_dev
print(f"Students unique to either Python or Web Dev: {unique_to_either}")
Students unique to either Python or Web Dev: {'Frank', 'David', 'Alice', 'Eve'}
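One way to check your understanding of symmetric difference is to rebuild it from the operations covered earlier. The sketch below compares three equivalent formulations; the variable names are illustrative.

```python
students_python = {"Alice", "Bob", "Charlie", "David"}
students_web_dev = {"Charlie", "Eve", "Frank", "Bob"}

via_operator = students_python ^ students_web_dev

# Union minus intersection.
via_union = (students_python | students_web_dev) - (students_python & students_web_dev)

# Union of the two one-sided differences.
via_differences = (students_python - students_web_dev) | (students_web_dev - students_python)

print(via_operator == via_union == via_differences)  # True
print(sorted(via_operator))  # ['Alice', 'David', 'Eve', 'Frank']
```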
Subset (`<=` or `.issubset()`): Check if one set's elements are all in another.
Returns `True` if all elements of the first set are present in the second set.
small_group = {"Alice", "David"}
print(f"Is small_group a subset of Python students? {small_group.issubset(students_python)}")
# or using operator: small_group <= students_python
print(f"Is students_data_science a subset of Python students? {students_data_science.issubset(students_python)}")
Is small_group a subset of Python students? True
Is students_data_science a subset of Python students? False
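Python also distinguishes a subset (`<=`) from a proper subset (`<`), where the two sets must not be equal. A brief sketch of the difference; `same_group` is a name introduced only for this example.

```python
students_python = {"Alice", "Bob", "Charlie", "David"}
small_group = {"Alice", "David"}
same_group = {"David", "Alice"}  # equal to small_group; order is irrelevant

print(small_group <= students_python)  # True: subset
print(small_group < students_python)   # True: proper subset (strictly smaller)
print(same_group <= small_group)       # True: every set is a subset of itself
print(same_group < small_group)        # False: equal sets are not proper subsets
print(set() <= small_group)            # True: the empty set is a subset of any set
```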
Superset (`>=` or `.issuperset()`): Check if one set contains all elements of another.
Returns `True` if the first set contains all elements of the second set.
print(f"Is Python students a superset of small_group? {students_python.issuperset(small_group)}")
# or using operator: students_python >= small_group
Is Python students a superset of small_group? True
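A superset test maps naturally onto the permissions scenario mentioned earlier: a user may perform an action only if their permissions contain every permission the action requires. The following is a sketch under that assumption; `can_perform` and the permission names are hypothetical, not part of any real API.

```python
def can_perform(user_permissions, required_permissions):
    """Return True if the user holds every required permission (hypothetical helper)."""
    return set(user_permissions) >= set(required_permissions)


editor = {"read", "write", "comment"}  # made-up permission names

print(can_perform(editor, {"read", "write"}))   # True
print(can_perform(editor, {"read", "delete"}))  # False

# Difference reports exactly which permissions are missing.
missing = {"read", "delete"} - editor
print(missing)  # {'delete'}
```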
Disjoint (`.isdisjoint()`): Check if two sets have no common elements.
Returns `True` if the intersection of the two sets is empty.
exclusive_group_A = {"Xavier", "Yara"}
exclusive_group_B = {"Zoe", "Walter"}
print(f"Are Python students and exclusive_group_A disjoint? {students_python.isdisjoint(exclusive_group_A)}")
print(f"Are exclusive_group_A and exclusive_group_B disjoint? {exclusive_group_A.isdisjoint(exclusive_group_B)}")
Are Python students and exclusive_group_A disjoint? True
Are exclusive_group_A and exclusive_group_B disjoint? True
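Since `.isdisjoint()` is defined in terms of an empty intersection, the two checks always agree, and the method accepts any iterable, not only sets. A small sketch using made-up time slots to detect a scheduling conflict:

```python
morning_slots = {"09:00", "10:00", "11:00"}  # illustrative data
booked = ["13:00", "14:00"]                  # a list works as the argument

no_conflict = morning_slots.isdisjoint(booked)
same_answer = len(morning_slots & set(booked)) == 0

# Booking a morning slot introduces an overlap.
booked.append("10:00")
conflict_now = not morning_slots.isdisjoint(booked)

print(no_conflict)   # True
print(same_answer)   # True
print(conflict_now)  # True
```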