Core Data Structures
Working with strings, lists, tuples, sets, and dictionaries
Core Data Structures in Python
Master Python's fundamental data structures with free flashcards and spaced repetition practice to solidify your understanding. This lesson covers lists, tuples, dictionaries, and setsβthe essential building blocks every Python programmer must know to write efficient, readable code.
Welcome to Python Data Structures π»
Data structures are the foundation of every Python program you'll write. Think of them as different types of containers, each designed for specific tasks. Just as you wouldn't store liquids in a paper bag, you shouldn't use a list when a dictionary is more appropriate. Understanding which data structure to useβand whenβis what separates beginner code from professional, performant applications.
Python provides four core built-in data structures that handle 90% of your programming needs: lists (ordered, mutable sequences), tuples (ordered, immutable sequences), dictionaries (key-value mappings), and sets (unordered collections of unique elements). Each has unique characteristics, performance profiles, and use cases.
π€ Did you know? Python's dictionary implementation uses hash tables under the hood, giving you O(1) average-case lookup time. This makes dictionaries one of the fastest data structures for retrieving values by key!
Core Concepts
Lists: The Workhorse Sequence π
Lists are ordered, mutable sequencesβthe most versatile data structure in Python. They can hold elements of different types, grow and shrink dynamically, and support indexing, slicing, and iteration.
# Creating lists
fruits = ['apple', 'banana', 'cherry']
numbers = [1, 2, 3, 4, 5]
mixed = [1, 'hello', 3.14, True]
empty_list = []
# Common operations
fruits.append('orange') # Add to end
fruits.insert(1, 'grape') # Insert at index
fruits.remove('banana') # Remove first occurrence
popped = fruits.pop() # Remove and return last item
first = fruits[0] # Access by index
slice = fruits[1:3] # Slicing [start:stop]
List comprehensions provide a concise way to create lists:
# Traditional approach
squares = []
for x in range(10):
squares.append(x**2)
# List comprehension (Pythonic!)
squares = [x**2 for x in range(10)]
# With condition
evens = [x for x in range(20) if x % 2 == 0]
π‘ Performance tip: Lists are implemented as dynamic arrays. Appending to the end is O(1) amortized, but inserting at the beginning is O(n) because all elements must shift.
Tuples: Immutable Sequences π
Tuples are like lists but immutableβonce created, they cannot be modified. This immutability makes them hashable (usable as dictionary keys) and slightly faster than lists.
# Creating tuples
coordinates = (3, 5)
colors = ('red', 'green', 'blue')
single = (42,) # Trailing comma for single element
empty_tuple = ()
# Tuple unpacking
x, y = coordinates
first, *middle, last = (1, 2, 3, 4, 5) # first=1, middle=[2,3,4], last=5
# Tuples as dictionary keys
locations = {
(0, 0): 'origin',
(1, 2): 'point A'
}
π§ Memory trick: Think "Tuple = Tight" (can't change) vs "List = Loose" (flexible).
π Real-world analogy: A tuple is like a sealed envelopeβyou can read the contents but can't modify them. A list is like a notebook where you can add, remove, or change pages.
Dictionaries: Key-Value Mappings ποΈ
Dictionaries store data as key-value pairs, providing fast lookups by key. Keys must be immutable (strings, numbers, tuples), but values can be anything.
# Creating dictionaries
student = {
'name': 'Alice',
'age': 22,
'courses': ['Math', 'CS']
}
# Accessing and modifying
name = student['name'] # Direct access (raises KeyError if missing)
age = student.get('age', 0) # Safe access with default
student['grade'] = 'A' # Add new key-value
student['age'] = 23 # Update existing
# Dictionary methods
keys = student.keys() # dict_keys(['name', 'age', 'courses', 'grade'])
values = student.values() # dict_values([...])
items = student.items() # dict_items([(key, value), ...])
# Dictionary comprehension
squares_dict = {x: x**2 for x in range(5)} # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
Common patterns:
# Counting occurrences
from collections import defaultdict
word_count = defaultdict(int)
for word in ['apple', 'banana', 'apple']:
word_count[word] += 1
# Grouping data
from collections import defaultdict
students_by_grade = defaultdict(list)
for student in students:
students_by_grade[student['grade']].append(student)
π‘ Performance insight: Dictionary lookups are O(1) average case, making them ideal for caching, counting, and grouping operations.
Sets: Unique Collections π―
Sets are unordered collections of unique elements. They excel at membership testing, removing duplicates, and mathematical set operations.
# Creating sets
fruits = {'apple', 'banana', 'cherry'}
numbers = set([1, 2, 2, 3, 3, 3]) # {1, 2, 3} - duplicates removed
empty_set = set() # Note: {} creates empty dict!
# Set operations
fruits.add('orange') # Add element
fruits.remove('banana') # Remove (raises KeyError if missing)
fruits.discard('grape') # Remove if exists (no error)
# Mathematical operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
union = a | b # {1, 2, 3, 4, 5, 6}
intersection = a & b # {3, 4}
difference = a - b # {1, 2}
symmetric_diff = a ^ b # {1, 2, 5, 6}
π§ Try this: Need to remove duplicates from a list? Convert to set and back:
numbers = [1, 2, 2, 3, 3, 3, 4]
unique_numbers = list(set(numbers))
β οΈ Warning: Sets are unordered! Don't rely on element orderβit's not guaranteed.
| Data Structure | Ordered | Mutable | Duplicates | Indexed | Use When... |
|---|---|---|---|---|---|
| List | β | β | β | β | Need ordered, changeable sequence |
| Tuple | β | β | β | β | Need immutable sequence or dict key |
| Dictionary | β (3.7+) | β | Keys: β Values: β | By key | Need key-value lookups |
| Set | β | β | β | β | Need unique elements or set operations |
Detailed Examples
Example 1: Building a Contact Manager π
Let's build a simple contact manager using dictionaries and lists:
# Contact manager using nested data structures
contacts = {
'alice': {
'phone': '555-1234',
'email': 'alice@example.com',
'tags': ['friend', 'work']
},
'bob': {
'phone': '555-5678',
'email': 'bob@example.com',
'tags': ['friend']
}
}
# Add a new contact
contacts['charlie'] = {
'phone': '555-9999',
'email': 'charlie@example.com',
'tags': ['family']
}
# Find all contacts with 'friend' tag
friends = [name for name, info in contacts.items()
if 'friend' in info['tags']]
print(friends) # ['alice', 'bob']
# Update contact information
contacts['alice']['phone'] = '555-0000'
# Get contact safely
charlie_email = contacts.get('charlie', {}).get('email', 'No email')
Why these structures?
- Dictionary for contacts: Fast lookup by name (O(1))
- Nested dictionary for contact info: Organized key-value pairs
- List for tags: Ordered, can have duplicates if needed
Example 2: Analyzing Text with Collections π
Combining multiple data structures for text analysis:
text = "the quick brown fox jumps over the lazy dog the fox"
# Count word frequencies using dictionary
word_count = {}
for word in text.split():
word_count[word] = word_count.get(word, 0) + 1
print(word_count) # {'the': 3, 'quick': 1, 'brown': 1, ...}
# Find unique words using set
unique_words = set(text.split())
print(f"Unique words: {len(unique_words)}") # 8
# Get top 3 most common words
top_words = sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:3]
print(top_words) # [('the', 3), ('fox', 2), ('quick', 1)]
# Store results as list of tuples (immutable records)
word_stats = [(word, count) for word, count in word_count.items()]
Data structure choices:
- Dictionary for counting: O(1) increments
- Set for uniqueness: Automatic duplicate removal
- List of tuples for results: Immutable records
Example 3: Set Operations for Data Filtering π
Using sets for efficient filtering and comparison:
# Users who completed different courses
python_users = {'alice', 'bob', 'charlie', 'david'}
javascript_users = {'bob', 'charlie', 'eve', 'frank'}
sql_users = {'alice', 'eve', 'george'}
# Users who know both Python and JavaScript
fullstack = python_users & javascript_users
print(fullstack) # {'bob', 'charlie'}
# Users who know Python or JavaScript but not both
exclusive = python_users ^ javascript_users
print(exclusive) # {'alice', 'david', 'eve', 'frank'}
# Users who know at least one language
all_users = python_users | javascript_users | sql_users
print(all_users) # {'alice', 'bob', 'charlie', 'david', 'eve', 'frank', 'george'}
# Users who only know Python
python_only = python_users - javascript_users - sql_users
print(python_only) # {'david'}
# Check if someone completed Python
if 'alice' in python_users: # O(1) lookup!
print("Alice knows Python")
Why sets? Membership testing (in) is O(1) for sets vs O(n) for lists. Set operations are both fast and readable.
Example 4: List Comprehensions vs Traditional Loops π
Comparing approaches for data transformation:
# Traditional approach
result = []
for i in range(10):
if i % 2 == 0:
result.append(i ** 2)
# List comprehension (same result, more Pythonic)
result = [i**2 for i in range(10) if i % 2 == 0]
print(result) # [0, 4, 16, 36, 64]
# Nested comprehension for matrix
matrix = [[i*j for j in range(5)] for i in range(5)]
# [[0, 0, 0, 0, 0],
# [0, 1, 2, 3, 4],
# [0, 2, 4, 6, 8], ...]
# Dictionary comprehension from two lists
keys = ['a', 'b', 'c']
values = [1, 2, 3]
my_dict = {k: v for k, v in zip(keys, values)}
print(my_dict) # {'a': 1, 'b': 2, 'c': 3}
# Set comprehension for unique squares
squares = {x**2 for x in [1, -1, 2, -2, 3]}
print(squares) # {1, 4, 9} - duplicates removed!
Performance note: Comprehensions are generally faster than equivalent loops because they're optimized at the bytecode level.
CHOOSING THE RIGHT DATA STRUCTURE
Start
β
β
βββββββββββββββ
β Need unique β YES
β elements? ββββββββββ Use SET
ββββββββ¬βββββββ {1, 2, 3}
β NO
β
βββββββββββββββ
β Need fast β YES
β lookup by ββββββββββ Use DICTIONARY
β key? β {'key': 'value'}
ββββββββ¬βββββββ
β NO
β
βββββββββββββββ
β Need to β YES
β modify ββββββββββ Use LIST
β elements? β [1, 2, 3]
ββββββββ¬βββββββ
β NO
β
Use TUPLE
(1, 2, 3)
Common Mistakes β οΈ
Mistake 1: Using Lists as Dictionary Keys
# β WRONG - Lists are mutable and not hashable
try:
my_dict = {[1, 2]: 'value'}
except TypeError as e:
print(e) # unhashable type: 'list'
# β
RIGHT - Use tuples instead
my_dict = {(1, 2): 'value'}
Why? Dictionary keys must be immutable. Use tuples, strings, or numbersβnever lists or sets.
Mistake 2: Modifying a List While Iterating
numbers = [1, 2, 3, 4, 5]
# β WRONG - Causes skipped elements
for num in numbers:
if num % 2 == 0:
numbers.remove(num) # Modifies list during iteration!
print(numbers) # [1, 3, 5] - but 4 was skipped!
# β
RIGHT - Iterate over a copy
numbers = [1, 2, 3, 4, 5]
for num in numbers[:]:
if num % 2 == 0:
numbers.remove(num)
# β
BETTER - Use list comprehension
numbers = [num for num in numbers if num % 2 != 0]
Mistake 3: Creating Empty Sets Incorrectly
# β WRONG - This creates an empty DICTIONARY!
empty = {}
print(type(empty)) # <class 'dict'>
# β
RIGHT - Use set() constructor
empty_set = set()
print(type(empty_set)) # <class 'set'>
Mistake 4: Assuming Dictionary Order (Python < 3.7)
# In Python 3.7+, dictionaries maintain insertion order
my_dict = {'z': 1, 'a': 2, 'b': 3}
print(list(my_dict.keys())) # ['z', 'a', 'b'] in 3.7+
# But if you need guaranteed order in older versions:
from collections import OrderedDict
ordered = OrderedDict([('z', 1), ('a', 2), ('b', 3)])
Mistake 5: Not Using .get() for Safe Dictionary Access
user = {'name': 'Alice', 'age': 25}
# β RISKY - Raises KeyError if key doesn't exist
email = user['email'] # KeyError!
# β
SAFE - Returns None (or default) if key doesn't exist
email = user.get('email')
email = user.get('email', 'no-email@example.com') # With default
Mistake 6: Forgetting That Tuples with One Element Need a Comma
# β WRONG - This is just an integer in parentheses
not_a_tuple = (42)
print(type(not_a_tuple)) # <class 'int'>
# β
RIGHT - Trailing comma makes it a tuple
actual_tuple = (42,)
print(type(actual_tuple)) # <class 'tuple'>
Key Takeaways π―
Lists are your go-to for ordered, mutable sequences. Use them when you need to maintain order and modify elements.
Tuples are immutable listsβperfect for data that shouldn't change, function returns, and dictionary keys.
Dictionaries provide O(1) lookups by key, making them ideal for mappings, caching, and grouping data.
Sets automatically handle uniqueness and provide fast membership testing plus mathematical operations.
Comprehensions (list, dict, set) are more Pythonic and often faster than traditional loops.
Choose data structures based on your needs: mutability, order, uniqueness, and lookup patterns.
Always use
.get()for safe dictionary access and avoid modifying collections while iterating over them.Remember that
{}creates an empty dictionary, not a setβuseset()for empty sets.
π Quick Reference Card
| Structure | Syntax | Key Methods | Time Complexity |
|---|---|---|---|
| List | [1, 2, 3] |
append(), insert(), remove(), pop(), sort() | Access: O(1), Search: O(n) |
| Tuple | (1, 2, 3) |
count(), index() | Access: O(1), Search: O(n) |
| Dictionary | {'a': 1} |
get(), keys(), values(), items(), pop() | Lookup: O(1), Insert: O(1) |
| Set | {1, 2, 3} |
add(), remove(), union(), intersection() | Lookup: O(1), Add: O(1) |
| Operation | Code Example |
|---|---|
| List comprehension | [x**2 for x in range(5) if x % 2 == 0] |
| Dict comprehension | {k: v**2 for k, v in my_dict.items()} |
| Set operations | a | b (union), a & b (intersection) |
| Safe dict access | my_dict.get('key', default_value) |
| Tuple unpacking | x, y, z = (1, 2, 3) |
π Further Study
- Python Official Documentation - Data Structures: https://docs.python.org/3/tutorial/datastructures.html
- Real Python - Python Data Structures: https://realpython.com/python-data-structures/
- Python Collections Module Documentation: https://docs.python.org/3/library/collections.html