You are viewing a preview of this lesson. Sign in to start learning
Back to Python Programming

Core Data Structures

Working with strings, lists, tuples, sets, and dictionaries

Core Data Structures in Python

Master Python's fundamental data structures with free flashcards and spaced repetition practice to solidify your understanding. This lesson covers lists, tuples, dictionaries, and setsβ€”the essential building blocks every Python programmer must know to write efficient, readable code.

Welcome to Python Data Structures πŸ’»

Data structures are the foundation of every Python program you'll write. Think of them as different types of containers, each designed for specific tasks. Just as you wouldn't store liquids in a paper bag, you shouldn't use a list when a dictionary is more appropriate. Understanding which data structure to useβ€”and whenβ€”is what separates beginner code from professional, performant applications.

Python provides four core built-in data structures that handle 90% of your programming needs: lists (ordered, mutable sequences), tuples (ordered, immutable sequences), dictionaries (key-value mappings), and sets (unordered collections of unique elements). Each has unique characteristics, performance profiles, and use cases.

πŸ€” Did you know? Python's dictionary implementation uses hash tables under the hood, giving you O(1) average-case lookup time. This makes dictionaries one of the fastest data structures for retrieving values by key!

Core Concepts

Lists: The Workhorse Sequence πŸ“‹

Lists are ordered, mutable sequencesβ€”the most versatile data structure in Python. They can hold elements of different types, grow and shrink dynamically, and support indexing, slicing, and iteration.

# Creating lists
fruits = ['apple', 'banana', 'cherry']
numbers = [1, 2, 3, 4, 5]
mixed = [1, 'hello', 3.14, True]
empty_list = []

# Common operations
fruits.append('orange')          # Add to end
fruits.insert(1, 'grape')        # Insert at index
fruits.remove('banana')          # Remove first occurrence
popped = fruits.pop()            # Remove and return last item
first = fruits[0]                # Access by index
slice = fruits[1:3]              # Slicing [start:stop]

List comprehensions provide a concise way to create lists:

# Traditional approach
squares = []
for x in range(10):
    squares.append(x**2)

# List comprehension (Pythonic!)
squares = [x**2 for x in range(10)]

# With condition
evens = [x for x in range(20) if x % 2 == 0]

πŸ’‘ Performance tip: Lists are implemented as dynamic arrays. Appending to the end is O(1) amortized, but inserting at the beginning is O(n) because all elements must shift.

Tuples: Immutable Sequences πŸ”’

Tuples are like lists but immutableβ€”once created, they cannot be modified. This immutability makes them hashable (usable as dictionary keys) and slightly faster than lists.

# Creating tuples
coordinates = (3, 5)
colors = ('red', 'green', 'blue')
single = (42,)                   # Trailing comma for single element
empty_tuple = ()

# Tuple unpacking
x, y = coordinates
first, *middle, last = (1, 2, 3, 4, 5)  # first=1, middle=[2,3,4], last=5

# Tuples as dictionary keys
locations = {
    (0, 0): 'origin',
    (1, 2): 'point A'
}

🧠 Memory trick: Think "Tuple = Tight" (can't change) vs "List = Loose" (flexible).

🌍 Real-world analogy: A tuple is like a sealed envelopeβ€”you can read the contents but can't modify them. A list is like a notebook where you can add, remove, or change pages.

Dictionaries: Key-Value Mappings πŸ—οΈ

Dictionaries store data as key-value pairs, providing fast lookups by key. Keys must be immutable (strings, numbers, tuples), but values can be anything.

# Creating dictionaries
student = {
    'name': 'Alice',
    'age': 22,
    'courses': ['Math', 'CS']
}

# Accessing and modifying
name = student['name']                    # Direct access (raises KeyError if missing)
age = student.get('age', 0)               # Safe access with default
student['grade'] = 'A'                    # Add new key-value
student['age'] = 23                       # Update existing

# Dictionary methods
keys = student.keys()                     # dict_keys(['name', 'age', 'courses', 'grade'])
values = student.values()                 # dict_values([...])
items = student.items()                   # dict_items([(key, value), ...])

# Dictionary comprehension
squares_dict = {x: x**2 for x in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Common patterns:

# Counting occurrences
from collections import defaultdict
word_count = defaultdict(int)
for word in ['apple', 'banana', 'apple']:
    word_count[word] += 1

# Grouping data
from collections import defaultdict
students_by_grade = defaultdict(list)
for student in students:
    students_by_grade[student['grade']].append(student)

πŸ’‘ Performance insight: Dictionary lookups are O(1) average case, making them ideal for caching, counting, and grouping operations.

Sets: Unique Collections 🎯

Sets are unordered collections of unique elements. They excel at membership testing, removing duplicates, and mathematical set operations.

# Creating sets
fruits = {'apple', 'banana', 'cherry'}
numbers = set([1, 2, 2, 3, 3, 3])        # {1, 2, 3} - duplicates removed
empty_set = set()                         # Note: {} creates empty dict!

# Set operations
fruits.add('orange')                      # Add element
fruits.remove('banana')                   # Remove (raises KeyError if missing)
fruits.discard('grape')                   # Remove if exists (no error)

# Mathematical operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

union = a | b                             # {1, 2, 3, 4, 5, 6}
intersection = a & b                      # {3, 4}
difference = a - b                        # {1, 2}
symmetric_diff = a ^ b                    # {1, 2, 5, 6}

πŸ”§ Try this: Need to remove duplicates from a list? Convert to set and back:

numbers = [1, 2, 2, 3, 3, 3, 4]
unique_numbers = list(set(numbers))

⚠️ Warning: Sets are unordered! Don't rely on element orderβ€”it's not guaranteed.

Data StructureOrderedMutableDuplicatesIndexedUse When...
Listβœ…βœ…βœ…βœ…Need ordered, changeable sequence
Tupleβœ…βŒβœ…βœ…Need immutable sequence or dict key
Dictionaryβœ… (3.7+)βœ…Keys: ❌
Values: βœ…
By keyNeed key-value lookups
SetβŒβœ…βŒβŒNeed unique elements or set operations

Detailed Examples

Example 1: Building a Contact Manager πŸ“‡

Let's build a simple contact manager using dictionaries and lists:

# Contact manager using nested data structures
contacts = {
    'alice': {
        'phone': '555-1234',
        'email': 'alice@example.com',
        'tags': ['friend', 'work']
    },
    'bob': {
        'phone': '555-5678',
        'email': 'bob@example.com',
        'tags': ['friend']
    }
}

# Add a new contact
contacts['charlie'] = {
    'phone': '555-9999',
    'email': 'charlie@example.com',
    'tags': ['family']
}

# Find all contacts with 'friend' tag
friends = [name for name, info in contacts.items() 
           if 'friend' in info['tags']]
print(friends)  # ['alice', 'bob']

# Update contact information
contacts['alice']['phone'] = '555-0000'

# Get contact safely
charlie_email = contacts.get('charlie', {}).get('email', 'No email')

Why these structures?

  • Dictionary for contacts: Fast lookup by name (O(1))
  • Nested dictionary for contact info: Organized key-value pairs
  • List for tags: Ordered, can have duplicates if needed

Example 2: Analyzing Text with Collections πŸ“Š

Combining multiple data structures for text analysis:

text = "the quick brown fox jumps over the lazy dog the fox"

# Count word frequencies using dictionary
word_count = {}
for word in text.split():
    word_count[word] = word_count.get(word, 0) + 1

print(word_count)  # {'the': 3, 'quick': 1, 'brown': 1, ...}

# Find unique words using set
unique_words = set(text.split())
print(f"Unique words: {len(unique_words)}")  # 8

# Get top 3 most common words
top_words = sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:3]
print(top_words)  # [('the', 3), ('fox', 2), ('quick', 1)]

# Store results as list of tuples (immutable records)
word_stats = [(word, count) for word, count in word_count.items()]

Data structure choices:

  • Dictionary for counting: O(1) increments
  • Set for uniqueness: Automatic duplicate removal
  • List of tuples for results: Immutable records

Example 3: Set Operations for Data Filtering πŸ”

Using sets for efficient filtering and comparison:

# Users who completed different courses
python_users = {'alice', 'bob', 'charlie', 'david'}
javascript_users = {'bob', 'charlie', 'eve', 'frank'}
sql_users = {'alice', 'eve', 'george'}

# Users who know both Python and JavaScript
fullstack = python_users & javascript_users
print(fullstack)  # {'bob', 'charlie'}

# Users who know Python or JavaScript but not both
exclusive = python_users ^ javascript_users
print(exclusive)  # {'alice', 'david', 'eve', 'frank'}

# Users who know at least one language
all_users = python_users | javascript_users | sql_users
print(all_users)  # {'alice', 'bob', 'charlie', 'david', 'eve', 'frank', 'george'}

# Users who only know Python
python_only = python_users - javascript_users - sql_users
print(python_only)  # {'david'}

# Check if someone completed Python
if 'alice' in python_users:  # O(1) lookup!
    print("Alice knows Python")

Why sets? Membership testing (in) is O(1) for sets vs O(n) for lists. Set operations are both fast and readable.

Example 4: List Comprehensions vs Traditional Loops πŸ”„

Comparing approaches for data transformation:

# Traditional approach
result = []
for i in range(10):
    if i % 2 == 0:
        result.append(i ** 2)

# List comprehension (same result, more Pythonic)
result = [i**2 for i in range(10) if i % 2 == 0]
print(result)  # [0, 4, 16, 36, 64]

# Nested comprehension for matrix
matrix = [[i*j for j in range(5)] for i in range(5)]
# [[0, 0, 0, 0, 0],
#  [0, 1, 2, 3, 4],
#  [0, 2, 4, 6, 8], ...]

# Dictionary comprehension from two lists
keys = ['a', 'b', 'c']
values = [1, 2, 3]
my_dict = {k: v for k, v in zip(keys, values)}
print(my_dict)  # {'a': 1, 'b': 2, 'c': 3}

# Set comprehension for unique squares
squares = {x**2 for x in [1, -1, 2, -2, 3]}
print(squares)  # {1, 4, 9} - duplicates removed!

Performance note: Comprehensions are generally faster than equivalent loops because they're optimized at the bytecode level.

CHOOSING THE RIGHT DATA STRUCTURE

        Start
          β”‚
          ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Need unique β”‚  YES
    β”‚  elements?  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β†’ Use SET
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜          {1, 2, 3}
           β”‚ NO
           ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚ Need fast   β”‚  YES
    β”‚  lookup by  β”œβ”€β”€β”€β”€β”€β”€β”€β”€β†’ Use DICTIONARY
    β”‚    key?     β”‚          {'key': 'value'}
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
           β”‚ NO
           ↓
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Need to   β”‚  YES
    β”‚   modify    β”œβ”€β”€β”€β”€β”€β”€β”€β”€β†’ Use LIST
    β”‚  elements?  β”‚          [1, 2, 3]
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
           β”‚ NO
           ↓
      Use TUPLE
      (1, 2, 3)

Common Mistakes ⚠️

Mistake 1: Using Lists as Dictionary Keys

# ❌ WRONG - Lists are mutable and not hashable
try:
    my_dict = {[1, 2]: 'value'}
except TypeError as e:
    print(e)  # unhashable type: 'list'

# βœ… RIGHT - Use tuples instead
my_dict = {(1, 2): 'value'}

Why? Dictionary keys must be immutable. Use tuples, strings, or numbersβ€”never lists or sets.

Mistake 2: Modifying a List While Iterating

numbers = [1, 2, 3, 4, 5]

# ❌ WRONG - Causes skipped elements
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Modifies list during iteration!

print(numbers)  # [1, 3, 5] - but 4 was skipped!

# βœ… RIGHT - Iterate over a copy
numbers = [1, 2, 3, 4, 5]
for num in numbers[:]:
    if num % 2 == 0:
        numbers.remove(num)

# βœ… BETTER - Use list comprehension
numbers = [num for num in numbers if num % 2 != 0]

Mistake 3: Creating Empty Sets Incorrectly

# ❌ WRONG - This creates an empty DICTIONARY!
empty = {}
print(type(empty))  # <class 'dict'>

# βœ… RIGHT - Use set() constructor
empty_set = set()
print(type(empty_set))  # <class 'set'>

Mistake 4: Assuming Dictionary Order (Python < 3.7)

# In Python 3.7+, dictionaries maintain insertion order
my_dict = {'z': 1, 'a': 2, 'b': 3}
print(list(my_dict.keys()))  # ['z', 'a', 'b'] in 3.7+

# But if you need guaranteed order in older versions:
from collections import OrderedDict
ordered = OrderedDict([('z', 1), ('a', 2), ('b', 3)])

Mistake 5: Not Using .get() for Safe Dictionary Access

user = {'name': 'Alice', 'age': 25}

# ❌ RISKY - Raises KeyError if key doesn't exist
email = user['email']  # KeyError!

# βœ… SAFE - Returns None (or default) if key doesn't exist
email = user.get('email')
email = user.get('email', 'no-email@example.com')  # With default

Mistake 6: Forgetting That Tuples with One Element Need a Comma

# ❌ WRONG - This is just an integer in parentheses
not_a_tuple = (42)
print(type(not_a_tuple))  # <class 'int'>

# βœ… RIGHT - Trailing comma makes it a tuple
actual_tuple = (42,)
print(type(actual_tuple))  # <class 'tuple'>

Key Takeaways 🎯

  1. Lists are your go-to for ordered, mutable sequences. Use them when you need to maintain order and modify elements.

  2. Tuples are immutable listsβ€”perfect for data that shouldn't change, function returns, and dictionary keys.

  3. Dictionaries provide O(1) lookups by key, making them ideal for mappings, caching, and grouping data.

  4. Sets automatically handle uniqueness and provide fast membership testing plus mathematical operations.

  5. Comprehensions (list, dict, set) are more Pythonic and often faster than traditional loops.

  6. Choose data structures based on your needs: mutability, order, uniqueness, and lookup patterns.

  7. Always use .get() for safe dictionary access and avoid modifying collections while iterating over them.

  8. Remember that {} creates an empty dictionary, not a setβ€”use set() for empty sets.

πŸ“‹ Quick Reference Card

Structure Syntax Key Methods Time Complexity
List [1, 2, 3] append(), insert(), remove(), pop(), sort() Access: O(1), Search: O(n)
Tuple (1, 2, 3) count(), index() Access: O(1), Search: O(n)
Dictionary {'a': 1} get(), keys(), values(), items(), pop() Lookup: O(1), Insert: O(1)
Set {1, 2, 3} add(), remove(), union(), intersection() Lookup: O(1), Add: O(1)
Operation Code Example
List comprehension [x**2 for x in range(5) if x % 2 == 0]
Dict comprehension {k: v**2 for k, v in my_dict.items()}
Set operations a | b (union), a & b (intersection)
Safe dict access my_dict.get('key', default_value)
Tuple unpacking x, y, z = (1, 2, 3)

πŸ“š Further Study

  1. Python Official Documentation - Data Structures: https://docs.python.org/3/tutorial/datastructures.html
  2. Real Python - Python Data Structures: https://realpython.com/python-data-structures/
  3. Python Collections Module Documentation: https://docs.python.org/3/library/collections.html

Practice Questions

Test your understanding with these questions:

Q1: Write Python code that takes a list of dictionaries representing students (each with 'name' and 'grade' keys) and returns a dictionary where keys are grades and values are lists of student names with that grade. Use appropriate data structures for efficient grouping.
A: !AI