You are viewing a preview of this course. Sign in to start learning

Lesson 3: Collections and Higher-Order Functions

Explore F# collections (lists, arrays, sequences) and master powerful higher-order functions like map, filter, fold, and their practical applications

Collections and Higher-Order Functions in F#

Now that you've mastered the fundamentals of F# functions, pattern matching, and recursion, it's time to explore one of functional programming's most powerful features: higher-order functions operating on collections. In functional programming, we rarely work with individual data items in isolation. Instead, we process collections of data using composable, reusable functions that transform entire datasets in expressive, declarative ways. F# provides three primary collection types—lists, arrays, and sequences—each with distinct characteristics and use cases. Lists are immutable, singly-linked structures perfect for recursive processing; arrays are mutable, fixed-size structures offering fast random access; and sequences are lazily-evaluated collections that generate elements on-demand, making them ideal for infinite or large datasets. Understanding when to use each collection type and how to manipulate them with higher-order functions is fundamental to writing idiomatic, efficient F# code.

Understanding F# Collections

Let's start with lists, the most commonly used collection in functional F#. Lists are created using square brackets with semicolons separating elements: [1; 2; 3; 4; 5]. You can also use the cons operator :: to prepend elements to a list, as in 1 :: [2; 3], which creates [1; 2; 3]. Lists are immutable, meaning once created, they cannot be changed—any "modification" actually creates a new list. This immutability enables structural sharing: when you prepend an element to a list, F# doesn't copy the entire list; it creates a new head node pointing to the existing tail, making operations highly efficient. The empty list is represented as []. Lists excel at sequential processing and recursive algorithms because accessing the head and tail is O(1), though accessing elements by index is O(n) since you must traverse from the beginning.

// Creating and working with lists
let numbers = [1; 2; 3; 4; 5]
let moreNumbers = 0 :: numbers  // [0; 1; 2; 3; 4; 5]
let combined = [1; 2] @ [3; 4]  // @ is list concatenation: [1; 2; 3; 4]

// Pattern matching with lists
let rec sumList lst =
    match lst with
    | [] -> 0
    | head :: tail -> head + sumList tail

let total = sumList [1; 2; 3; 4; 5]  // 15

Arrays in F# are declared using [| |] syntax: [|1; 2; 3; 4; 5|]. Unlike lists, arrays are mutable and have fixed size once created. They provide O(1) random access to elements by index, making them ideal for scenarios requiring frequent lookups or when interoperating with .NET libraries that expect arrays. You can modify array elements using the <- assignment operator: arr.[0] <- 10. While mutability contradicts pure functional principles, arrays are sometimes necessary for performance-critical code or interfacing with imperative frameworks. Sequences (type seq<'T>) are F#'s equivalent of .NET's IEnumerable<T> and represent lazy collections. Sequences don't store all elements in memory; instead, they compute elements on-demand. This makes them perfect for infinite sequences, large datasets, or computations where you might not need all elements. You create sequences using seq { } computation expressions or by converting other collections with Seq.ofList or Seq.ofArray.

// Arrays and sequences
let mutableArray = [|1; 2; 3; 4; 5|]
mutableArray.[0] <- 10  // Array is now [|10; 2; 3; 4; 5|]

// Lazy sequence - generates infinite numbers
let infiniteNumbers = seq {
    let mutable i = 0
    while true do
        yield i
        i <- i + 1
}

// Taking first 5 elements (lazy evaluation means we only compute what we need)
let firstFive = infiniteNumbers |> Seq.take 5 |> Seq.toList  // [0; 1; 2; 3; 4]

💡 Tip: Choose lists for most functional code and recursive algorithms, arrays when you need mutable, indexed access or .NET interop, and sequences for lazy evaluation, infinite collections, or when processing large datasets where you might not need all elements.

The Power of Higher-Order Functions

Higher-order functions are functions that either take other functions as parameters, return functions as results, or both. This concept is central to functional programming because it enables us to write generic, reusable code that operates on different types of data with different behaviors. Instead of writing separate loops for every data transformation, we use higher-order functions that encapsulate common patterns like mapping, filtering, and reducing. F# provides three essential module families—List, Array, and Seq—each containing identical sets of higher-order functions for their respective collection types. The most fundamental higher-order functions are map, filter, and fold, which form the building blocks for almost all collection processing in functional programming.

The map function applies a transformation function to every element in a collection, producing a new collection of the same length with transformed elements. The signature is ('T -> 'U) -> 'T list -> 'U list, meaning it takes a function from type 'T to type 'U, a list of 'T, and produces a list of 'U. For example, List.map (fun x -> x * 2) [1; 2; 3] produces [2; 4; 6]. The beauty of map is that it completely separates the iteration logic (going through each element) from the transformation logic (what to do with each element), making code more modular and testable. You can compose multiple map operations, though F# optimizes these chains to avoid creating intermediate collections when possible.

// Map examples - transforming every element
let numbers = [1; 2; 3; 4; 5]

// Double every number
let doubled = List.map (fun x -> x * 2) numbers  // [2; 4; 6; 8; 10]

// Convert to strings
let strings = List.map (fun x -> sprintf "Number: %d" x) numbers
// ["Number: 1"; "Number: 2"; "Number: 3"; "Number: 4"; "Number: 5"]

// Map with a named function
let square x = x * x
let squared = List.map square numbers  // [1; 4; 9; 16; 25]

// Chaining maps using the pipe operator
let result = 
    numbers
    |> List.map (fun x -> x * 2)
    |> List.map (fun x -> x + 1)
    |> List.map (fun x -> x * x)
// Result: [9; 25; 49; 81; 121]

⚠️ Common Mistake: Don't confuse map with iter. List.map transforms elements and returns a new list, while List.iter performs side effects (like printing) and returns unit (). Use map for transformations, iter for actions.

Filtering and Selecting Data

The filter function creates a new collection containing only elements that satisfy a given predicate (a function returning bool). Its signature is ('T -> bool) -> 'T list -> 'T list. For example, List.filter (fun x -> x % 2 = 0) [1; 2; 3; 4; 5] returns [2; 4], keeping only even numbers. Unlike map, filter may produce a collection shorter than the original—or even empty if no elements satisfy the predicate. This makes filter perfect for extracting subsets of data based on conditions. You can combine filter and map to create powerful data processing pipelines: first filter to select relevant items, then map to transform them.

Related functions include choose, partition, and where. The List.choose function combines filtering and mapping: it applies an option-returning function to each element and keeps only the Some values, unwrapping them automatically. This is incredibly useful when a transformation might fail for some elements. For example, List.choose (fun x -> if x > 0 then Some (1.0 / float x) else None) [-1; 0; 2; 4] produces [0.5; 0.25], computing reciprocals only for positive numbers. The List.partition function splits a list into two lists based on a predicate: one containing elements that satisfy it, the other containing elements that don't. This is more efficient than calling filter twice with opposite predicates.

// Filter examples - selecting elements
let numbers = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]

// Get only even numbers
let evens = List.filter (fun x -> x % 2 = 0) numbers  // [2; 4; 6; 8; 10]

// Get numbers greater than 5
let large = List.filter (fun x -> x > 5) numbers  // [6; 7; 8; 9; 10]

// Combining filter and map
let evenSquares = 
    numbers
    |> List.filter (fun x -> x % 2 = 0)
    |> List.map (fun x -> x * x)
// [4; 16; 36; 64; 100]

// Using choose - parse strings to ints, ignoring invalid ones
let strings = ["1"; "hello"; "42"; "world"; "7"]
let validNumbers = 
    strings
    |> List.choose (fun s -> 
        match System.Int32.TryParse(s) with
        | (true, n) -> Some n
        | (false, _) -> None)
// [1; 42; 7]

// Using partition
let evens, odds = List.partition (fun x -> x % 2 = 0) numbers
// evens = [2; 4; 6; 8; 10], odds = [1; 3; 5; 7; 9]

Folding: The Universal Collection Processor

The fold function (also called reduce or aggregate in other languages) is the most powerful and general higher-order function. It accumulates a result by applying a function repeatedly to each element along with an accumulator value. The signature is ('State -> 'T -> 'State) -> 'State -> 'T list -> 'State. You provide an initial accumulator value, a function that takes the current accumulator and an element to produce the next accumulator, and the list to process. The function threads the accumulator through each element, returning the final accumulated value. For example, List.fold (fun acc x -> acc + x) 0 [1; 2; 3; 4; 5] computes 0 + 1 = 1, then 1 + 2 = 3, then 3 + 3 = 6, then 6 + 4 = 10, then 10 + 5 = 15, returning 15.

What makes fold so powerful is that both map and filter can be implemented using fold—it's truly the universal collection processor. You can use fold to compute sums, products, counts, maximum values, build strings, construct new data structures, and much more. F# provides List.fold (left fold, processes from left to right) and List.foldBack (right fold, processes from right to left). For associative operations like addition, the direction doesn't matter, but for operations where order matters (like list building or division), choosing the right fold is crucial. The reduce function is similar to fold but uses the first element as the initial accumulator, which is convenient but requires a non-empty list.

// Fold examples - accumulating results
let numbers = [1; 2; 3; 4; 5]

// Sum all numbers
let sum = List.fold (fun acc x -> acc + x) 0 numbers  // 15

// Product of all numbers
let product = List.fold (fun acc x -> acc * x) 1 numbers  // 120

// Count elements (yes, fold can do this!)
let count = List.fold (fun acc _ -> acc + 1) 0 numbers  // 5

// Find maximum
let max = List.fold (fun acc x -> if x > acc then x else acc) System.Int32.MinValue numbers  // 5

// Build a string with fold
let sentence = List.fold (fun acc word -> acc + " " + word) "" ["Hello"; "functional"; "world"]
// " Hello functional world"

// Implementing map using fold
let myMap f lst =
    List.foldBack (fun x acc -> (f x) :: acc) lst []

let doubled = myMap (fun x -> x * 2) numbers  // [2; 4; 6; 8; 10]

// Implementing filter using fold
let myFilter predicate lst =
    List.foldBack (fun x acc -> if predicate x then x :: acc else acc) lst []

let evens = myFilter (fun x -> x % 2 = 0) numbers  // [2; 4]

🎯 Key Takeaway: If you can express your collection processing logic as map, filter, or a combination of both, use them—they're clearer and more idiomatic. Reserve fold for operations that genuinely need custom accumulation logic, like computing sums, building complex structures, or stateful transformations.

Practical Composition and Pipelines

The real power of higher-order functions emerges when you compose them into pipelines using F#'s pipe operator |>. This operator takes a value on the left and passes it as the last argument to the function on the right: x |> f is equivalent to f x. Chaining pipes creates readable, top-to-bottom data transformation pipelines that mirror how you think about processing data: "take this data, then do this, then do that." This style eliminates nested function calls and intermediate variables, making code more declarative and easier to understand. For example, instead of List.map f2 (List.filter p (List.map f1 data)), you write data |> List.map f1 |> List.filter p |> List.map f2.

Common pipeline patterns include map-filter-map (transform, select, transform again), filter-map-fold (select, transform, aggregate), and collect-map (flatten nested structures, then transform). The List.collect function is particularly useful: it's like map but expects your transformation function to return a list, then flattens all the resulting lists into one. This is perfect for scenarios like expanding a list of sentences into a list of words. Another powerful combinator is List.sortBy, which sorts elements based on a key extraction function. Combining these functions lets you express complex data transformations concisely and declaratively.

// Pipeline examples - composing operations
type Person = { Name: string; Age: int; City: string }

let people = [
    { Name = "Alice"; Age = 30; City = "New York" }
    { Name = "Bob"; Age = 25; City = "London" }
    { Name = "Charlie"; Age = 35; City = "New York" }
    { Name = "Diana"; Age = 28; City = "Paris" }
    { Name = "Eve"; Age = 32; City = "London" }
]

// Get names of people over 28 in London, sorted alphabetically
let londonOver28 =
    people
    |> List.filter (fun p -> p.City = "London" && p.Age > 28)
    |> List.map (fun p -> p.Name)
    |> List.sort
// ["Eve"]

// Calculate average age of people in New York
let avgAgeNY =
    people
    |> List.filter (fun p -> p.City = "New York")
    |> List.map (fun p -> p.Age)
    |> List.average
// 32.5

// Get all words from sentences
let sentences = ["Hello world"; "Functional programming"; "F# is great"]
let allWords =
    sentences
    |> List.collect (fun s -> s.Split(' ') |> Array.toList)
// ["Hello"; "world"; "Functional"; "programming"; "F#"; "is"; "great"]

// Complex pipeline: group people by city and count
let cityCounts =
    people
    |> List.groupBy (fun p -> p.City)
    |> List.map (fun (city, people) -> city, List.length people)
    |> List.sortByDescending snd
// [("New York", 2); ("London", 2); ("Paris", 1)]

📝 Try This: Take a list of integers [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]. Create a pipeline that: (1) filters out odd numbers, (2) squares each remaining number, (3) keeps only those greater than 20, (4) sums the result. Answer: data |> List.filter (fun x -> x % 2 = 0) |> List.map (fun x -> x * x) |> List.filter (fun x -> x > 20) |> List.sum gives 36 + 64 + 100 = 200.

Working with Sequences for Performance

While lists are convenient, sequences (seq<'T>) offer significant performance advantages for large datasets or infinite collections because of their lazy evaluation. Unlike lists, which are fully evaluated and stored in memory, sequences generate elements on-demand. This means you can represent infinite sequences (like all natural numbers) or process gigabytes of data without loading everything into memory. Sequences are perfect for file I/O, database queries, or any scenario where you're processing data streams. All the higher-order functions we've discussed—map, filter, fold—have sequence equivalents in the Seq module that preserve laziness.

A critical difference is that sequence operations don't execute until you force evaluation by converting to a concrete collection (Seq.toList, Seq.toArray) or iterating (Seq.iter). This enables query optimization: F# can compose multiple sequence operations and execute them in a single pass rather than creating intermediate collections. For example, seq { 1..1000000 } |> Seq.filter (fun x -> x % 2 = 0) |> Seq.map (fun x -> x * x) |> Seq.take 10 |> Seq.toList only processes enough elements to get 10 results, never materializing a million-element list. Use sequences when working with large or infinite data, I/O operations, or when you want maximum performance through lazy evaluation.

// Sequence examples - lazy evaluation

// Infinite sequence of Fibonacci numbers
let fibonacci = 
    Seq.unfold (fun (a, b) -> Some(a, (b, a + b))) (0, 1)

let first10Fibs = fibonacci |> Seq.take 10 |> Seq.toList
// [0; 1; 1; 2; 3; 5; 8; 13; 21; 34]

// Lazy file processing - doesn't load entire file into memory
let processLargeFile filename =
    System.IO.File.ReadLines(filename)  // Returns seq<string>
    |> Seq.filter (fun line -> line.StartsWith("ERROR"))
    |> Seq.map (fun line -> line.ToUpper())
    |> Seq.take 100  // Only process first 100 error lines
    |> Seq.toList

// Cache a sequence to avoid recomputation
let expensiveSeq = 
    seq { 1..1000 }
    |> Seq.map (fun x -> 
        printfn "Computing %d" x  // Side effect shows when evaluation happens
        x * x)
    |> Seq.cache  // Cache prevents recomputation on second iteration

let first5 = expensiveSeq |> Seq.take 5 |> Seq.toList  // Computes 1-5
let next5 = expensiveSeq |> Seq.skip 5 |> Seq.take 5 |> Seq.toList  // Computes 6-10
let first5Again = expensiveSeq |> Seq.take 5 |> Seq.toList  // Uses cache, no recomputation

Common Patterns and Best Practices

Several patterns emerge when working with collections and higher-order functions in F#. First, prefer immutable operations: use List.map and List.filter rather than imperative loops that mutate state. This makes code more predictable, easier to test, and naturally thread-safe. Second, leverage partial application to create reusable, specialized functions. For example, let filterEvens = List.filter (fun x -> x % 2 = 0) creates a function you can reuse: let evens1 = filterEvens list1; let evens2 = filterEvens list2. Third, use pipelines for clarity: chains of transformations are easier to understand than deeply nested function calls or complex single functions.

Be mindful of performance characteristics: lists have O(n) length calculation and index access, while arrays are O(1). When you need to repeatedly access elements by index or frequently calculate length, consider using arrays despite their mutability. For large datasets or I/O, always use sequences to avoid memory issues. Another important pattern is early filtering: place filter operations as early as possible in pipelines to reduce the amount of data flowing through subsequent transformations. Finally, avoid premature optimization: start with clear, idiomatic code using map, filter, and fold, then optimize specific bottlenecks if profiling reveals performance issues.

Collection Type Decision Tree:

                    Start
                      |
          ┌───────────┴───────────┐
    Infinite or        Known finite
    very large?         dataset?
          │                  │
         Yes                No
          │                  │
      Use seq<'T>    ┌───────┴────────┐
   (lazy evaluation) │                │
                Functional     Need frequent
                style?         index access?
                   │               │
                  Yes             Yes
                   │               │
              Use List         Use Array
           (immutable)        (mutable)

🔍 Deep Dive: The reason fold can implement any collection operation is that it represents the fundamental recursion pattern: process the first element, combine it with the result of processing the rest. Both map and filter are just special cases where the "combining" function either transforms elements or conditionally includes them. This insight—that complex operations decompose into simple, composable functions—is the essence of functional programming.

📚 Further Study