What can you do with generator expressions?

Trey Hunner
5 min. read 4 min. video Python 3.9—3.13
Python Morsels

What can you do with generator expressions? In two words: lazy pre-processing.

We take some of the logic that's inside a for loop and move it before the loop. This lets us break our looping logic into chunks, which can sometimes make our code a little more readable.
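To make the "lazy" part concrete, here's a small sketch showing that a generator expression does no work when it's created, only when it's looped over. The `noisy_double` helper and the `calls` list are made up for this illustration and aren't part of the screencast.

```python
calls = []  # records each number as it's processed (illustration only)


def noisy_double(n):
    """Hypothetical helper: double n and record that we were called."""
    calls.append(n)
    return n * 2


numbers = [1, 2, 3]

# Creating the generator expression runs none of the logic yet
doubled = (noisy_double(n) for n in numbers)
calls_before_looping = list(calls)

# Looping is what actually triggers the work, one item at a time
results = []
for value in doubled:
    results.append(value)

print(calls_before_looping)  # []
print(results)  # [2, 4, 6]
```

Nothing is recorded in `calls` until the for loop starts asking the generator for items.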

Filtering elements using a for loop

We have a tab-delimited file here (expenses.tsv) that lists the prices of various items:

Item Name       Category    Price
purple duck     rubber duck 1.00
iced tea        food        2.00
unicorn duck    rubber duck 2.00
unicorn duck    rubber duck 3.00
sandwich        food        8.00

We have some code that uses Python's csv module to loop over this file line-by-line, filtering it down to just the rows where the category is "rubber duck":

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_total = 0
...     for row in csv.DictReader(expenses_file, delimiter="\t"):
...         if row["Category"] == "rubber duck":
...             duck_total += float(row["Price"])
...

For each of those rows, we're taking the price, converting it to a number, and adding it to a running total.

We get a total price of 6.0 for all of the rows with a category of "rubber duck":

>>> print("Cost of rubber ducks:", duck_total)
Cost of rubber ducks: 6.0
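If you'd like to try this without creating a file first, here's a sketch that feeds the same rows through `io.StringIO`. Using `io.StringIO` and the `EXPENSES_TSV` name is an assumption made for convenience here; the original code reads expenses.tsv from disk.

```python
import csv
import io

# Same rows as expenses.tsv, with real tab characters between columns
EXPENSES_TSV = (
    "Item Name\tCategory\tPrice\n"
    "purple duck\trubber duck\t1.00\n"
    "iced tea\tfood\t2.00\n"
    "unicorn duck\trubber duck\t2.00\n"
    "unicorn duck\trubber duck\t3.00\n"
    "sandwich\tfood\t8.00\n"
)

# io.StringIO gives us a file-like object backed by a string
expenses_file = io.StringIO(EXPENSES_TSV)

duck_total = 0
for row in csv.DictReader(expenses_file, delimiter="\t"):
    # Each row is a dictionary keyed by the names in the header line
    if row["Category"] == "rubber duck":
        duck_total += float(row["Price"])

print("Cost of rubber ducks:", duck_total)
```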

Using generator expressions for filtering items

The above for loop was for filtering down our rows (by only looking at rows where the category is "rubber duck").

import csv
with open("expenses.tsv") as expenses_file:
    duck_total = 0
    for row in csv.DictReader(expenses_file, delimiter="\t"):
        if row["Category"] == "rubber duck":
            duck_total += float(row["Price"])

Generator expressions are pretty good at filtering. So we could use a generator expression to do the filtering for us.

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_rows = (
...         row
...         for row in csv.DictReader(expenses_file, delimiter="\t")
...         if row["Category"] == "rubber duck"
...     )
...     duck_total = 0
...     for row in duck_rows:
...         duck_total += float(row["Price"])
...
>>> print("Cost of rubber ducks:", duck_total)
Cost of rubber ducks: 6.0

We have a generator expression that loops over our DictReader object (just as we were doing in our for loop before):

with open("expenses.tsv") as expenses_file:
    duck_rows = (
        row
        for row in csv.DictReader(expenses_file, delimiter="\t")
        if row["Category"] == "rubber duck"
    )

It checks to see if the category for each row is "rubber duck" and if it is we include that row.

Our for loop is now a little bit shorter:

    duck_total = 0
    for row in duck_rows:
        duck_total += float(row["Price"])

We've managed to shorten the logic because we stuck some of that logic in our generator expression before the for loop:

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_rows = (
...         row
...         for row in csv.DictReader(expenses_file, delimiter="\t")
...         if row["Category"] == "rubber duck"
...     )
...     duck_total = 0
...     for row in duck_rows:
...         duck_total += float(row["Price"])
...
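One consequence of that laziness is worth keeping in mind: no rows are read until the for loop runs, so the loop needs to stay inside the with block. Here's a sketch of what can happen otherwise. It uses `io.StringIO` as a stand-in for the file (an assumption, so the example is self-contained), and `EXPENSES_TSV` and `error_message` are names invented for this illustration.

```python
import csv
import io

# Stand-in for the first few lines of expenses.tsv
EXPENSES_TSV = (
    "Item Name\tCategory\tPrice\n"
    "purple duck\trubber duck\t1.00\n"
    "iced tea\tfood\t2.00\n"
)

with io.StringIO(EXPENSES_TSV) as expenses_file:
    duck_rows = (
        row
        for row in csv.DictReader(expenses_file, delimiter="\t")
        if row["Category"] == "rubber duck"
    )
# The with block has ended: the file is closed, but no rows were read yet

try:
    for row in duck_rows:
        print(row)
    error_message = None
except ValueError as error:
    # Reading from a closed file object raises ValueError
    error_message = str(error)

print(error_message)
```

Looping over `duck_rows` inside the with block (as the code above does) avoids this, because the file is still open while the rows are being read.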

Using generator expressions for mapping items

We could actually take this a little bit further.

The last part of a generator expression does a filter operation: we can put a condition at the end of a generator expression to filter items down, only including items that match a certain condition. The first part of a generator expression does a map operation: it transforms each item that we're including in our new lazy iterable.
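As a smaller illustration of those two parts, here's a generator expression with both a mapping expression and a filtering condition, next to the for loop it stands in for. The `numbers` list and the variable names are made up for this sketch.

```python
numbers = [1, 2, 3, 4, 5, 6]

# General shape: (MAPPING_EXPRESSION for ITEM in ITERABLE if CONDITION)
even_squares = (
    n ** 2             # map: transform each included item
    for n in numbers
    if n % 2 == 0      # filter: only include even numbers
)

# The same logic written inside a for loop
loop_results = []
for n in numbers:
    if n % 2 == 0:
        loop_results.append(n ** 2)

genexp_results = list(even_squares)
print(genexp_results)  # [4, 16, 36]
print(loop_results)  # [4, 16, 36]
```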

We're using this conditional part of our generator expression to filter down our rows by their category. But we're not really using the map part of our generator expression right now.

We're kind of transforming items in our for loop right now: we're taking a row, grabbing its price, and turning it into a number:

...     for row in duck_rows:
...         duck_total += float(row["Price"])

We could do that transformation in our generator expression instead:

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_costs = (
...         float(row["Price"])
...         for row in csv.DictReader(expenses_file, delimiter="\t")
...         if row["Category"] == "rubber duck"
...     )
...     duck_total = 0
...     for price in duck_costs:
...         duck_total += price
...
>>> print("Cost of rubber ducks:", duck_total)
Cost of rubber ducks: 6.0

This new generator expression makes a generator object that gives us numbers instead of rows:

    duck_costs = (
        float(row["Price"])
        for row in csv.DictReader(expenses_file, delimiter="\t")
        if row["Category"] == "rubber duck"
    )

So, the for loop we end up with only needs to add these numbers up as we loop over them:

    duck_total = 0
    for price in duck_costs:
        duck_total += price

Data preprocessing using generator expressions

We've simplified our for loop so much that it's actually equivalent to the built-in sum function.

We can take our generator object and pass it directly into the sum function so that it can do the summing for us:

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_costs = (
...         float(row["Price"])
...         for row in csv.DictReader(expenses_file, delimiter="\t")
...         if row["Category"] == "rubber duck"
...     )
...     duck_total = sum(duck_costs)
...
>>> print("Cost of rubber ducks:", duck_total)
Cost of rubber ducks: 6.0

And in fact, we could take this even further and embed that generator expression right into a call to the sum function:

>>> import csv
>>> with open("expenses.tsv") as expenses_file:
...     duck_total = sum(
...         float(row["Price"])
...         for row in csv.DictReader(expenses_file, delimiter="\t")
...         if row["Category"] == "rubber duck"
...     )
...
>>> print("Cost of rubber ducks:", duck_total)
Cost of rubber ducks: 6.0

Generator expressions are good for:

  1. map operations: they can transform each item as you loop over the new lazy iterable
  2. filter operations: they can filter items down so that the new lazy iterable only includes items that match a specific condition

Generator expressions also pair nicely with reduce operations. Our generator expression isn't reducing an iterable of numbers to one number, but the sum function is! The generator expression is pre-processing the data in our iterable before we pass that data to the sum function.
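The same pre-processing pattern should work with other built-in functions that accept any iterable. Here's a sketch using `max` and `any`, with a hard-coded list of dictionaries standing in for the rows from `csv.DictReader` (an assumption, to keep the example self-contained).

```python
# Stand-in for the rows that csv.DictReader would give us
rows = [
    {"Item Name": "purple duck", "Category": "rubber duck", "Price": "1.00"},
    {"Item Name": "iced tea", "Category": "food", "Price": "2.00"},
    {"Item Name": "unicorn duck", "Category": "rubber duck", "Price": "2.00"},
    {"Item Name": "unicorn duck", "Category": "rubber duck", "Price": "3.00"},
    {"Item Name": "sandwich", "Category": "food", "Price": "8.00"},
]

# max reduces the lazily pre-processed prices to the largest one
priciest_duck = max(
    float(row["Price"])
    for row in rows
    if row["Category"] == "rubber duck"
)

# any reduces a lazy iterable of booleans to a single boolean
has_pricey_food = any(
    float(row["Price"]) > 5
    for row in rows
    if row["Category"] == "food"
)

print(priciest_duck)  # 3.0
print(has_pricey_food)  # True
```

In both cases the generator expression does the mapping and filtering, and the function it's passed to does the reducing.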

Summary

Generator expressions are good for transforming each item in an iterable into something new. Generator expressions are also good at filtering down items in an iterable to only include ones that match a certain condition.

Those two operations are great for lazily pre-processing data, which usually means taking logic that was within a for loop and moving it outside of the for loop. Sometimes you'll find you can shorten your for loop so much that the loop that you end up with is actually equivalent to some kind of generic aggregation function (like the built-in sum function).

Series: Generator Expressions

List comprehensions make new lists. Generator expressions make new generator objects. Generators are iterators, which are lazy single-use iterables. Unlike lists, generators aren't data structures. Instead they do work as you loop over them.
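Here's a quick sketch of what "single-use" means in practice. The `prices` list and the variable names are made up for this illustration.

```python
prices = ["1.00", "2.00", "3.00"]

price_list = [float(p) for p in prices]  # list comprehension: makes a list
price_gen = (float(p) for p in prices)   # generator expression: makes a generator

# A list is a data structure, so it can be looped over again and again
first_list_total = sum(price_list)
second_list_total = sum(price_list)

# A generator is consumed as we loop, so a second loop finds nothing left
first_gen_total = sum(price_gen)
second_gen_total = sum(price_gen)

print(first_list_total, second_list_total)  # 6.0 6.0
print(first_gen_total, second_gen_total)  # 6.0 0
```

If the values are needed more than once, a list comprehension may be the better fit.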
