Iteration Inside Out

Inside Python's Iteration Protocol

Naomi Ceder (@naomiceder)

  • Chair, Python Software Foundation
  • Quick Python Book, 3rd ed
  • Dick Blick Art Materials

"Python's most powerful useful feature"

-- Dave Beazley, "Iterations of Evolution: The Unauthorized Biography of the For-Loop"

Abstract

Using for loops and list comprehensions in Python is basic and quite common, right? But how does iteration in Python actually work “under the hood”? The words “iterator” and “iterable” each occur over 500 times in the Python documentation, but what does an iterator actually do, as opposed to an iterable? And how do they do it? Learn the details as we turn the iteration protocol inside out, with live coded demonstrations along the way.

This talk will start from the way Python iterates over a sequence, in comparison with iterating by index, as in C. The key point of iterating over a sequence is that something needs to track which item in the sequence is next, and that is what Python’s iteration protocol manages.

The iterable section will demonstrate creating a simple object that returns items by index (e.g., a Fibonacci series), showing that __getitem__ is really all you need for an iterable, since an iterator is created for such objects when they are iterated over. But this doesn’t answer the question of how Python keeps track of which item is next.

The iterator section answers that question by converting the iterable just created to an iterator, adding __iter__ and __next__ methods and showing how the iterator saves state and essentially drives the iteration protocol.

Having an accurate understanding of the iteration protocol will help developing Pythonistas reason better about both iterating over existing objects and creating their own iterables and iterators.

Repetition with code and data

Repetitive collections/series of data are all around us

Consider the following:

  • temperature readings for a month
  • dictionary keys mapping member IDs to members
  • a CSV file of a million products
  • the text of Moby Dick
  • a result set for a database query for yesterday's sales

They don't have a lot in common

  • different types of containers/series
  • different types of items

But...

All are series of items where we might want to look at one item after another.

Which means...

In Python we'd normally use a for loop to access each element ...

for temp in temp_readings:
    print(temp)

or a comprehension

all_ids = [cust_id for cust_id in customers]

or a generator expression

product_gen = (
    product for product in
    csv.reader(open("product_file.csv"))
)

Obvious, right?

It wasn't always so obvious...

It used to be surprising

Python and for loops

The for statement in Python differs a bit from what you may be used to in C or Pascal. Rather than always iterating over an arithmetic progression of numbers (like in Pascal), or leaving the user completely free in the iteration test and step (as C), Python's for statement iterates over the items of any sequence (e.g., a list or a string), in the order that they appear in the sequence.

-- Python V 1.1 Docs, 1994

A for loop (C style)

  for (int i=0; i < list_len; i++){
    printf("%d\n", a_list[i]);
  }

Which is really just short for:

  int i = 0;
  while (i < list_len){
    printf("%d\n", a_list[i]);
    i++;
  }

Drawbacks

  • Doesn't work so well with files or streams
  • Requires index access
  • Only slightly less bug-prone than the while version

Note: These days many languages have a for loop similar to Python's

  • C (macro)
  • C++ (since C++11, 2011)
  • Javascript (sort of)
  • Java (since Java 5, ~2004)
  • Go
  • Rust

Something similar in Python

In [1]:
# for loop  (C style)
a_list = [1, 2, 3, 4]

for i in range(len(a_list)):
    print(a_list[i])
1
2
3
4

Except it's not the same - Python is generating a range object (another series) and iterating over it to get the index values
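
A small sketch of that point, reusing a_list from the cell above: range(len(a_list)) builds a separate range object, and the loop iterates over that object to obtain each index. The names index_series and collected are made up for this illustration.

```python
a_list = [1, 2, 3, 4]

# range() builds its own series object; it is not the list being indexed
index_series = range(len(a_list))
print(type(index_series))      # <class 'range'>
print(list(index_series))      # [0, 1, 2, 3]

# the "C style" loop is really a Python-style loop over that range object
collected = []
for i in index_series:
    collected.append(a_list[i])

print(collected)               # [1, 2, 3, 4]
```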

The Pythonic for Loop

In [2]:
# for loop (Python style)
a_list = [1, 2, 3, 4]

for item in a_list:
    print(item)
1
2
3
4

And it works the same for different types

  • for key in a_dictionary:
  • for char in a_string:
  • for record in query_results:
  • for line in a_file:

etc...

How does that work?

  • How does a for loop know the “next” item?
  • How can for loops use so many different types?
  • What makes an object “work” in a for loop?

Iteration protocol

  • iteration in Python relies on a protocol, not types (from Python 2.2)
  • It's a good example of Python's “duck typing” - anything that follows the protocol can be iterated over

Iteration Protocol:

  • for iteration you need an iterable object
  • and an iterator (which Python usually handles for you)
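
As a rough sketch of how the two pieces cooperate, a for loop over a_list behaves approximately like the while loop below. This is an illustration of the protocol, not the interpreter's actual implementation, and the name seen is invented here just to collect the results.

```python
a_list = [1, 2, 3, 4]

seen = []                          # hypothetical name, only used to collect results

an_iterator = iter(a_list)         # ask the iterable for an iterator
while True:
    try:
        item = next(an_iterator)   # ask the iterator for the next item
    except StopIteration:          # the iterator signals "no more items"
        break
    seen.append(item)              # this line stands in for the loop body

print(seen)                        # [1, 2, 3, 4]
```

The loop itself never touches an index. All of the "what comes next" bookkeeping lives in the iterator.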

iterable

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.

Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.

--Python glossary

Iterable

  • returns members one at a time
  • e.g., list, str, tuple (sequence types)
  • any class with __iter__() method that returns iterator
  • or any class with __getitem__() with sequence semantics
  • for statement creates an unnamed iterator from iterable automatically

An iterable...

must return an iterator when the iter() function is called on it.

There are 2 ways an object can return an iterator - it can

  • have a __getitem__() method with Sequence semantics - i.e., access items by integer index in [ ].
  • implement an __iter__() method that returns an iterator (more on this soon)

Iterables (the repetitive collections/series of data from earlier)

  • lists (arrays), tuples
  • strings
  • dictionary keys/items
  • sets
  • files
  • database query results
  • etc

Is it an iterable?

  • Does it have an __iter__() method?
In [3]:
# check with hasattr
a_list = [1, 2, 3, 4]

hasattr(a_list, "__iter__")
Out[3]:
True
  • Does it have __getitem__() that is sequence compliant? (harder to decide)

EAFP - Easier to Ask for Forgiveness than Permission

i.e., does calling iter() on it return an iterator, or raise an exception?

In [ ]:
is_it_iterable = ["asd", 1,  open("Iteration Inside Out.ipynb"), {"one":1, "two":2}]

for item in is_it_iterable:

    try:
        an_iterator = iter(item)
    except TypeError as e:
        print(f"Not Iterable: {e}\n")
    else:
        print(f"Iterable: {an_iterator} is {type(an_iterator)}\n")
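
One reason to prefer calling iter() over a type check: a sketch, assuming a made-up class OnlyGetitem that supports integer indexes but has no __iter__ method. The isinstance() test against collections.abc.Iterable only looks for __iter__, so it reports False even though iter() succeeds.

```python
from collections.abc import Iterable

class OnlyGetitem:
    """Hypothetical class: indexable from 0, but no __iter__ method."""
    def __getitem__(self, index):
        if 0 <= index < 3:
            return index * 10
        raise IndexError(index)

obj = OnlyGetitem()

# the abstract base class check only detects an __iter__ method
abc_says_iterable = isinstance(obj, Iterable)
print(abc_says_iterable)           # False

# EAFP: just try to get an iterator and use it
try:
    items = list(iter(obj))
except TypeError:
    items = None
print(items)                       # [0, 10, 20]
```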

Let’s make an iterable - Repeater

An object that can be iterated over and returns the same value a specified number of times.

repeat = Repeater("hello", 4)

for i in repeat:
    print(i)

hello
hello
hello
hello

As an iterable, using __getitem__()

In [4]:
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        
    def __getitem__(self, index):
        if 0 <= index < self.limit:
            return self.value
        else:
            raise IndexError
            
    
    
In [5]:
repeat = Repeater("hello", 4)

# does it have an __iter__ method?
hasattr(repeat, "__iter__")
Out[5]:
False
In [6]:
# __getitem__ with sequence semantics?

repeat[0]
Out[6]:
'hello'
In [7]:
# can the iter() function return an iterator?

iter(repeat)
Out[7]:
<iterator at 0x7fda6c4f99e8>
In [10]:
# for loop

for item in repeat:
    print(item)
hello
hello
hello
hello
In [11]:
# list comprehension

[x for x in repeat]
Out[11]:
['hello', 'hello', 'hello', 'hello']

Behind the scenes

  • an iterator is being created from the repeat object
  • it can return the items using integer indexes starting from 0
  • it continues until an IndexError is raised
  • each time the object is iterated over, a new iterator is created and it starts from the beginning
In [ ]:
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        
    def __getitem__(self, index):      # The bit we need for an iterable
        if 0 <= index < self.limit:
            return self.value
        else:
            raise IndexError      # only needed if we want iteration to end

Yes, it's really that simple...

  • ONLY the __getitem__() method was needed
  • an IndexError is needed to end iteration
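
To make the behind-the-scenes steps concrete, here is a hedged sketch of roughly what the automatically created iterator does. The function getitem_iterator is a hypothetical stand-in written as a generator function for brevity; Python's real iterator for __getitem__ objects is built in and is not implemented this way.

```python
class Repeater:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit

    def __getitem__(self, index):
        if 0 <= index < self.limit:
            return self.value
        raise IndexError


def getitem_iterator(obj):
    """Hypothetical stand-in for the iterator Python creates automatically."""
    index = 0                      # the state the iterable itself does not keep
    while True:
        try:
            item = obj[index]      # calls obj.__getitem__(index)
        except IndexError:         # IndexError is the signal to stop
            return
        yield item
        index += 1


repeat = Repeater("hello", 4)
emulated = list(getitem_iterator(repeat))
built_in = list(iter(repeat))
print(emulated == built_in)        # True
```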

But... what IS an Iterator?

The Python for loop relies on being able to get a next item, but...

  • the iterable doesn't know which item is next
  • the loop itself doesn't care exactly where in the series that item is (or what type it is)
  • the loop relies on the iterator to keep track of what's next
  • any object that can do that can be iterated over, i.e., it is an iterator

An iterator has a __next__() method (next() in Python 2) that tracks and returns the next item in the series; you call the built-in next() function on the iterator to get that item.

Iterator

  • has __next__() method
  • calls to __next__() method (next() function) return successive items
  • raises StopIteration when no more data
  • further calls just raise StopIteration
  • must have __iter__() method, which returns self
  • iterators are therefore iterables
  • once exhausted they do not “refresh”

iterator

An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again...

...Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.

--Python glossary
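
The difference the glossary describes can be seen with a plain list and the iterator it hands out. This is a small sketch; the variable names are chosen only for this example.

```python
a_list = [1, 2, 3]

list_iter = iter(a_list)

# an iterator's __iter__() returns the iterator itself
same_object = iter(list_iter) is list_iter
print(same_object)                 # True

first_pass = list(list_iter)       # consumes every item
second_pass = list(list_iter)      # the same iterator is now exhausted
print(first_pass)                  # [1, 2, 3]
print(second_pass)                 # []

# the list (a container) gives out a fresh iterator each time
fresh_pass = list(iter(a_list))
print(fresh_pass)                  # [1, 2, 3]
```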

Let’s make an iterator - RepeatIterator

  • implement __next__() method to return next item
  • implement __iter__() method to return itself
In [17]:
class RepeatIterator:
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit
        self.count = 0
        
    def __next__(self):  
        if self.count < self.limit:
            self.count += 1
            return self.value
        else:
            raise StopIteration
            
    def __iter__(self):
        return self
In [ ]:
repeat_iter = RepeatIterator("Hi", 4)

# __getitem__ with sequence semantics?
In [23]:
repeat_iter = RepeatIterator("Hi", 4)
# does it have an __iter__ method?
hasattr(repeat_iter, "__iter__")
Out[23]:
True
In [15]:
# does it return next item using next() function?

next(repeat_iter)
Out[15]:
'Hi'
In [24]:
# calling iter on it, returns object itself
print(repeat_iter)

repeat_iter_iter = iter(repeat_iter)
print(repeat_iter_iter)
<__main__.RepeatIterator object at 0x7fda6c4a4a20>
<__main__.RepeatIterator object at 0x7fda6c4a4a20>
In [ ]:
# calling iter() on iterable always returns new iterator
print(id(repeat))
old_repeat_iter = iter(repeat)
print(id(old_repeat_iter))
In [25]:
# after 1 next(), how many repetitions left?


for item in repeat_iter:
    print(item) 
Hi
Hi
Hi
Hi
In [26]:
# Let's loop again

for item in repeat_iter:
    print(item)
In [27]:
# one more next?
next(repeat_iter)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-27-ef48f653ec79> in <module>()
      1 # one more next?
----> 2 next(repeat_iter)

<ipython-input-17-f8683e3a96ec> in __next__(self)
     10             return self.value
     11         else:
---> 12             raise StopIteration
     13 
     14     def __iter__(self):

StopIteration: 

So making an iterator is pretty easy, too...

  • __next__() method
  • __iter__() method that returns self
  • “exhaustion” after one pass

(but don't do it)

Making an iterator with a generator function

In [28]:
def repeat_gen(value, limit):
    for i in range(limit):
        yield value


for i in repeat_gen("hi", 4):  # iterator returns itself
    print(i)
hi
hi
hi
hi
In [29]:
# or use a generator expression

value = "hi"
limit = 4

repeat_gen_expr = (value for x in range(limit))


for item in repeat_gen_expr:
    print(item)
hi
hi
hi
hi
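
A quick check, sketched with the same repeat_gen function, that the object a generator function returns satisfies the iterator checklist from earlier: it has __next__(), iter() returns the object itself, and it raises StopIteration once exhausted. The names has_next, returns_self and exhausted exist only for this illustration.

```python
def repeat_gen(value, limit):
    for i in range(limit):
        yield value

gen = repeat_gen("hi", 2)

has_next = hasattr(gen, "__next__")    # generators have a __next__() method
returns_self = iter(gen) is gen        # iter() hands back the same object
print(has_next, returns_self)          # True True

items = [next(gen), next(gen)]
print(items)                           # ['hi', 'hi']

try:
    next(gen)                          # nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)                       # True
```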

Iteration in Python

  • is a protocol (since Python 2.2)
  • requires an iterable to iterate over
  • requires an iterator (often automatically created behind the scenes) to track what's next
  • iterators can be used as iterables, but don't "renew"
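
One way to get an iterable that does "renew" is to keep the iterable and the iterator separate. The sketch below assumes a made-up class, RepeatIterable, whose __iter__() is a generator function, so every loop receives a brand new iterator. It is one possible design, not the only one.

```python
class RepeatIterable:
    """Hypothetical iterable: each call to __iter__ returns a new generator."""
    def __init__(self, value, limit):
        self.value = value
        self.limit = limit

    def __iter__(self):
        # the generator object created here is the iterator and holds the state
        for _ in range(self.limit):
            yield self.value


repeat = RepeatIterable("hi", 2)

first_pass = list(repeat)
second_pass = list(repeat)             # a second pass starts over
print(first_pass)                      # ['hi', 'hi']
print(second_pass)                     # ['hi', 'hi']
```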

Thank you!

?'s

This notebook available at http://projects.naomiceder.tech/talks/iteration-inside-out/

  • naomi.ceder@gmail.com
  • @NaomiCeder