This document discusses iterators and generators in Python. It begins with an example using Django queries to demonstrate laziness. It then explains that QuerySets are lazy and do not perform database access until their results are iterated over. The document discusses how for loops handle iteration in different languages like C, Python, Java. It defines what makes an object iterable in Python and covers built-in iterable types and functions. It also discusses iterator design patterns and how Python implements iteration through protocols like sequences that require methods like __len__ and __getitem__.
3. >>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id",
"municipios_municipio"."uf", "municipios_municipio"."nome",
"municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id",
"municipios_municipio"."capital", "municipios_municipio"."latitude",
"municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM
"municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
4. >>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id",
"municipios_municipio"."uf", "municipios_municipio"."nome",
"municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id",
"municipios_municipio"."capital", "municipios_municipio"."latitude",
"municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM
"municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
this expression makes
a Django QuerySet
5. >>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id",
"municipios_municipio"."uf", "municipios_municipio"."nome",
"municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id",
"municipios_municipio"."capital", "municipios_municipio"."latitude",
"municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM
"municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
this expression makes
a Django QuerySet
QuerySets are “lazy”:
no database access
so far
6. >>> from django.db import connection
>>> q = connection.queries
>>> q
[]
>>> from municipios.models import *
>>> res = Municipio.objects.all()[:5]
>>> q
[]
>>> for m in res: print m.uf, m.nome
...
GO Abadia de Goiás
MG Abadia dos Dourados
GO Abadiânia
MG Abaeté
PA Abaetetuba
>>> q
[{'time': '0.000', 'sql': u'SELECT "municipios_municipio"."id",
"municipios_municipio"."uf", "municipios_municipio"."nome",
"municipios_municipio"."nome_ascii", "municipios_municipio"."meso_regiao_id",
"municipios_municipio"."capital", "municipios_municipio"."latitude",
"municipios_municipio"."longitude", "municipios_municipio"."geohash" FROM
"municipios_municipio" ORDER BY "municipios_municipio"."nome_ascii" ASC LIMIT 5'}]
this expression makes
a Django QuerySet
QuerySets are “lazy”:
no database access
so far
the query is made
only when we
iterate over the
results
9. @ramalhoorg
Lazy
• Avoids unnecessary work, by postponing it as long as possible
• The opposite of eager
9
In Computer Science, being
“lazy” is often a good thing!
11. @ramalhoorg
Iteration: C and Python
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
for(i = 0; i < argc; i++)
printf("%sn", argv[i]);
return 0;
}
import sys
for arg in sys.argv:
print arg
12. @ramalhoorg
Iteration: Java (classic)
class Arguments {
public static void main(String[] args) {
for (int i=0; i < args.length; i++)
System.out.println(args[i]);
}
}
$ java Arguments alfa bravo charlie
alfa
bravo
charlie
13. @ramalhoorg
Iteration: Java ≥1.5
$ java Arguments2 alfa bravo charlie
alfa
bravo
charlie
• Enhanced for (a.k.a. foreach)
since
2004
class Arguments2 {
public static void main(String[] args) {
for (String arg : args)
System.out.println(arg);
}
}
14. @ramalhoorg
Iteration: Java ≥1.5
• Enhanced for (a.k.a. foreach)
class Arguments2 {
public static void main(String[] args) {
for (String arg : args)
System.out.println(arg);
}
}
since
2004
import sys
for arg in sys.argv:
print arg
since
1991
15. @ramalhoorg
You can iterate over many
Python objects
• strings
• files
• XML: ElementTree nodes
• not limited to built-in types:
• Django QuerySet
• etc.
15
16. @ramalhoorg
So, what is an iterable?
• Informal, recursive definition:
• iterable: fit to be iterated
• just as:
edible: fit to be eaten
16
18. List comprehension
● Compreensão de lista ou abrangência
● Exemplo: usar todos os elementos:
– L2 = [n*10 for n in L]
List comprehension
• An expression that builds a list from any iterable
>>> s = 'abracadabra'
>>> l = [ord(c) for c in s]
>>> l
[97, 98, 114, 97, 99, 97, 100, 97, 98, 114, 97]
input: any iterable object
output: a list (always)
19. @ramalhoorg
Set comprehension
• An expression that builds a set from any iterable
>>> s = 'abracadabra'
>>> set(s)
{'b', 'r', 'a', 'd', 'c'}
>>> {ord(c) for c in s}
{97, 98, 99, 100, 114}
19
20. @ramalhoorg
Dict comprehensions
• An expression that builds a dict from any iterable
>>> s = 'abracadabra'
>>> {c:ord(c) for c in s}
{'a': 97, 'r': 114, 'b': 98, 'c': 99, 'd': 100}
20
21. @ramalhoorg
Syntactic support for iterables
• Tuple unpacking,
parallel assignment
>>> a, b, c = 'XYZ'
>>> a
'X'
>>> b
'Y'
>>> c
'Z'
21
>>> l = [(c, ord(c)) for c in 'XYZ']
>>> l
[('X', 88), ('Y', 89), ('Z', 90)]
>>> for char, code in l:
... print char, '->', code
...
X -> 88
Y -> 89
Z -> 90
22. @ramalhoorg
Syntactic support for iterables (2)
• Function calls: exploding arguments with *
>>> import math
>>> def hypotenuse(a, b):
... return math.sqrt(a*a + b*b)
...
>>> hypotenuse(3, 4)
5.0
>>> sides = (3, 4)
>>> hypotenuse(sides)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: hypotenuse() takes exactly 2 arguments (1 given)
>>> hypotenuse(*sides)
5.0 22
24. @ramalhoorg
Built-in functions that take iterable
arguments
• all
• any
• filter
• iter
• len
• map
• max
• min
• reduce
• sorted
• sum
• zip
unrelated to compression
28. @ramalhoorg
Head First Design
Patterns Poster
O'Reilly,
ISBN 0-596-10214-3
28
“The Iterator Pattern
provides a way to access
the elements of an
aggregate object
sequentially without
exposing the underlying
representation.”
29. An iterable Train class
>>> train = Train(4)
>>> for car in train:
... print(car)
car #1
car #2
car #3
car #4
>>>
31. @ramalhoorg
Iterable ABC
• collections.Iterable abstract base class
• A concrete subclass of Iterable must
implement .__iter__
• .__iter__ returns an Iterator
• You don’t usually call .__iter__ directly
• when needed, call iter(x)
31
32. @ramalhoorg
Iterator ABC
• Iterator provides
.next
or
.__next__
• .__next__ returns the next item
• You don’t usually call .__next__ directly
• when needed, call next(x)
Python 3
Python 2
Python ≥ 2.6
32
33. @ramalhoorg
for car in train:
• calls iter(train) to
obtain a TrainIterator
• makes repeated calls to
next(aTrainIterator)
until it raises StopIteration
class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __iter__(self):
return TrainIterator(self)
class TrainIterator(object):
def __init__(self, train):
self.train = train
self.current = 0
def __next__(self): # Python 3
if self.current < len(self.train):
self.current += 1
return 'car #%s' % (self.current)
else:
raise StopIteration()
Train with
iterator
1
1
2
>>> train = Train(3)
>>> for car in train:
... print(car)
car #1
car #2
car #3
2
36. @ramalhoorg
Design patterns in dynamic
languages
• Dynamic languages: Lisp, Smalltalk, Python,
Ruby, PHP, JavaScript...
• Many features not found in C++, where most
of the original 23 Design Patterns were
identified
• Java is more dynamic than C++, but much
more static than Lisp, Python etc.
36
Gamma, Helm, Johnson, Vlissides
a.k.a. the Gang of Four (GoF)
38. @ramalhoorg
Dynamic types
• No need to declare types or interfaces
• It does not matter what an object claims do be, only what it is
capable of doing
38
39. @ramalhoorg
Duck typing
39
“In other words, don't
check whether it is-a duck:
check whether it quacks-
like-a duck, walks-like-a
duck, etc, etc, depending
on exactly what subset of
duck-like behaviour you
need to play your
language-games with.”
Alex Martelli
comp.lang.python (2000)
40. @ramalhoorg
A Python iterable is...
• An object from which the iter function can produce an iterator
• The iter(x) call:
• invokes x.__iter__() to obtain an iterator
• but, if x has no __iter__:
• iter makes an iterator which tries to fetch items from x by doing
x[0], x[1], x[2]...
sequence protocol
Iterable interface
40
42. Train: a sequence of cars
>>> train = Train(4)
>>> len(train)
4
>>> train[0]
'car #1'
>>> train[3]
'car #4'
>>> train[-1]
'car #4'
>>> train[4]
Traceback (most recent call last):
...
IndexError: no car at 4
>>> for car in train:
... print(car)
car #1
car #2
car #3
car #4
43. Train: a sequence of cars
class Train(object):
def __init__(self, cars):
self.cars = cars
def __getitem__(self, key):
index = key if key >= 0 else self.cars + key
if 0 <= index < len(self): # index 2 -> car #3
return 'car #%s' % (index + 1)
else:
raise IndexError('no car at %s' % key)
if __getitem__ exists,
iteration “just works”
44. @ramalhoorg
The sequence protocol at work
>>> t = Train(4)
>>> len(t)
4
>>> t[0]
'car #1'
>>> t[3]
'car #4'
>>> t[-1]
'car #4'
>>> for car in t:
... print(car)
car #1
car #2
car #3
car #4
__len__
__getitem__
__getitem__
45. @ramalhoorg
Protocol
• protocol: a synonym for interface used in dynamic
languages like Smalltalk, Python, Ruby, Lisp...
• not declared, and not enforced by static checks
45
46. class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __getitem__(self, key):
index = key if key >= 0 else self.cars + key
if 0 <= index < len(self): # index 2 -> car #3
return 'car #%s' % (index + 1)
else:
raise IndexError('no car at %s' % key)
Sequence protocol
__len__ and __getitem__
implement the immutable
sequence protocol
47. import collections
class Train(collections.Sequence):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __getitem__(self, key):
index = key if key >= 0 else self.cars + key
if 0 <= index < len(self): # index 2 -> car #3
return 'car #%s' % (index + 1)
else:
raise IndexError('no car at %s' % key)
Sequence ABC
• collections.Sequence abstract base class
abstract
methods
Python ≥ 2.6
48. import collections
class Train(collections.Sequence):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __getitem__(self, key):
index = key if key >= 0 else self.cars + key
if 0 <= index < len(self): # index 2 -> car #3
return 'car #%s' % (index + 1)
else:
raise IndexError('no car at %s' % key)
Sequence ABC
• collections.Sequence abstract base class
implement
these 2
49. import collections
class Train(collections.Sequence):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __getitem__(self, key):
index = key if key >= 0 else self.cars + key
if 0 <= index < len(self): # index 2 -> car #3
return 'car #%s' % (index + 1)
else:
raise IndexError('no car at %s' % key)
Sequence ABC
• collections.Sequence abstract base class
inherit these 5
50. @ramalhoorg
Sequence ABC
• collections.Sequence abstract base class
>>> train = Train(4)
>>> 'car #2' in train
True
>>> 'car #7' in train
False
>>> for car in reversed(train):
... print(car)
car #4
car #3
car #2
car #1
>>> train.index('car #3')
2 50
53. @ramalhoorg
Iteration in C (example 2)
#include <stdio.h>
int main(int argc, char *argv[]) {
int i;
for(i = 0; i < argc; i++)
printf("%d : %sn", i, argv[i]);
return 0;
}
$ ./args2 alfa bravo charlie
0 : ./args2
1 : alfa
2 : bravo
3 : charlie
54. @ramalhoorg
Iteration in Python (ex. 2)
import sys
for i in range(len(sys.argv)):
print i, ':', sys.argv[i]
$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie 54
not Pythonic
55. @ramalhoorg
Iteration in Python (ex. 2)
import sys
for i, arg in enumerate(sys.argv):
print i, ':', arg
$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie 55
Pythonic!
56. @ramalhoorg
import sys
for i, arg in enumerate(sys.argv):
print i, ':', arg
Iteration in Python (ex. 2)
$ python args2.py alfa bravo charlie
0 : args2.py
1 : alfa
2 : bravo
3 : charlie
this returns a lazy
iterable object
that object yields tuples
(index, item)
on demand, at each
iteration
56
58. @ramalhoorg
What
enumerate does
isso constroi
um gerador
and that is iterable
>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> for item in e:
... print item
...
(0, 'T')
(1, 'u')
(2, 'r')
(3, 'i')
(4, 'n')
(5, 'g')
>>>
58
enumerate builds an
enumerate object
59. @ramalhoorg
What
enumerate does
isso constroi
um gerador
the enumerate object
produces an
(index, item) tuple
for each next(e) call
>>> e = enumerate('Turing')
>>> e
<enumerate object at 0x...>
>>> next(e)
(0, 'T')
>>> next(e)
(1, 'u')
>>> next(e)
(2, 'r')
>>> next(e)
(3, 'i')
>>> next(e)
(4, 'n')
>>> next(e)
(5, 'g')
>>> next(e)
Traceback (most recent...):
...
StopIteration
• The enumerator object is an
example of a generator
60. @ramalhoorg
Iterator x generator
• By definition (in GoF) an iterator retrieves successive items
from an existing collection
• A generator implements the iterator interface (next) but
produces items not necessarily in a collection
• a generator may iterate over a collection, but return the items
decorated in some way, skip some items...
• it may also produce items independently of any existing data
source (eg. Fibonacci sequence generator)
60
63. @ramalhoorg
Generator
function
• Any function that has the
yield keyword in its body
is a generator function
63
>>> def gen_123():
... yield 1
... yield 2
... yield 3
...
>>> for i in gen_123(): print(i)
1
2
3
>>>
the keyword gen was considered
for defining generator functions,
but def prevailed
64. @ramalhoorg
• When invoked, a
generator function
returns a
generator object
Generator
function
64
>>> def gen_123():
... yield 1
... yield 2
... yield 3
...
>>> for i in gen_123(): print(i)
1
2
3
>>> g = gen_123()
>>> g
<generator object gen_123 at ...>
66. @ramalhoorg
Generator
behavior
• Note how the output of
the generator function is
interleaved with the
output of the calling code
66
>>> def gen_AB():
... print('START')
... yield 'A'
... print('CONTINUE')
... yield 'B'
... print('END.')
...
>>> for c in gen_AB():
... print('--->', c)
...
START
---> A
CONTINUE
---> B
END.
>>>
67. @ramalhoorg
Generator
behavior
• The body is executed only
when next is called, and it
runs only up to the following
yield
>>> def gen_AB():
... print('START')
... yield 'A'
... print('CONTINUE')
... yield 'B'
... print('END.')
...
>>> g = gen_AB()
>>> next(g)
START
'A'
>>>
68. @ramalhoorg
Generator
behavior
• When the body of the function
returns, the generator object
throws StopIteration
• The for statement catches
that for you
68
>>> def gen_AB():
... print('START')
... yield 'A'
... print('CONTINUE')
... yield 'B'
... print('END.')
...
>>> g = gen_AB()
>>> next(g)
START
'A'
>>> next(g)
CONTINUE
'B'
>>> next(g)
END.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
69. for car in train:
• calls iter(train) to
obtain a generator
• makes repeated calls to next(generator) until
the function returns, which raises StopIteration
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
# index 2 is car #3
yield 'car #%s' % (i+1)
Train with
generator
function 1
1
2
>>> train = Train(3)
>>> for car in train:
... print(car)
car #1
car #2
car #3
2
70. Classic iterator
x generator
class Train(object):
def __init__(self, cars):
self.cars = cars
def __len__(self):
return self.cars
def __iter__(self):
return TrainIterator(self)
class TrainIterator(object):
def __init__(self, train):
self.train = train
self.current = 0
def __next__(self): # Python 3
if self.current < len(self.train):
self.current += 1
return 'car #%s' % (self.current)
else:
raise StopIteration()
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
yield 'car #%s' % (i+1)
2 classes,
12 lines of code
1 class,
3 lines of code
71. class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
yield 'car #%s' % (i+1)
The pattern
just vanished
72. class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
yield 'car #%s' % (i+1)
“When I see patterns in my
programs, I consider it a sign of
trouble. The shape of a program
should reflect only the problem it
needs to solve. Any other regularity
in the code is a sign, to me at least,
that I'm using abstractions that
aren't powerful enough -- often that
I'm generating by hand the
expansions of some macro that I
need to write.”
Paul Graham
Revenge of the nerds (2002)
73. Generator expression (genexp)
>>> g = (c for c in 'ABC')
>>> g
<generator object <genexpr> at 0x10045a410>
>>> for l in g:
... print(l)
...
A
B
C
>>>
74. @ramalhoorg
• When evaluated,
returns a generator object
>>> g = (n for n in [1, 2, 3])
>>> g
<generator object <genexpr> at 0x...>
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Generator expression (genexp)
75. for car in train:
• calls iter(train) to
obtain a generator
• makes repeated calls to next(generator) until
the function returns, which raises StopIteration
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
# index 2 is car #3
yield 'car #%s' % (i+1)
Train with
generator
function 1
1
2
>>> train = Train(3)
>>> for car in train:
... print(car)
car #1
car #2
car #3
2
76. for car in train:
• calls iter(train) to
obtain a generator
• makes repeated calls to next(generator) until
the function returns, which raises StopIteration
1
2
class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
return ('car #%s' % (i+1)
for i in range(self.cars))
Train with generator expression
>>> train = Train(3)
>>> for car in train:
... print(car)
car #1
car #2
car #3
77. class Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
return ('car #%s' % (i+1) for i in range(self.cars))
Generator function
x genexpclass Train(object):
def __init__(self, cars):
self.cars = cars
def __iter__(self):
for i in range(self.cars):
yield 'car #%s' % (i+1)
78. @ramalhoorg
Built-in functions that return
iterables, iterators or generators
• dict
• enumerate
• frozenset
• list
• reversed
• set
• tuple
78
79. @ramalhoorg
• boundless generators
• count(), cycle(), repeat()
• generators which combine several iterables:
• chain(), tee(), izip(), imap(), product(), compress()...
• generators which select or group items:
• compress(), dropwhile(), groupby(), ifilter(), islice()...
• generators producing combinations of items:
• product(), permutations(), combinations()...
The itertools module Don’t reinvent the
wheel, use itertools!
this was not
reinvented:
ported from
Haskell
great for
MapReduce
80. @ramalhoorg
Generators in Python 3
• Several functions and methods of the standard library that used to
return lists, now return generators and other lazy iterables in Python 3
• dict.keys(), dict.items(), dict.values()...
• range(...)
• like xrange in Python 2.x (more than a generator)
• If you really need a list, just pass the generator to the list constructor.
Eg.: list(range(10))
80
81. @ramalhoorg
A practical example using
generator functions
• Generator functions to decouple reading and writing logic in a database
conversion tool designed to handle large datasets
https://github.com/ramalho/isis2json
81
88. main: determine
input format
selected generator function is passed
as an argument
input generator function is selected
based on the input file extension
90. writeJsonArray:
iterates over one of the input
generator functions
selected generator function received as
an argument...
and called to produce input
generator
98. @ramalhoorg
We did not cover
• other generator methods:
• gen.close(): causes a GeneratorExit exception to be raised
within the generator body, at the point where it is paused
• gen.throw(e): causes any exception e to be raised within the
generator body, at the point it where is paused
Mostly useful for long-running processes.
Often not needed in batch processing scripts.
98
99. @ramalhoorg
We did not cover
• generator delegation with yield from
• sending data into a generator function with the
gen.send(x) method (instead of next(gen)),
and using yield as an expression to get the
data sent
• using generator functions as coroutines
not useful in the
context of iteration
Python ≥ 3.3
“Coroutines are not
related to iteration”
David Beazley
99
100. @ramalhoorg
How to learn generators
• Forget about .send() and coroutines: that is a completely different
subject. Look into that only after mastering and becoming really
confortable using generators for iteration.
• Study and use the itertools module
• Don’t worry about .close() and .throw() initially. You can be
productive with generators without using these methods.
• yield from is only available in Python 3.3, and only relevant if you
need to use .close() and .throw()
100
101. Q & A
Luciano Ramalho
luciano@ramalho.org
@ramalhoorg
https://github.com/ramalho/isis2json