Python: Joining multiple generators/iterators
In my previous blog post I described how I’d refactored some scraping code to use iterators. I ended up with a function that returns a generator of all the events for one BBC live text match:
match_id = "32683310"
events = extract_events("data/raw/%s" % (match_id))
>>> print type(events)
<type 'generator'>
The next thing I wanted to do was get the events for multiple matches, which meant gluing several generators together into one big generator.
The itertools.chain function (https://docs.python.org/2/library/itertools.html#itertools.chain) does exactly what we want:
itertools.chain(*iterables)
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence.
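As a quick illustration of that behaviour, here is a minimal sketch using plain lists rather than generators. The variable names are made up for this example, and the print calls use the parenthesised form so the snippet also runs on Python 3.

```python
import itertools

first = [1, 2, 3]
second = ["a", "b"]

# chain walks through `first` until it is exhausted, then moves on to `second`
combined = list(itertools.chain(first, second))
print(combined)

# because the signature is chain(*iterables), a list of iterables
# can be unpacked into separate arguments with *
parts = [first, second, (10, 20)]
unpacked = list(itertools.chain(*parts))
print(unpacked)
```

Wrapping the result in list() is only there to make the contents visible; chain itself hands items over one at a time.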
First let’s try it out on a collection of range generators:
import itertools
gens = [(n*2 for n in range(0, 3)), (n*2 for n in range(4,7))]
>>> gens
[<generator object <genexpr> at 0x10ff3b140>, <generator object <genexpr> at 0x10ff7d870>]
output = itertools.chain()
for gen in gens:
    output = itertools.chain(output, gen)
Now if we iterate through 'output' we’d expect to see 0, 2 and 4 from the first generator followed by 8, 10 and 12 from the second (6 is missing because neither range includes 3):
>>> for item in output:
...     print item
...
0
2
4
8
10
12
Exactly as we expected! Our scraping code looks like this once we plug the chaining in:
matches = ["32683310", "32683303", "32384894", "31816155"]
raw_events = itertools.chain()
for match_id in matches:
    raw_events = itertools.chain(raw_events, extract_events("data/raw/%s" % (match_id)))
'raw_events' is now a single iterator that we can loop over to process the events for all the matches.
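The loop above wraps one chain inside another for every match, which works but nests a little deeper on each iteration. A flatter alternative is itertools.chain.from_iterable, which takes a single iterable of iterables and consumes it lazily. The sketch below assumes a stand-in extract_events that yields made-up (path, minute) tuples, since the real function reads files from disk; only the chaining call is the point.

```python
import itertools


def extract_events(path):
    # Hypothetical stand-in for the real scraper: yields two fake events per file
    for minute in (1, 2):
        yield (path, minute)


matches = ["32683310", "32683303", "32384894", "31816155"]

# The generator expression is consumed lazily, so extract_events is only
# called for a match once the previous match's events have been used up
raw_events = itertools.chain.from_iterable(
    extract_events("data/raw/%s" % match_id) for match_id in matches
)

events = list(raw_events)
print(len(events))
```

Calling list() here is just for inspection; in the scraping code you would more likely loop over raw_events directly so that nothing is held in memory longer than needed.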