Python: Streaming/Appending to a file
I’ve been playing around with Twitter’s API (via the tweepy library) and due to the rate limiting it imposes I wanted to stream results to a CSV file rather than waiting until my whole program had finished.
I wrote the following program to simulate what I was trying to do:
import csv
import time
with open("rows.csv", "a") as file:
writer = csv.writer(file, delimiter = ",")
end = time.time() + 10
while True:
if time.time() > end:
break
else:
writer.writerow(["mark", "123"])
time.sleep(1)
The program will run for 10 seconds and append one line to 'rows.csv' once a second. Although I have used the 'a' flag in my call to 'open' if I poll that file before the 10 seconds is up it’s empty:
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:54:27 GMT
0 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:54:31 GMT
0 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:54:34 GMT
0 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:54:43 GMT
10 rows.csv
I thought the flushing of the file was completely controlled by the with block but lucky for me there’s actually a https://docs.python.org/2/library/io.html#io.IOBase.flush function which allows me to force writes to the file whenever I want.
Here’s the new and improved sample program:
import csv
import time
with open("rows.csv", "a") as file:
writer = csv.writer(file, delimiter = ",")
end = time.time() + 10
while True:
if time.time() > end:
break
else:
writer.writerow(["mark", "123"])
time.sleep(1)
file.flush()
And if we poll the file while the program’s running:
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:57:36 GMT
14 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:57:37 GMT
15 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:57:40 GMT
18 rows.csv
$ date && wc -l rows.csv
Mon 9 Mar 2015 22:57:45 GMT
20 rows.csv
Much easier than I expected - I ♥ python!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.