Python: Parsing a JSON HTTP chunking stream
I’ve been playing around with meetup.com’s API again, and this time I wanted to consume the chunked HTTP RSVP stream and filter the RSVPs for events I’m interested in.
I use Python for most of my hacking these days, and when HTTP requests are required, the requests library is my first port of call.
I started out with the following script:
import requests
import json

def stream_meetup_initial():
    uri = "http://stream.meetup.com/2/rsvps"
    # stream=True keeps the connection open so we can iterate over the response
    response = requests.get(uri, stream=True)
    for chunk in response.iter_content(chunk_size=None):
        yield chunk

for raw_rsvp in stream_meetup_initial():
    print(raw_rsvp)
    try:
        rsvp = json.loads(raw_rsvp)
    except ValueError as e:
        print(e)
        continue
This mostly worked, but I also noticed the following error from time to time:
No JSON object could be decoded
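The chunks handed back by 'iter_content' aren’t guaranteed to line up with the JSON document boundaries, so a chunk can begin or end partway through an RSVP. A minimal sketch of the failure, using a made-up fragment rather than the live stream:

import json

# Hypothetical fragment: a chunk that starts in the middle of an RSVP,
# because the chunk boundary fell inside a JSON document
fragment = '_name": "Python Meetup", "event_id": "1234"}'

try:
    json.loads(fragment)
except ValueError as e:
    # The parser can't make sense of a mid-document fragment
    print(e)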
Although less frequent, I also saw errors suggesting I was trying to parse an incomplete JSON object. I tweaked the function to keep a local buffer and only yield it once it ended with a newline character:
def stream_meetup_newline():
    uri = "http://stream.meetup.com/2/rsvps"
    response = requests.get(uri, stream=True)
    buffer = b""
    for chunk in response.iter_content(chunk_size=1):
        buffer += chunk
        # Only yield once we've seen a newline, i.e. a complete document
        if buffer.endswith(b"\n"):
            yield buffer
            buffer = b""
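Wiring that into the earlier loop, each yielded value should now parse cleanly. A sketch of the consuming side, reusing the imports from above (the 'event'/'event_name' fields are my reading of the RSVP payload, so treat them as an assumption):

for raw_rsvp in stream_meetup_newline():
    try:
        rsvp = json.loads(raw_rsvp)
    except ValueError as e:
        print(e)
        continue
    # "event"/"event_name" are assumed field names in the RSVP payload
    print(rsvp.get("event", {}).get("event_name"))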
This mostly works, although I’m fairly sure I’ve seen occasions where two JSON objects were yielded together and the subsequent call to 'json.loads' failed. I haven’t been able to reproduce that, though.
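If two documents ever do arrive in one yielded buffer, a defensive variant would split the buffer on newlines and yield each complete line separately. A sketch, using a larger read size where that overlap is more plausible (the function name is mine):

def stream_meetup_split():
    uri = "http://stream.meetup.com/2/rsvps"
    response = requests.get(uri, stream=True)
    buffer = b""
    for chunk in response.iter_content(chunk_size=1024):
        buffer += chunk
        # Yield every complete line; keep any trailing partial document
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line:
                yield line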
A second read through the requests documentation made me realise I hadn’t read it very carefully the first time, since we can make our lives much easier by using 'iter_lines' rather than 'iter_content':
r = requests.get("http://stream.meetup.com/2/rsvps", stream=True)
for raw_rsvp in r.iter_lines():
    # iter_lines buffers internally and yields one line at a time;
    # skip the empty keep-alive lines
    if raw_rsvp:
        rsvp = json.loads(raw_rsvp)
        print(rsvp)
We can then process 'rsvp', keeping just the events we’re interested in.
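For example, a simple keyword filter over the event name might look like this (the keyword list and the 'event_name'/'member_name' field names are assumptions on my part):

interesting = ["python", "data"]  # hypothetical keywords to match on

r = requests.get("http://stream.meetup.com/2/rsvps", stream=True)
for raw_rsvp in r.iter_lines():
    if not raw_rsvp:
        continue
    rsvp = json.loads(raw_rsvp)
    # Assumed payload fields: event.event_name and member.member_name
    event_name = rsvp.get("event", {}).get("event_name", "").lower()
    if any(keyword in event_name for keyword in interesting):
        print(rsvp.get("member", {}).get("member_name"), "->", event_name)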
About the author
I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms book with Amy Hodler.