Python: Transforming Twitter datetime string to timestamp ('z' is a bad directive in format)
I’ve been playing around with importing Twitter data into Neo4j and, since Neo4j can’t store dates natively just yet, I needed to convert a date string to a timestamp.
I started with the following, which unfortunately throws an exception:
>>> from datetime import datetime
>>> date = "Sat Mar 14 18:43:19 +0000 2015"
>>> datetime.strptime(date, "%a %b %d %H:%M:%S %z %Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_strptime.py", line 317, in _strptime
    (bad_directive, format))
ValueError: 'z' is a bad directive in format '%a %b %d %H:%M:%S %z %Y'
%z is a documented directive for the UTC offset, but strptime in Python 2.7 doesn’t support it, which my googling suggests is one of strptime’s idiosyncrasies. Python 3.2 and later do accept %z in strptime.
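One way to stay within the standard library is to pull the offset out of the string and apply it by hand. This is a minimal sketch rather than a definitive fix: parse_twitter_date is a hypothetical helper name, and it assumes the offset is always the fifth whitespace-separated field in +HHMM or -HHMM form and that the day and month names are in English.

```python
from datetime import datetime, timedelta


def parse_twitter_date(value):
    """Hypothetical helper: return a naive datetime expressed in UTC."""
    parts = value.split()
    # Assumption: the UTC offset (e.g. "+0000") is the fifth field.
    offset = parts.pop(4)
    # Parse everything except the offset, which strptime chokes on in Python 2.7.
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    # Local time = UTC + offset, so subtract the offset to get back to UTC.
    return naive - sign * delta


date = "Sat Mar 14 18:43:19 +0000 2015"
print(parse_twitter_date(date))  # 2015-03-14 18:43:19
```

The result is a naive datetime, so it is up to the caller to remember that it represents UTC.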
I eventually came across the python-dateutil library, as recommended by Joe Shaw on StackOverflow.
Using that library the problem is suddenly much simpler:
$ pip install python-dateutil
>>> from dateutil import parser
>>> parsed_date = parser.parse(date)
>>> parsed_date
datetime.datetime(2015, 3, 14, 18, 43, 19, tzinfo=tzutc())
To get a timestamp, we can use calendar.timegm, as I’ve described before:
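>>> import calendar
>>> # utctimetuple() applies the UTC offset first, so non-UTC dates convert correctly too
>>> timestamp = calendar.timegm(parser.parse(date).utctimetuple())
>>> timestamp
1426358599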
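For anyone on a newer interpreter, the same conversion can probably be done with the standard library alone. The sketch below assumes Python 3.3 or later (where strptime accepts %z and datetime has a timestamp() method) and an English locale for the %a and %b directives. It is not what I ran at the time.

```python
from datetime import datetime

date = "Sat Mar 14 18:43:19 +0000 2015"

# On Python 3.2+, %z is accepted and produces a timezone-aware datetime.
parsed = datetime.strptime(date, "%a %b %d %H:%M:%S %z %Y")

# timestamp() on an aware datetime takes the UTC offset into account.
timestamp = int(parsed.timestamp())
print(timestamp)  # 1426358599
```

Because the parsed value is timezone-aware, the result doesn’t depend on the timezone of the machine running the code.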
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.