neo4j/cypher: Playing around with time
I’ve done a bit of modelling with years and months in neo4j graphs that I’ve worked on previously but I haven’t ever done anything with time so I thought it’d be interesting to have a go with my football graph.
I came across this StackOverflow post on my travels which suggested that indexing nodes by time would be helpful and since I have a bunch of football matches with associated times I thought I’d try it out.
I created the key of the index by running code similar to the following:
> DateTime.now.strftime("%H%M")
=> "2200"
We can then write a query to show all the games at a certain time of day:
START game=node:times('time:*')
RETURN game.time, COUNT(game)
ORDER BY game.time
+-------------------------+
| game.time | COUNT(game) |
+-------------------------+
| 1245 | 21 |
| 1330 | 21 |
| 1500 | 163 |
| 1600 | 29 |
| 1730 | 22 |
| 1945 | 21 |
| 2000 | 19 |
+-------------------------+
7 rows
To be fair any index that referenced all the matches would allow us to do this. e.g.
START game=node:matches('match_id:*')
RETURN game.time, COUNT(game)
ORDER BY game.time
The time based indexing becomes more interesting when we use Lucene’s numeric range query syntax to only select matches which happened between certain times of day:
START game=node:times('time:[1600 TO 2000]')
RETURN game.time, COUNT(game)
ORDER BY game.time
+-------------------------+
| game.time | COUNT(game) |
+-------------------------+
| 1600 | 29 |
| 1730 | 22 |
| 1945 | 21 |
| 2000 | 19 |
+-------------------------+
4 rows
I couldn’t see a way to set an open ended value either side of the 'TO' so if we want to do that we just need to set a really high maximum value or really low minimum value.
For example if we want to find all the evening matches we could use this query:
START game=node:times('time:[1730 TO 2359]')
RETURN game.time, COUNT(game)
ORDER BY game.time
+-------------------------+
| game.time | COUNT(game) |
+-------------------------+
| 1730 | 22 |
| 1945 | 21 |
| 2000 | 19 |
+-------------------------+
3 rows
I also indexed each match by its full timestamp so we could find all the evening games this year if we wanted as well:
> Time.new(2013,1,1).to_i
=> 1356998400
START game=node:times('time:[1730 TO 2359] AND date: [1356998400 TO 9999999999]')
RETURN game.time, game.name, game.friendly_date
+------------------------------------------------------------------------------------+
| game.time | game.name | game.friendly_date |
+------------------------------------------------------------------------------------+
| 1730 | "Wigan Athletic vs Liverpool" | "2013-03-02 17:30:00 +0000" |
| 2000 | "Aston Villa vs Manchester City" | "2013-03-04 20:00:00 +0000" |
| 2000 | "West Ham United vs Tottenham Hotspur" | "2013-02-25 20:00:00 +0000" |
| 2000 | "Liverpool vs West Bromwich Albion" | "2013-02-11 20:00:00 +0000" |
| 1730 | "Fulham vs Manchester United" | "2013-02-02 17:30:00 +0000" |
...
| 1945 | "Chelsea vs Southampton" | "2013-01-16 19:45:00 +0000" |
+------------------------------------------------------------------------------------+
25 rows
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.