neo4j/cypher: Returning a row with zero count when no relationship exists
I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received.
1 - Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels.
To refresh this is the sub graph that we’ll need to look at to work it out:
I started off with the following query which traverses out from each match, finds the players who were sent off in the match and then groups the sendings off by the team they were playing for:
START game = node:matches('match_id:*')
MATCH game<-[:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards
LIMIT 5
When we run this we get the following results:
+------------------------------+
| team.name | redCards |
+------------------------------+
| "Sunderland" | 1 |
| "West Ham United" | 1 |
| "Norwich City" | 1 |
| "Reading" | 1 |
| "Liverpool" | 2 |
+------------------------------+
5 rows
The problem we have here is that it hasn’t returned Manchester United because they haven’t yet received any red cards and therefore none of their players match the 'sent_off_in' relationship.
I ran into something similar in a post I wrote about a month ago where I was working out which day of the week players scored on.
The first step towards getting Manchester United to return with a count of 0 is to make the 'sent_off_in' relationship optional.
However, that on its own that isn’t enough because it now returns a count of all the player performances for each team:
START game = node:matches('match_id:*')
MATCH game<-[?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COUNT(game) AS redCards
ORDER BY redCards ASC
LIMIT 5
+-----------------------------+
| team.name | redCards |
+-----------------------------+
| "Chelsea" | 448 |
| "Wigan Athletic" | 459 |
| "Fulham" | 460 |
| "Liverpool" | 466 |
| "Everton" | 467 |
+-----------------------------+
5 rows
Instead what we need to do is collect up all the 'sent_off_in' relationships and sum them up.
We can use the COLLECT function to do that and the neat thing about COLLECT is that it doesn’t bother collecting the empty relationships so we end up with exactly what we need:
START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, COLLECT(r) AS redCards
LIMIT 5
+-----------------------------------------------------------------------------------------------------+
| team.name | redCards |
+-----------------------------------------------------------------------------------------------------+
| "Wigan Athletic" | [:sent_off_in[26443] {},:sent_off_in[37785] {}] |
| "Everton" | [:sent_off_in[6795] {minute:61},:sent_off_in[21735] {},:sent_off_in[34594] {}] |
| "Newcastle United" | [:sent_off_in[434] {minute:75},:sent_off_in[32389] {},:sent_off_in[34915] {}] |
| "Southampton" | [:sent_off_in[49393] {minute:70},:sent_off_in[49392] {minute:82}] |
| "West Ham United" | [:sent_off_in[21734] {minute:67}] |
+-----------------------------------------------------------------------------------------------------+
5 rows
We then just need to call the LENGTH function to work out how many red cards there are in each collection and then we’re done:
START game = node:matches('match_id:*')
MATCH game<-[r?:sent_off_in]-player-[:played]->likeThis-[:in]->game,
likeThis-[:for]->team
RETURN team.name, LENGTH(COLLECT(r)) AS redCards
ORDER BY redCards
LIMIT 5
+--------------------------------+
| team.name | redCards |
+--------------------------------+
| "Manchester United" | 0 |
| "West Ham United" | 1 |
| "Sunderland" | 1 |
| "Norwich City" | 1 |
| "Reading" | 1 |
+--------------------------------+
5 rows
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.