Neo4j 2.0.0-M06 \-> 2.0.0-RC1: Working with path expressions
We recently released Neo4j 2.0.0-RC1 and since there were some breaking changes from Neo4j 2.0.0-M06 I decided to check if I needed to update any of my football graph queries.
On query which no longer worked as I expected was the following one which calculated the top goal scorers for televised games:
MATCH (player:Player)-[:played|subbed_on]->stats
WITH stats.goals AS goals, player, stats-[:in]->()-[:on_tv]-() as onTv
RETURN player.name,
SUM(CASE WHEN onTv = FALSE THEN goals ELSE 0 END) as nonTvGoals,
SUM(CASE WHEN onTv = TRUE THEN goals ELSE 0 END) as tvGoals,
SUM(goals) as allGoals
ORDER BY tvGoals DESC
LIMIT 10
This is what that section of the graph looks like visually:
When I run that query in 2.0.0-M06 I get the following output:
==> +------------------------------------------------------+
==> | player.name | nonTvGoals | tvGoals | allGoals |
==> +------------------------------------------------------+
==> | "Robin Van Persie" | 11 | 15 | 26 |
==> | "Gareth Bale" | 8 | 13 | 21 |
==> | "Luis Suárez" | 12 | 11 | 23 |
==> | "Theo Walcott" | 5 | 9 | 14 |
==> | "Demba Ba" | 7 | 8 | 15 |
==> | "Santi Cazorla" | 5 | 7 | 12 |
==> | "Edin Dzeko" | 7 | 7 | 14 |
==> | "Carlos Tevez" | 5 | 6 | 11 |
==> | "Juan Mata" | 6 | 6 | 12 |
==> | "Steven Gerrard" | 3 | 6 | 9 |
==> +------------------------------------------------------+
However, when I tried it in 2.0.0-RC1 the 'nonTvGoals' and 'tvGoals' columns don’t show any goals!
==> +--------------------------------------------------------+
==> | player.name | nonTvGoals | tvGoals | allGoals |
==> +--------------------------------------------------------+
==> | "Djibril Cissé" | 0 | 0 | 3 |
==> | "Gabriel Agbonlahor" | 0 | 0 | 9 |
==> | "Shaun Derry" | 0 | 0 | 0 |
==> | "Marouane Fellaini" | 0 | 0 | 11 |
==> | "Jermaine Jenas" | 0 | 0 | 2 |
==> | "Claudio Yacob" | 0 | 0 | 0 |
==> | "Michael Owen" | 0 | 0 | 1 |
==> | "Sean Morrison" | 0 | 0 | 2 |
==> | "Kolo Touré" | 0 | 0 | 0 |
==> | "Antonio Valencia" | 0 | 0 | 1 |
==> +--------------------------------------------------------+
I simplified the query a bit and then ran it against both versions to try and work out what was going on:
MATCH (player:Player)-[:played|subbed_on]->stats
RETURN stats-[:in]->()-[:on_tv]-() as onTv
LIMIT 1
on 2.0.0-M06
==> +-------+
==> | onTv |
==> +-------+
==> | false |
==> +-------+
on 2.0.0-RC1
==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | onTv |
==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | [[Node[9058]{name:"Alou Diarra stats for 345824",node_id:"9058",match_id:"345824",type:"stats",goals:0},:in[47190]{},Node[9032]{name:"Swansea City vs West Ham United",node_id:"9032",friendly_date:"2012-08-25 12:45:00 +0100",time:1245,home_goals:3,match_id:"345824",away_goals:0,attendance:"20424",date:1345898700,type:"match"},:on_tv[47059]{},Node[143]{name:"Sky",node_id:"143"}]] |
==> +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
As you can see the path expression which indicates whether a match was televised now returns the matching path if there is one rather than a boolean value indicating if there is a path.
Now that we know that we can tweak the query to check if the onTV path contains any relationships which would indicate that a path does exist:
MATCH (player:Player)-[:played|subbed_on]->stats
WITH stats.goals AS goals, player, stats-[:in]->()-[:on_tv]-() as onTv
RETURN player.name,
SUM(CASE WHEN LENGTH(onTv) = 0 THEN goals ELSE 0 END) as nonTVGoals,
SUM(CASE WHEN LENGTH(onTv) > 0 THEN goals ELSE 0 END) as tvGoals,
SUM(goals) AS allGoals
ORDER BY tvGoals DESC
LIMIT 10
If we run that it works as before:
==> +------------------------------------------------------+
==> | player.name | nonTVGoals | tvGoals | allGoals |
==> +------------------------------------------------------+
==> | "Robin Van Persie" | 11 | 15 | 26 |
==> | "Gareth Bale" | 8 | 13 | 21 |
==> | "Luis Suárez" | 12 | 11 | 23 |
==> | "Theo Walcott" | 5 | 9 | 14 |
==> | "Demba Ba" | 7 | 8 | 15 |
==> | "Santi Cazorla" | 5 | 7 | 12 |
==> | "Edin Dzeko" | 7 | 7 | 14 |
==> | "Carlos Tevez" | 5 | 6 | 11 |
==> | "Juan Mata" | 6 | 6 | 12 |
==> | "Steven Gerrard" | 3 | 6 | 9 |
==> +------------------------------------------------------+
I’m still going through the rest of my queries to check they work but if any of your pre 2.0.0-RC1 queries are breaking do post them up on the mailing list and someone will help you sort them out.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.