neo4j/cypher: Converting queries from 1.9 to 2.0 - 'Can't use optional patterns without explicit START clause'
I’ve been playing around with the most recent Neo4j 2.0 milestone release - 2.0.0-M05 - and one of the first things I did was translate the queries from my football data set which were written against Neo4j 1.9.
The following query calculates the number of goals scored by players in matches that were shown on television, not on television and in total.
START player=node:players('name:*')
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
We use a legacy index to get all of the players, find the games they participated in, check if those games were on television and then group by player to work out whether the goals were scored in televised games.
When we evaluate that query we get the following result:
==> +--------------------------------------------------------+
==> | player.name | nonTvGoals | tvGoals | totalGoals |
==> +--------------------------------------------------------+
==> | "Robin Van Persie" | 11 | 15 | 26 |
==> | "Gareth Bale" | 8 | 13 | 21 |
==> | "Luis Suárez" | 12 | 11 | 23 |
==> | "Theo Walcott" | 5 | 9 | 14 |
==> | "Demba Ba" | 7 | 8 | 15 |
==> | "Edin Dzeko" | 7 | 7 | 14 |
==> | "Santi Cazorla" | 5 | 7 | 12 |
==> | "Juan Mata" | 6 | 6 | 12 |
==> | "Steven Gerrard" | 3 | 6 | 9 |
==> | "Carlos Tevez" | 5 | 6 | 11 |
==> +--------------------------------------------------------+
==> 10 rows
The first step was to get rid of the legacy index and replace it with a label based one. I created a label 'Player' for this purpose:
START player = node:players('name:*') SET player :Player RETURN player
CREATE INDEX on :Player(name)
Then I got rid of the legacy index lookup and replaced it with a label based one:
MATCH (player:Player)-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
Unfortunately you can’t use optional relationships like this:
PatternException: Can't use optional patterns without explicit START clause. Optional relationships: `t`
A neat workaround which Andres showed me is to first match all of the players and then make them available using a WITH statement:
MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) IS NULL)| goals + HEAD(g')) AS nonTvGoals,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE HEAD(TAIL(g)) <> NULL)| goals + HEAD(g')) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + HEAD(g')) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
If we evaluate this query it returns us the same results as before. However, given that we now have better collection support it seemed a shame to still use HEAD and TAIL so I replaced those with the following:
MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT([stats.goals, TYPE(t)]) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] IS NULL)| goals + g'[0]) AS nonTvGoals,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g[1] <> NULL)| goals + g'[0]) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + g'[0]) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
I didn’t like those array indexes either and since we can now create maps I thought the query would be clearer if we created one on line 4 of the query rather than an array:
MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv IS NULL)| goals + g'.goals) AS nonTvGoals,
REDUCE(goals = 0, g' IN FILTER(g IN games WHERE g.ontv <> NULL)| goals + g'.goals) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
The next tweak would be to remove the FILTER and replace that with a list comprehension:
MATCH (player:Player)
WITH player
MATCH player-[:played|subbed_on]->stats-[:in]->game-[t?:on_tv]->channel
WITH COLLECT({goals: stats.goals, ontv: TYPE(t)}) AS games, player
RETURN player.name,
REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv IS NULL] | goals + g'.goals) AS nonTvGoals,
REDUCE(goals = 0, g' IN [g IN games WHERE g.ontv <> NULL] | goals + g'.goals) AS tvGoals,
REDUCE(goals = 0, g' in games | goals + g'.goals) AS totalGoals
ORDER BY tvGoals DESC
LIMIT 10
Although the query ends up being a couple of lines longer than its 1.9 cousin I think there’s less noise and the intent is clearer.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.