Neo4j: Set Based Operations with the experimental Cypher optimiser
A few months ago I wrote about cypher queries which look for a missing relationship and showed how you could optimise them by re-working the query slightly.
To refresh, we wanted to find all the people in the London office that I hadn’t worked with given this model...
</img>
...and this initial query:
MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)
This took on average 7.46 seconds to execute using cypher-query-tuning so we came up with the following version which took 150 ms on average:
MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)
With the release of Neo4j 2.1 we can now make use of Ronja - the experimental Cypher optimiser - which performs much better for certain types of queries. I thought I’d give it a try against this one.
We can use the experimental optimiser by prefixing our query like so:
cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)
If we run that through the query tuner we get the following results:
$ python set-based.py
cypher 2.1.experimental MATCH (p:Person {name: "me"})-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(colleague)
WHERE NOT (p-[:COLLEAGUES]->(colleague))
RETURN COUNT(colleague)
Min 0.719580888748 50% 0.723278999329 95% 0.741609430313 Max 0.743646144867
MATCH (p:Person {name: "me"})-[:COLLEAGUES]->(colleague)
WITH p, COLLECT(colleague) as marksColleagues
MATCH (colleague)-[:MEMBER_OF]->(office {name: "London Office"})<-[:MEMBER_OF]-(p)
WHERE NOT (colleague IN marksColleagues)
RETURN COUNT(colleague)
Min 0.706955909729 50% 0.715770959854 95% 0.731880950928 Max 0.733670949936
As you can see there’s not much in it - our original query now runs as quickly as the optimised one. Ronja #ftw!
Give it a try on your slow queries and see how it gets on. There’ll certainly be some cases where it’s slower but over time it should be faster for a reasonable chunk of queries.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.