17 Mar 2015

Neo4j: Detecting potential typos using EXPLAIN

I’ve been running a few intro to Neo4j training sessions recently using Neo4j 2.2.0 RC1 and at some stage in every session somebody will make a typo when writing out of the example queries.

For example one of the queries that we do about half way finds the actors and directors who have worked together and aggregates the movies they were in.

This is the correct query:

MATCH (actor:Person)-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director)
RETURN actor.name, director.name, COLLECT(movie.title) AS movies
ORDER BY LENGTH(movies) DESC
LIMIT 5

which should yield the following results:

==> +-----------------------------------------------------------------------------------------------------------------------+
==> | actor.name           | director.name    | movies                                                                      |
==> +-----------------------------------------------------------------------------------------------------------------------+
==> | "Hugo Weaving"       | "Andy Wachowski" | ["Cloud Atlas","The Matrix Revolutions","The Matrix Reloaded","The Matrix"] |
==> | "Hugo Weaving"       | "Lana Wachowski" | ["Cloud Atlas","The Matrix Revolutions","The Matrix Reloaded","The Matrix"] |
==> | "Laurence Fishburne" | "Lana Wachowski" | ["The Matrix Revolutions","The Matrix Reloaded","The Matrix"]               |
==> | "Keanu Reeves"       | "Lana Wachowski" | ["The Matrix Revolutions","The Matrix Reloaded","The Matrix"]               |
==> | "Carrie-Anne Moss"   | "Lana Wachowski" | ["The Matrix Revolutions","The Matrix Reloaded","The Matrix"]               |
==> +-----------------------------------------------------------------------------------------------------------------------+

However, a common typo is to write 'DIRECTED_IN' instead of 'DIRECTED' in which case we’ll see no results:

MATCH (actor:Person)-[:ACTED_IN]->(movie)<-[:DIRECTED_IN]-(director)
RETURN actor.name, director.name, COLLECT(movie.title) AS movies
ORDER BY LENGTH(movies) DESC
LIMIT 5

==> +-------------------------------------+
==> | actor.name | director.name | movies |
==> +-------------------------------------+
==> +-------------------------------------+
==> 0 row

It’s not immediately obvious why we aren’t seeing any results which can be quite frustrating.

However, in Neo4j 2.2 the 'EXPLAIN' keyword has been introduced and we can use this to see what the query planner thinks of the query we want to execute without actually executing it.

Instead the planner makes use of knowledge that it has about our schema to come up with a plan that it would run and how much of the graph it thinks that plan would touch: _~cypher EXPLAIN MATCH (actor:Person)-[:ACTED_IN]->(movie)<-[:DIRECTED_IN]-(director) RETURN actor.name, director.name, COLLECT(movie.title) AS movies ORDER BY LENGTH(movies) DESC LIMIT 5 _~

The first row of the query plan describes an all nodes scan which tells us that the query will start from the 'director' but it’s the second row that’s interesting.

The estimated rows when expanding the 'DIRECTED_IN' relationship is 0 when we’d expect it to at least be a positive value if there were some instances of that relationship in the database.

If we compare this to the plan generated when using the proper 'DIRECTED' relationship we can see the difference:

Here we see an estimated 44 rows from expanding the 'DIRECTED' relationship so we know there are at least some nodes connected by that relationship type.

In summary if you find your query not returning anything when you expect it to, prefix an 'EXPLAIN' and make sure you’re not seeing the dreaded '0 expected rows'.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.