Neo4j 2.1.6 - Cypher: FOREACH slowness
A common problem that people have when using Neo4j for social network applications is updating a person with their newly imported friends.
We’ll have an array of friends that we want to connect to a single Person node. Assuming the following schema…
$ schema
Indexes
ON :Person(id) ONLINE
No constraints
…a simplified version would look like this:
WITH range(2, 1002) AS friends
MERGE (p:Person {id: 1})
FOREACH(f IN friends |
  MERGE (friend:Person {id: f})
  MERGE (friend)-[:FRIENDS]->(p));
If we execute that on an empty database we’ll see something like this:
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1002
Relationships created: 1001
Properties set: 1002
Labels added: 1002
19173 ms
That took just over 19 seconds, much longer than we’d expect for 1,001 friends, so let’s have a look at the PROFILE output:
EmptyResult
|
+UpdateGraph(0)
|
+Eager
|
+UpdateGraph(1)
|
+Extract
|
+Null
+----------------+------+---------+-------------+--------------------------------------+
| Operator       | Rows | DbHits  | Identifiers | Other                                |
+----------------+------+---------+-------------+--------------------------------------+
| EmptyResult    | 0    | 0       |             |                                      |
| UpdateGraph(0) | 1    | 3015012 |             | Foreach                              |
| Eager          | 1    | 0       |             |                                      |
| UpdateGraph(1) | 1    | 5       | p, p        | MergeNode; { AUTOINT2}; :Person(id)  |
| Extract        | 1    | 0       |             | friends                              |
| Null           | ?    | ?       |             |                                      |
+----------------+------+---------+-------------+--------------------------------------+
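For reference, a plan like this one can be requested by prefixing the query with the PROFILE keyword. This is only a sketch, assuming neo4j-shell, an empty database and the :Person(id) index from the schema above. Rerunning it against a database that already holds these nodes will report different counts, because MERGE will then match nodes rather than create them.

```cypher
// Same FOREACH query, prefixed with PROFILE to request the executed plan.
// Assumes an empty database and the :Person(id) index shown in the schema.
PROFILE
WITH range(2, 1002) AS friends
MERGE (p:Person {id: 1})
FOREACH(f IN friends |
  MERGE (friend:Person {id: f})
  MERGE (friend)-[:FRIENDS]->(p));
```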
The DbHits value on the second row (UpdateGraph(0), the Foreach) seems suspiciously high, suggesting that FOREACH might not be making use of the :Person(id) index and is instead scanning all Person nodes on each iteration.
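One rough way to sanity-check that suspicion is with arithmetic on the reported figure. The total of 3,015,012 DbHits happens to equal 6 × (2 + 3 + … + 1002), which is the shape we’d see if the cost of each iteration grew in step with the number of Person nodes in the graph at that point. The factor of 6 is fitted to the number rather than taken from Neo4j’s internals, so treat this as a hint, not proof:

```cypher
// Back-of-the-envelope model of the FOREACH DbHits figure.
// Assumption: each iteration costs 6 hits per Person node, with the node
// count growing from 2 to 1002. The 6 is chosen to fit the observed total.
RETURN reduce(total = 0, n IN range(2, 1002) | total + 6 * n) AS modelledDbHits;
```

The sum comes to 3015012, matching the DbHits reported for the Foreach row. An index-backed lookup should not get more expensive as more nodes are added.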
I’m not sure how to drill into that further, but an alternative is to write the same query using UNWIND instead and check its PROFILE output:
WITH range(2, 1002) AS friends
MERGE (p:Person {id: 1})
WITH p, friends
UNWIND friends AS f
MERGE (friend:Person {id: f})
MERGE (friend)-[:FRIENDS]->(p);
If we execute that on an empty database we’ll see this:
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1002
Relationships created: 1001
Properties set: 1002
Labels added: 1002
343 ms
EmptyResult
|
+UpdateGraph(0)
|
+Eager(0)
|
+UpdateGraph(1)
|
+UNWIND
|
+Eager(1)
|
+UpdateGraph(2)
|
+Extract
|
+Null
+----------------+------+--------+-------------------------+--------------------------------------+
| Operator       | Rows | DbHits | Identifiers             | Other                                |
+----------------+------+--------+-------------------------+--------------------------------------+
| EmptyResult    | 0    | 0      |                         |                                      |
| UpdateGraph(0) | 1001 | 0      | friend, p, UNNAMED136   | MergePattern                         |
| Eager(0)       | 1001 | 0      |                         |                                      |
| UpdateGraph(1) | 1001 | 5005   | friend, friend          | MergeNode; f; :Person(id)            |
| UNWIND         | 1001 | 0      |                         |                                      |
| Eager(1)       | 1    | 0      |                         |                                      |
| UpdateGraph(2) | 1    | 5      | p, p                    | MergeNode; { AUTOINT2}; :Person(id)  |
| Extract        | 1    | 0      |                         | friends                              |
| Null           | ?    | ?      |                         |                                      |
+----------------+------+--------+-------------------------+--------------------------------------+
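That’s much quicker (343 ms rather than 19,173 ms), and the MERGE of the friend nodes now registers 5,005 DbHits instead of the 3,015,012 recorded for FOREACH. I expect the index issue will be sorted out in a future release, but until then UNWIND is our friend.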
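To put the two profiles side by side, we can divide the reported totals by the 1,001 friends being merged. This is just arithmetic on the figures above, with Cypher as the calculator, and the column names are made up for illustration. Cypher’s division of two integers yields an integer, so the results are truncated:

```cypher
// DbHits per friend for the node MERGE, and the rough speed-up,
// using the totals and timings reported in the two runs above.
RETURN 3015012 / 1001 AS foreachDbHitsPerFriend,
       5005 / 1001 AS unwindDbHitsPerFriend,
       19173 / 343 AS approxSpeedUp;
```

That returns 3012, 5 and 55. The UNWIND version spends a constant 5 DbHits per friend, the same cost as the single MERGE on p, while FOREACH averages roughly 600 times that.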
About the author
I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.