Neo4j: COLLECTing multiple values (Too many parameters for function 'collect')
One of my favourite functions in Neo4j’s Cypher query language is COLLECT, which allows us to group items into an array for later consumption.
However, I’ve noticed that people sometimes struggle to work out how to collect more than one value per item with COLLECT.
Consider the following data set:
create (p:Person {name: "Mark"})
create (e1:Event {name: "Event1", timestamp: 1234})
create (e2:Event {name: "Event2", timestamp: 4567})
create (p)-[:EVENT]->(e1)
create (p)-[:EVENT]->(e2)
If we wanted to return each person along with a collection of the names of the events they’d participated in, we could write the following:
$ MATCH (p:Person)-[:EVENT]->(e)
> RETURN p, COLLECT(e.name);
+--------------------------------------------+
| p | COLLECT(e.name) |
+--------------------------------------------+
| Node[0]{name:"Mark"} | ["Event1","Event2"] |
+--------------------------------------------+
1 row
That works nicely, but what if we want to collect both the event name and the timestamp without returning the entire event node?
An approach I’ve seen a few people try during workshops is the following:
MATCH (p:Person)-[:EVENT]->(e)
RETURN p, COLLECT(e.name, e.timestamp)
Unfortunately this doesn’t compile:
SyntaxException: Too many parameters for function 'collect' (line 2, column 11)
"RETURN p, COLLECT(e.name, e.timestamp)"
           ^
As the error message suggests, the COLLECT function only takes one argument, so we need to find another way to solve our problem.
One way is to put the two values into a literal array, which gives us an array of arrays as our result:
$ MATCH (p:Person)-[:EVENT]->(e)
> RETURN p, COLLECT([e.name, e.timestamp]);
+----------------------------------------------------------+
| p | COLLECT([e.name, e.timestamp]) |
+----------------------------------------------------------+
| Node[0]{name:"Mark"} | [["Event1",1234],["Event2",4567]] |
+----------------------------------------------------------+
1 row
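To read the values back out of the nested arrays later in a query, we have to index them by position. A minimal sketch, assuming a Cypher version that supports list comprehensions; the aliases `events`, `names` and `timestamps` are illustrative and not part of the queries above:

```cypher
// Collect [name, timestamp] pairs, then pull each field out by position.
// Position 0 holds the name and position 1 holds the timestamp.
MATCH (p:Person)-[:EVENT]->(e)
WITH p, COLLECT([e.name, e.timestamp]) AS events
RETURN p.name,
       [event IN events | event[0]] AS names,
       [event IN events | event[1]] AS timestamps
```

Nothing in the query ties index 0 to the name other than the order used in the COLLECT call, so the two have to be kept in sync by hand.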
The annoying thing about this approach is that, as you add more items, you’ll forget which position holds each bit of data. I think a preferable approach is to collect a map of items instead:
$ MATCH (p:Person)-[:EVENT]->(e)
> RETURN p, COLLECT({eventName: e.name, eventTimestamp: e.timestamp});
+--------------------------------------------------------------------------------------------------------------------------+
| p | COLLECT({eventName: e.name, eventTimestamp: e.timestamp}) |
+--------------------------------------------------------------------------------------------------------------------------+
| Node[0]{name:"Mark"} | [{eventName -> "Event1", eventTimestamp -> 1234},{eventName -> "Event2", eventTimestamp -> 4567}] |
+--------------------------------------------------------------------------------------------------------------------------+
1 row
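Because each collected item is a map, later parts of a query can refer to the values by key rather than by position. A sketch, assuming a Neo4j version that supports UNWIND; the aliases `events` and `event` are illustrative:

```cypher
// Collect one map per event, then expand the collection back into rows
// and read each value by its key.
MATCH (p:Person)-[:EVENT]->(e)
WITH p, COLLECT({eventName: e.name, eventTimestamp: e.timestamp}) AS events
UNWIND events AS event
RETURN p.name, event.eventName, event.eventTimestamp
```

Against the sample data this should return one row per event, and the query keeps working if more keys are added to the map later.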
During the Clojure Neo4j Hackathon that we ran earlier this week, this proved to be a particularly pleasing approach, as we could easily destructure the collection of maps in our Clojure code.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.