Neo4j: Cypher - Flatten a collection
Every now and then in Cypher land we’ll end up with a collection of arrays, often created via the COLLECT function, that we want to squash down into one array.
For example let’s say we have the following array of arrays...
$ RETURN [[1,2,3], [4,5,6], [7,8,9]] AS result;
==> +---------------------------+
==> | result |
==> +---------------------------+
==> | [[1,2,3],[4,5,6],[7,8,9]] |
==> +---------------------------+
==> 1 row
...and we want to return the array [1,2,3,4,5,6,7,8,9].
Many programming languages have a 'flatten' function and, although Cypher doesn’t, we can make our own by using the REDUCE function (http://docs.neo4j.org/chunked/stable/query-functions-collection.html#functions-reduce):
$ WITH [[1,2,3], [4,5,6], [7,8,9]] AS result
RETURN REDUCE(output = [], r IN result | output + r) AS flat;
==> +---------------------+
==> | flat |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,9] |
==> +---------------------+
==> 1 row
Here we’re passing the array 'output' over the collection and adding the individual arrays ([1,2,3], [4,5,6] and [7,8,9]) to that array as we iterate over the collection.
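To make the iteration concrete, the REDUCE call can be unrolled by hand. This is only an illustrative sketch: it assumes exactly three inner arrays and that collection subscripts such as result[0] are available in your Cypher version.

```cypher
// Hand-unrolled equivalent of REDUCE(output = [], r IN result | output + r).
// Evaluated left to right: (([] + result[0]) + result[1]) + result[2]
WITH [[1,2,3], [4,5,6], [7,8,9]] AS result
RETURN [] + result[0] + result[1] + result[2] AS flat;
```

This should return the same [1,2,3,4,5,6,7,8,9], but REDUCE is the better fit because it works however many inner arrays there are.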
If we’re working with numbers in Neo4j 2.0.1, the REDUCE query above throws this type exception:
==> SyntaxException: Type mismatch: expected Any, Collection<Any> or Collection<Collection<Any>> but was Integer (line 1, column 148)
We can work around that by coercing the type of 'output', replacing [] with range(0,-1), which also evaluates to an empty collection:
WITH [[1,2,3], [4,5,6], [7,8,9]] AS result
RETURN REDUCE(output = range(0,-1), r IN result | output + r);
Of course this is quite a simple example but we can handle more complicated scenarios as well by using nested calls to REDUCE. For example let’s say we wanted to completely flatten this array:
$ RETURN [[1,2,3], [4], [5, [6, 7]], [8,9]] AS result;
==> +-------------------------------+
==> | result |
==> +-------------------------------+
==> | [[1,2,3],[4],[5,[6,7]],[8,9]] |
==> +-------------------------------+
==> 1 row
We could write the following Cypher code:
$ WITH [[1,2,3], [4], [5, [6, 7]], [8,9]] AS result
RETURN REDUCE(output = [], r IN result | output + REDUCE(innerOutput = [], innerR IN r | innerOutput + innerR)) AS flat;
==> +---------------------+
==> | flat |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,9] |
==> +---------------------+
==> 1 row
Here we have an outer REDUCE function which iterates over [1,2,3], [4], [5, [6,7]] and [8,9], and then an inner REDUCE function which iterates over the elements of each of those individual arrays.
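The inner REDUCE copes with both the plain value 5 and the nested array [6,7] because of how + treats a collection on its left-hand side: it concatenates another collection but appends a single value. A small sketch to check that behaviour (the column names are just for illustration):

```cypher
// concatenated should be [5,6,7]: collection + collection joins the two
// appended should be [1,2,3]: collection + single value appends it
RETURN [5] + [6, 7] AS concatenated, [1, 2] + 3 AS appended;
```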
If the array were nested more deeply then we could just introduce another nested call to REDUCE!
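As a sketch of what that extra level could look like, here is a three-level version. The input array and the names midOutput and midR are made up for illustration. Note the assumption that every element at the first two levels is itself an array, since each inner REDUCE needs a collection to iterate over; only the innermost level may mix plain values and arrays.

```cypher
// Outer REDUCE walks the top-level arrays, the middle one walks their
// child arrays, and the innermost one flattens each child array.
WITH [[[1,2],[3]], [[4,[5,6]]], [[7],[8,9]]] AS result
RETURN REDUCE(output = [], r IN result |
  output + REDUCE(midOutput = [], midR IN r |
    midOutput + REDUCE(innerOutput = [], innerR IN midR | innerOutput + innerR))) AS flat;
```

Working through it by hand, this should give [1,2,3,4,5,6,7,8,9].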
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.