Neo4j: Cypher - Remove consecutive duplicates from a list
I was playing with a dataset this week and wanted to share how I removed consecutive duplicate elements from a list using the Cypher query language.
For simplicity’s sake, imagine that we have this list:
neo4j> return [1,2,3,3,4,4,4,5,3] AS values;
+-----------------------------+
| values |
+-----------------------------+
| [1, 2, 3, 3, 4, 4, 4, 5, 3] |
+-----------------------------+
We want to collapse the consecutive runs of 3’s and 4’s while keeping the final 3, so that our end result is:
[1,2,3,4,5,3]
APOC's apoc.coll.toSet doesn’t quite do the trick because it removes duplicates regardless of where they appear in the collection:
neo4j> return apoc.coll.toSet([1,2,3,3,4,4,4,5,3]) AS values;
+-----------------+
| values |
+-----------------+
| [1, 2, 3, 4, 5] |
+-----------------+
Luckily, it’s quite easy to translate Ulf Aslak’s Python one-liner into Cypher to do what we want. This is the Python version:
>>> values = [1,2,3,3,4,4,4,5,3]
>>> [v for i, v in enumerate(values) if i == 0 or v != values[i-1]]
[1, 2, 3, 4, 5, 3]
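If you want to reuse the one-liner, it can be wrapped in a small function. This is just a minimal sketch: the name remove_consecutive_duplicates is my own choice and not part of Ulf's original, and it assumes the input is an indexable sequence such as a list.

```python
def remove_consecutive_duplicates(values):
    """Keep an element if it is the first one or differs from its predecessor."""
    # enumerate gives us the index, so we can look back at values[i - 1]
    return [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]


print(remove_consecutive_duplicates([1, 2, 3, 3, 4, 4, 4, 5, 3]))
# prints [1, 2, 3, 4, 5, 3]
```

The i == 0 check comes first so that the first element is never compared with values[-1], which in Python would be the last element of the list.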
We’ll use the range function to generate the indexes of our list (range is inclusive at both ends, hence the size(values)-1) and a list comprehension to do the rest.
The following code does the trick:
neo4j> WITH [1,2,3,3,4,4,4,5,3] AS values
RETURN [i in range(0, size(values)-1)
WHERE i=0 OR values[i] <> values[i-1] | values[i] ] AS values;
+--------------------+
| values |
+--------------------+
| [1, 2, 3, 4, 5, 3] |
+--------------------+
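To make the translation easier to follow, here is a rough Python mirror of the Cypher query above, iterating over indexes instead of using enumerate. It is only a sketch for comparison, and the function name dedupe_by_index is hypothetical. The main difference to watch for is that Python's range excludes its upper bound, while Cypher's range includes it.

```python
def dedupe_by_index(values):
    # range(len(values)) yields 0 .. len(values) - 1, the same indexes that
    # Cypher's inclusive range(0, size(values)-1) produces.
    return [
        values[i]
        for i in range(len(values))
        if i == 0 or values[i] != values[i - 1]
    ]


print(dedupe_by_index([1, 2, 3, 3, 4, 4, 4, 5, 3]))
# prints [1, 2, 3, 4, 5, 3]
print(dedupe_by_index(["a", "a", "b", "b", "a"]))
# prints ['a', 'b', 'a']
```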
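You can use this on collections containing nodes, strings, or anything else that can be compared with <> - I’ve just used numbers to keep the example simple. One thing to watch out for is null values: a comparison involving null evaluates to null rather than true, so those elements are filtered out.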
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.