12 Jan 2019

Neo4j: Cypher - Remove consecutive duplicates from a list

I was playing with a dataset this week and wanted to share how I removed consecutive duplicate elements from a list using the Cypher query language.

For simplicity’s sake, imagine that we have this list:

neo4j> return [1,2,3,3,4,4,4,5,3] AS values;
+-----------------------------+
| values                      |
+-----------------------------+
| [1, 2, 3, 3, 4, 4, 4, 5, 3] |
+-----------------------------+

We want to collapse the runs of repeated 3’s and 4’s, while keeping the final 3 because it isn’t adjacent to the earlier ones, so that the end result is:

[1,2,3,4,5,3]

APOC's apoc.coll.toSet doesn’t quite do the trick because it removes duplicates regardless of where they appear in the collection:

neo4j> return apoc.coll.toSet([1,2,3,3,4,4,4,5,3]) AS values;
+-----------------+
| values          |
+-----------------+
| [1, 2, 3, 4, 5] |
+-----------------+
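To make the distinction concrete, here is a minimal Python sketch (Python rather than Cypher, and not part of APOC) that contrasts set-style de-duplication with the result we are after. `dict.fromkeys` stands in for `apoc.coll.toSet` purely as an analogy, and the variable names are my own.

```python
values = [1, 2, 3, 3, 4, 4, 4, 5, 3]

# Set-style de-duplication: each value is kept once, wherever it appears.
# dict.fromkeys preserves first-seen order, mirroring the toSet output above.
set_style = list(dict.fromkeys(values))

# What we actually want: only runs of adjacent repeats are collapsed.
wanted = [1, 2, 3, 4, 5, 3]

print(set_style)            # [1, 2, 3, 4, 5]
print(set_style == wanted)  # False: the trailing 3 is lost
```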

Luckily, it’s quite easy to translate Ulf Aslak’s Python one-liner into Cypher. This is the Python version:

>>> values = [1,2,3,3,4,4,4,5,3]
>>> [v for i, v in enumerate(values) if i == 0 or v != values[i-1]]
[1, 2, 3, 4, 5, 3]

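We’ll use the range function to iterate over the list’s indexes and a list comprehension to do the rest. Cypher’s range is inclusive at both ends, which is why the upper bound is size(values)-1. The following query does the trick: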

neo4j> WITH [1,2,3,3,4,4,4,5,3] AS values
       RETURN [i in range(0, size(values)-1)
               WHERE i=0 OR values[i] <> values[i-1] | values[i] ] AS values;
+--------------------+
| values             |
+--------------------+
| [1, 2, 3, 4, 5, 3] |
+--------------------+
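For readers who find the index-based form easier to follow in Python, here is a rough mirror of the Cypher query. It is only a sketch: the function name `remove_consecutive_duplicates` is hypothetical, and the point is to show how the inclusive Cypher range lines up with Python's exclusive one.

```python
def remove_consecutive_duplicates(values):
    """Keep values[i] when it is the first element or differs from values[i-1]."""
    # Cypher's range(0, size(values)-1) includes both ends, so it covers
    # the same indexes as Python's range(len(values)).
    return [values[i] for i in range(len(values))
            if i == 0 or values[i] != values[i - 1]]


print(remove_consecutive_duplicates([1, 2, 3, 3, 4, 4, 4, 5, 3]))  # [1, 2, 3, 4, 5, 3]
print(remove_consecutive_duplicates([]))                           # []
```

The `i == 0` check comes first so that the first element is always kept without comparing it to anything before it.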

You can use this on collections containing nodes, strings, or anything else. I’ve only used numbers here to keep the example simple.
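As a quick illustration with strings, here is a Python sketch run outside Neo4j. It uses `itertools.groupby`, which is a different technique from the index comparison in the query, and the city names are made-up sample data.

```python
from itertools import groupby

cities = ["London", "London", "Paris", "Berlin", "Berlin", "London"]

# groupby groups adjacent equal items, so taking each group's key
# collapses every run of consecutive duplicates to a single element.
deduped = [key for key, _ in groupby(cities)]

print(deduped)  # ['London', 'Paris', 'Berlin', 'London']
```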

About the author

I'm currently working on short-form content at ClickHouse. I publish short 5-minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.