Neo4j: Cypher - Removing consecutive duplicates
When writing Cypher queries I sometimes find myself wanting to remove consecutive duplicates in collections that I’ve joined together.
e.g we might start with the following query where 1 and 7 appear consecutively:
RETURN [1,1,2,3,4,5,6,7,7,8] AS values
==> +-----------------------+
==> | values |
==> +-----------------------+
==> | [1,1,2,3,4,5,6,7,7,8] |
==> +-----------------------+
==> 1 row
We want to end up with [1,2,3,4,5,6,7,8]. We can start by exploding our array and putting consecutive elements next to each other:
WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
RETURN idx, idx+1, values[idx], values[idx+1]
==> +-------------------------------------------+
==> | idx | idx+1 | values[idx] | values[idx+1] |
==> +-------------------------------------------+
==> | 0 | 1 | 1 | 1 |
==> | 1 | 2 | 1 | 2 |
==> | 2 | 3 | 2 | 3 |
==> | 3 | 4 | 3 | 4 |
==> | 4 | 5 | 4 | 5 |
==> | 5 | 6 | 5 | 6 |
==> | 6 | 7 | 6 | 7 |
==> | 7 | 8 | 7 | 7 |
==> | 8 | 9 | 7 | 8 |
==> +-------------------------------------------+
==> 9 rows
Next we can filter out rows which have the same values since that means they have consecutive duplicates:
WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN a,b
==> +-------+
==> | a | b |
==> +-------+
==> | 1 | 2 |
==> | 2 | 3 |
==> | 3 | 4 |
==> | 4 | 5 |
==> | 5 | 6 |
==> | 6 | 7 |
==> | 7 | 8 |
==> +-------+
==> 7 rows
Now we need to join the collection back together again. Most of the values we want are in field 'b' but we also need to grab the first value from field 'a':
WITH [1,1,2,3,4,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +-------------------+
==> | noDuplicates |
==> +-------------------+
==> | [1,2,3,4,5,6,7,8] |
==> +-------------------+
==> 1 row
What about if we have more than 2 duplicates in a row?
WITH [1,1,1,2,3,4,5,5,6,7,7,8] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +-------------------+
==> | noDuplicates |
==> +-------------------+
==> | [1,2,3,4,5,6,7,8] |
==> +-------------------+
==> 1 row
Still happy, good times! Of course if we have a non consecutive duplicate that wouldn’t be removed:
WITH [1,1,1,2,3,4,5,5,6,7,7,8,1] AS values
UNWIND RANGE(0, LENGTH(values) - 2) AS idx
WITH values[idx] AS a, values[idx+1] AS b
WHERE a <> b
RETURN COLLECT(a)[0] + COLLECT(b) AS noDuplicates
==> +---------------------+
==> | noDuplicates |
==> +---------------------+
==> | [1,2,3,4,5,6,7,8,1] |
==> +---------------------+
==> 1 row
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.