neo4j/cypher: Keep longest path when finding taxonomy
I’ve been playing around with modelling a product taxonomy and one thing that I wanted to do was find out the full path where a product sits under the tree.
I created a simple data set to show the problem:
CREATE (cat { name: "Cat" })
CREATE (subcat1 { name: "SubCat1" })
CREATE (subcat2 { name: "SubCat2" })
CREATE (subsubcat1 { name: "SubSubCat1" })
CREATE (product1 { name: "Product1" })
CREATE (cat)-[:CHILD]-subcat1-[:CHILD]-subsubcat1
CREATE (product1)-[:HAS_CATEGORY]-(subsubcat1)
I wanted to write a query which would return 'product1' and the tree 'Cat -> SubCat1 -> SubSubCat1' and initially wrote the following query:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)
which returns:
==> +--------------------------------------------------------------------+
==> | product | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1"] |
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"] |
==> +--------------------------------------------------------------------+
==> 2 rows
I didn’t want to return the first row since that isn’t the full tree and Andres suggested that looking for nodes which didn’t have any incoming children would help me do that:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category,
taxonomy=category<-[:CHILD*1..]-parent
WHERE NOT parent<-[:CHILD]-()
RETURN product, EXTRACT(n IN NODES(taxonomy): n.name)
==> +--------------------------------------------------------------------+
==> | product | EXTRACT(n IN NODES(taxonomy): n.name) |
==> +--------------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["SubSubCat1","SubCat1","Cat"] |
==> +--------------------------------------------------------------------+
==> 1 row
If we want to reverse the taxonomy so it’s in the right order we can follow Wes Freeman’s advice from the following Stack Overflow thread:
START product=node:node_auto_index(name="Product1")
MATCH product-[:HAS_CATEGORY]-category, taxonomy=category<-[:CHILD*1..]-parent
WHERE NOT parent<-[:CHILD]-()
RETURN product,
REDUCE(acc=[], cat IN EXTRACT(n IN NODES(taxonomy): n.name): cat + acc) AS taxonomy
==> +-------------------------------------------------------------+
==> | product | taxonomy |
==> +-------------------------------------------------------------+
==> | Node[888]{name:"Product1"} | ["Cat","SubCat1","SubSubCat1"] |
==> +-------------------------------------------------------------+
==> 1 row
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.