Neo4j: Generic/Vague relationship names
An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.
Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.
However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.
I created 4 different databases which had one node with 60,000 outgoing relationships - 10,000 which we wanted to retrieve and 50,000 that were irrelevant.
I modelled the 'relationship' in 4 different ways...
-
Using a specific relationship type
(node)-[:HAS_ADDRESS]->(address) -
Using a generic relationship type and then filtering by end node label
(node)-[:HAS]->(address:Address) -
Using a generic relationship type and then filtering by relationship property
(node)-[:HAS {type: "address"}]->(address) -
Using a generic relationship type and then filtering by end node property
(node)-[:HAS]->(address {type: "address"})
...and then measured how long it took to retrieve the 'has address' relationships.
The code is on github if you want to take a look.
Although it’s obviously not as precise as a JMH micro benchmark I think it’s good enough to get a feel for the difference between the approaches.
I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms): ~text Using a generic relationship type and then filtering by end node label 50%ile: 6.0 75%ile: 6.0 99%ile: 402.60999999999825 Using a generic relationship type and then filtering by relationship property 50%ile: 21.0 75%ile: 22.0 99%ile: 504.85999999999785 Using a generic relationship type and then filtering by end node label 50%ile: 4.0 75%ile: 4.0 99%ile: 145.65999999999931 Using a specific relationship type 50%ile: 0.0 75%ile: 1.0 99%ile: 25.749999999999872 ~
We can drill further into why there’s a difference in the times for each of the approaches by profiling the equivalent cypher query. We’ll start with the one which uses a specific relationship name Using a specific relationship type ~cypher neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS_ADDRESS]->() return count(n); ---------- | count(n) | ---------- | 10000 | ---------- 1 row ColumnFilter | EagerAggregation | +SimplePatternMatcher | +NodeByIdOrEmpty +----------------------------------------------------------------------------------------- | Operator | Rows | DbHits | Identifiers | Other | ----------------------------------------------------------------------------------------- | ColumnFilter | 1 | 0 | | keep columns count(n) | | EagerAggregation | 1 | 0 | | | | SimplePatternMatcher | 10000 | 10000 | n, UNNAMED53, UNNAMED35 | | | NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} | ----------------------------------------------------------------------------------------- Total database accesses: 10001 ~
Here we can see that there were 10,002 database accesses in order to get a count of our 10,000 HAS_ADDRESS relationships. We get a database access each time we load a node, relationship or property.
By contrast the other approaches have to load in a lot more data only to then filter it out: Using a generic relationship type and then filtering by end node label ~cypher neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->(:Address) return count(n); ---------- | count(n) | ---------- | 10000 | ---------- 1 row ColumnFilter | EagerAggregation | +Filter | +SimplePatternMatcher | +NodeByIdOrEmpty +---------------------------------------------------------------------------------------------------- | Operator | Rows | DbHits | Identifiers | Other | ---------------------------------------------------------------------------------------------------- | ColumnFilter | 1 | 0 | | keep columns count(n) | | EagerAggregation | 1 | 0 | | | | Filter | 10000 | 10000 | | hasLabel( UNNAMED45:Address(0)) | | SimplePatternMatcher | 10000 | 60000 | n, UNNAMED45, UNNAMED35 | | | NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} | ---------------------------------------------------------------------------------------------------- Total database accesses: 70001 ~ Using a generic relationship type and then filtering by relationship property ~cypher neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS {type: "address"}]->() return count(n); ---------- | count(n) | ---------- | 10000 | ---------- 1 row ColumnFilter | EagerAggregation | +Filter | +SimplePatternMatcher | +NodeByIdOrEmpty +-------------------------------------------------------------------------------------------------------------------- | Operator | Rows | DbHits | Identifiers | Other | -------------------------------------------------------------------------------------------------------------------- | ColumnFilter | 1 | 0 | | keep columns count(n) | | EagerAggregation | 1 | 0 | | | | Filter | 10000 | 20000 | | Property( UNNAMED35,type(0)) == { AUTOSTRING1} | | SimplePatternMatcher | 10000 | 120000 | n, UNNAMED63, UNNAMED35 | | | NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} | -------------------------------------------------------------------------------------------------------------------- Total database accesses: 140001 ~ Using a generic relationship type and then filtering by end node property ~cypher neo4j-sh (?)$ profile match (n) where id(n) = 0 match (n)-[:HAS]->({type: "address"}) return count(n); ---------- | count(n) | ---------- | 10000 | ---------- 1 row ColumnFilter | EagerAggregation | +Filter | +SimplePatternMatcher | +NodeByIdOrEmpty +-------------------------------------------------------------------------------------------------------------------- | Operator | Rows | DbHits | Identifiers | Other | -------------------------------------------------------------------------------------------------------------------- | ColumnFilter | 1 | 0 | | keep columns count(n) | | EagerAggregation | 1 | 0 | | | | Filter | 10000 | 20000 | | Property( UNNAMED45,type(0)) == { AUTOSTRING1} | | SimplePatternMatcher | 10000 | 120000 | n, UNNAMED45, UNNAMED35 | | | NodeByIdOrEmpty | 1 | 1 | n, n | { AUTOINT0} | -------------------------------------------------------------------------------------------------------------------- Total database accesses: 140001 ~
So in summary...specific relationships #ftw!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.