31 Oct 2018

Neo4j Graph Algorithms: Visualising Projected Graphs

A few weeks ago I wrote a blog post showing how to work out the best tennis player of all time using the Weighted PageRank algorithm, and in the process created a projected credibility graph which I want to explore in more detail in this post.

As I pointed out in that post, sometimes the graph model doesn’t fit well with what the algorithm expects, so we need to project the graph on which we run graph algorithms.

In this case, the PageRank algorithm works on top of a 'credibility graph' where nodes receive credibility from their incoming relationships. The amount of credibility a relationship gives is determined by the weight property on that relationship.

For our tennis graph we started with a graph of matches, winners, and losers, and then derived a credibility graph using the following query:

MATCH (p1)<-[:WINNER]-(match)-[:LOSER]->(p2)
RETURN id(p2) AS source, id(p1) AS target, count(*) as weight

We then used the example of Roger Federer playing Rafael Nadal to see how this worked in practice. If we feed their names into the query we’d the following results, starting with matches that Federer won:

MATCH (p1:Player {name: "Roger Federer"})<-[:WINNER]-(match)-[:LOSER]->(p2:Player {name: "Rafael Nadal"})
RETURN id(p2) AS source, id(p1) AS target, count(*) as weight

If we run that we’ll get this output:

╒════════╤════════╤════════╕
│"source"│"target"│"weight"│
╞════════╪════════╪════════╡
│7       │124     │15      │
└────────┴────────┴────────┘

And now for matches that Nadal won:

MATCH (p1:Player {name: "Rafael Nadal"})<-[:WINNER]-(match)-[:LOSER]->(p2:Player {name: "Roger Federer"})
RETURN id(p2) AS source, id(p1) AS target, count(*) as weight

And if we run that we’ll get this output:

╒════════╤════════╤════════╕
│"source"│"target"│"weight"│
╞════════╪════════╪════════╡
│124     │7       │23      │
└────────┴────────┴────────┘

It can be easier to understand how this works by visualising the projected graph. We can use a procedure from the APOC library to help us do this.

The following query shows how to do this:

MATCH (p1)<-[:WINNER]-(match)-[:LOSER]->(p2)
WITH p2, p1, count(*) AS count
CALL apoc.create.vRelationship(p2,"BEATEN_BY",{count:count},p1) yield rel
RETURN p2, p1, rel

The first two lines of the query are the same as before, but on the 3rd line we create a virtual relationship between the two player nodes. That returns a big graph, so let’s cheat a bit and just show the projected graph for some famous players.

First let’s create a parameter containing an array of players:

:params players => ["Roger Federer", "Andy Murray", "Novak Djokovic", "Rafael Nadal", "Alexander Zverev", "Pete Sampras", "Andre Agassi", "John McEnroe", "Yevgeny Kafelnikov"]

And now let’s visualise the matches between those players:

MATCH (p1)<-[:WINNER]-(match)-[:LOSER]->(p2)
WHERE p1.name in $players AND p2.name IN $players
WITH p2, p1, count(*) AS count
CALL apoc.create.vRelationship(p2,"BEATEN_BY",{count:count},p1) yield rel
RETURN p2, p1, rel

And if we run that query we’ll get back this graph:

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.