Neo4j: LOAD CSV - Handling conditionals
While building up the Neo4j World Cup Graph I’ve been making use of the http://neo4j.com/blog/neo4j-2-1-graph-etl function and I frequently found myself needing to do different things depending on the value in one of the columns.
For example I have one CSV file which contains the different events that can happen in a football match:
match_id,player,player_id,time,type
"1012","Antonin Panenka","174835",21,"penalty"
"1012","Faisal Al Dakhil","2204",57,"goal"
"102","Roger Milla","79318",106,"goal"
"102","Roger Milla","79318",108,"goal"
"102","Bernardo Redin","44555",115,"goal"
"102","Andre Kana-biyik","174649",44,"yellow"
If the type is 'penalty', 'owngoal' or 'goal' then I want to create a SCORED_GOAL relationship whereas if it’s 'yellow', 'yellowred' or 'red' then I want to create a RECEIVED_CARD relationship instead.
I learnt - from reading a cypher script written by Chris Leishman - that we can make FOREACH mimic a conditional by creating a collection with one item in to represent 'true' and an empty collection to represent 'false'.
In this case we’d end up with something like this to handle the case where a row represents a goal:
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine
// removed for conciseness
// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
)
)
And equally when we want to process a row that represents a card we’d have this:
// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
)
)
And if we put everything together we get this:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine
MATCH (home)<-[:HOME_TEAM]-(match:Match {id: csvLine.match_id})-[:AWAY_TEAM]->(away)
MATCH (player:Player {id: csvLine.player_id})-[:IN_SQUAD]->(squad)<-[:NAMED_SQUAD]-(team)
MATCH (player)-[:STARTED|:SUBSTITUTE]->(stats)-[:IN_MATCH]->(match)
// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
)
)
// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
)
)
;
You can have a look at the code on github or follow the instructions to get all the World Cup graph into your own local Neo4j.
Feedback welcome as always.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.