· neo4j cypher

Neo4j: LOAD CSV - Handling conditionals

While building up the Neo4j World Cup Graph I’ve been making use of the http://neo4j.com/blog/neo4j-2-1-graph-etl function and I frequently found myself needing to do different things depending on the value in one of the columns.

For example I have one CSV file which contains the different events that can happen in a football match:

match_id,player,player_id,time,type
"1012","Antonin Panenka","174835",21,"penalty"
"1012","Faisal Al Dakhil","2204",57,"goal"
"102","Roger Milla","79318",106,"goal"
"102","Roger Milla","79318",108,"goal"
"102","Bernardo Redin","44555",115,"goal"
"102","Andre Kana-biyik","174649",44,"yellow"

If the type is 'penalty', 'owngoal' or 'goal' then I want to create a SCORED_GOAL relationship whereas if it’s 'yellow', 'yellowred' or 'red' then I want to create a RECEIVED_CARD relationship instead.

I learnt - from reading a cypher script written by Chris Leishman - that we can make FOREACH mimic a conditional by creating a collection with one item in to represent 'true' and an empty collection to represent 'false'.

In this case we’d end up with something like this to handle the case where a row represents a goal:

LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine

// removed for conciseness

// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
  )
)

And equally when we want to process a row that represents a card we’d have this:

// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
  )
)

And if we put everything together we get this:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mneedham/neo4j-worldcup/master/data/import/events.csv" AS csvLine

MATCH (home)<-[:HOME_TEAM]-(match:Match {id: csvLine.match_id})-[:AWAY_TEAM]->(away)

MATCH (player:Player {id: csvLine.player_id})-[:IN_SQUAD]->(squad)<-[:NAMED_SQUAD]-(team)
MATCH (player)-[:STARTED|:SUBSTITUTE]->(stats)-[:IN_MATCH]->(match)

// goals
FOREACH(n IN (CASE WHEN csvLine.type IN ["penalty", "goal", "owngoal"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:SCORED_GOAL]->(penalty:Goal {time: csvLine.time, type: csvLine.type})
  )
)

// cards
FOREACH(n IN (CASE WHEN csvLine.type IN ["yellow", "red", "yellowred"] THEN [1] else [] END) |
  FOREACH(t IN CASE WHEN team = home THEN [home] ELSE [away] END |
    MERGE (stats)-[:RECEIVED_CARD]->(card {time: csvLine.time, type: csvLine.type})
  )
)
;

Feedback welcome as always.

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket