Neo4j: LOAD CSV - Column is null
One problem I’ve seen a few people have recently when using Neo4j’s LOAD CSV function is dealing with CSV files that have dodgy hidden characters at the beginning of the header line.
For example, consider an import of this CSV file:
$ cat ~/Downloads/dodgy.csv
userId,movieId
1,2
We might start by checking which columns it has:
$ load csv with headers from "file:/Users/markneedham/Downloads/dodgy.csv" as line return line;
+----------------------------------+
| line |
+----------------------------------+
| {userId -> "1", movieId -> "2"} |
+----------------------------------+
1 row
Looks good so far but what about if we try to return just 'userId'?
$ load csv with headers from "file:/Users/markneedham/Downloads/dodgy.csv" as line return line.userId;
+-------------+
| line.userId |
+-------------+
| <null> |
+-------------+
1 row
Hmmm it’s null...what about 'movieId'?
$ load csv with headers from "file:/Users/markneedham/Downloads/dodgy.csv" as line return line.movieId;
+--------------+
| line.movieId |
+--------------+
| "2" |
+--------------+
1 row
That works fine so immediately we can suspect there are hidden characters at the beginning of the first line of the file.
The easiest way to check if this is the case is open the file using a Hex Editor - I quite like Hex Fiend for the Mac.
If we look at dodgy.csv we’ll see the following:
Let’s delete the highlighted characters and try our cypher query again:
$ load csv with headers from "file:/Users/markneedham/Downloads/dodgy.csv" as line return line.userId;
+-------------+
| line.userId |
+-------------+
| "1" |
+-------------+
1 row
All is well again, but something to keep in mind if you see a LOAD CSV near you behaving badly.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.