Neo4j: Loading JSON documents with Cypher
One of the most commonly asked questions I get asked is how to load JSON documents into Neo4j and although Cypher doesn’t have a 'LOAD JSON' command we can still get JSON data into the graph.
Michael shows how to do this from various languages in this blog post and I recently wanted to load a JSON document that I generated from Chicago crime types.
This is a snippet of the JSON document:
{
"categories": [
{
"name": "Index Crime",
"sub_categories": [
{
"code": "01A",
"description": "Homicide 1st & 2nd Degree"
}
]
},
{
"name": "Non-Index Crime",
"sub_categories": [
{
"code": "01B",
"description": "Involuntary Manslaughter"
}
]
},
{
"name": "Violent Crime",
"sub_categories": [
{
"code": "01A",
"description": "Homicide 1st & 2nd Degree"
}
]
}
]
}
We want to create the following graph structure from this document:
We can then connect the crimes to the appropriate sub category and write aggregation queries that drill down from the category.
To do this we’re going to have to pass the JSON document to Neo4j via its HTTP API rather than through the browser. Luckily there are drivers available for {insert your favourite language here} so we should still be good.
Python is my current goto language so I’m going to use py2neo to load the data in.
Let’s start by writing a simple query which passes our JSON document in and gets it straight back. Note that I’ve updated my Neo4j password to be 'foobar' - replace that with your equivalent if you’re following along:
import json
from py2neo import Graph, authenticate
# replace 'foobar' with your password
authenticate("localhost:7474", "neo4j", "foobar")
graph = Graph()
with open('categories.json') as data_file:
json = json.load(data_file)
query = """
RETURN {json}
"""
# Send Cypher query.
print graph.cypher.execute(query, json = json)
$ python import_categories.py
| document
---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | {u'categories': [{u'name': u'Index Crime', u'sub_categories': [{u'code': u'01A', u'description': u'Homicide 1st & 2nd Degree'}, {u'code': u'02', u'description': u'Criminal Sexual Assault'}, {u'code': u'03', u'description': u'Robbery'}, {u'code': u'04A', u'description': u'Aggravated Assault'}, {u'code': u'04B', u'description': u'Aggravated Battery'}, {u'code': u'05', u'description': u'Burglary'}, {u'code': u'06', u'description': u'Larceny'}, {u'code': u'07', u'description': u'Motor Vehicle Theft'}, {u'code': u'09', u'description': u'Arson'}]}, {u'name': u'Non-Index Crime', u'sub_categories': [{u'code': u'01B', u'description': u'Involuntary Manslaughter'}, {u'code': u'08A', u'description': u'Simple Assault'}, {u'code': u'08B', u'description': u'Simple Battery'}, {u'code': u'10', u'description': u'Forgery & Counterfeiting'}, {u'code': u'11', u'description': u'Fraud'}, {u'code': u'12', u'description': u'Embezzlement'}, {u'code': u'13', u'description': u'Stolen Property'}, {u'code': u'14', u'description': u'Vandalism'}, {u'code': u'15', u'description': u'Weapons Violation'}, {u'code': u'16', u'description': u'Prostitution'}, {u'code': u'17', u'description': u'Criminal Sexual Abuse'}, {u'code': u'18', u'description': u'Drug Abuse'}, {u'code': u'19', u'description': u'Gambling'}, {u'code': u'20', u'description': u'Offenses Against Family'}, {u'code': u'22', u'description': u'Liquor License'}, {u'code': u'24', u'description': u'Disorderly Conduct'}, {u'code': u'26', u'description': u'Misc Non-Index Offense'}]}, {u'name': u'Violent Crime', u'sub_categories': [{u'code': u'01A', u'description': u'Homicide 1st & 2nd Degree'}, {u'code': u'02', u'description': u'Criminal Sexual Assault'}, {u'code': u'03', u'description': u'Robbery'}, {u'code': u'04A', u'description': u'Aggravated Assault'}, {u'code': u'04B', u'description': u'Aggravated Battery'}]}]}
It’s a bit ugly but we can see that everything’s there! Our next step is to extract each category into its own row. We can do this by accessing the 'categories' key in our JSON document and then calling the http://neo4j.com/docs/stable/query-unwind.html function which allows us to expand a collection into a sequence of rows:
query = """
WITH {json} AS document
UNWIND document.categories AS category
RETURN category.name
"""
$ python import_categories.py
| category.name
---+-----------------
1 | Index Crime
2 | Non-Index Crime
3 | Violent Crime
Now we can create a node for each of those categories. We’ll use the MERGE command so that we can run this script multiple times without ending up with repeat categories:
query = """
WITH {json} AS document
UNWIND document.categories AS category
MERGE (:CrimeCategory {name: category.name})
"""
Let’s quickly check those categories were correctly imported:
match (category:CrimeCategory)
return category
Looking good so far - now for the sub categories. We’re going to use the UNWIND function to help us out here as well:
query = """
WITH {json} AS document
UNWIND document.categories AS category
UNWIND category.sub_categories AS subCategory
RETURN category.name, subCategory.code, subCategory.description
"""
$ python import_categories.py
| category.name | subCategory.code | subCategory.description
----+-----------------+------------------+---------------------------
1 | Index Crime | 01A | Homicide 1st & 2nd Degree
2 | Index Crime | 02 | Criminal Sexual Assault
3 | Index Crime | 03 | Robbery
4 | Index Crime | 04A | Aggravated Assault
5 | Index Crime | 04B | Aggravated Battery
6 | Index Crime | 05 | Burglary
7 | Index Crime | 06 | Larceny
8 | Index Crime | 07 | Motor Vehicle Theft
9 | Index Crime | 09 | Arson
10 | Non-Index Crime | 01B | Involuntary Manslaughter
11 | Non-Index Crime | 08A | Simple Assault
12 | Non-Index Crime | 08B | Simple Battery
13 | Non-Index Crime | 10 | Forgery & Counterfeiting
14 | Non-Index Crime | 11 | Fraud
15 | Non-Index Crime | 12 | Embezzlement
16 | Non-Index Crime | 13 | Stolen Property
17 | Non-Index Crime | 14 | Vandalism
18 | Non-Index Crime | 15 | Weapons Violation
19 | Non-Index Crime | 16 | Prostitution
20 | Non-Index Crime | 17 | Criminal Sexual Abuse
21 | Non-Index Crime | 18 | Drug Abuse
22 | Non-Index Crime | 19 | Gambling
23 | Non-Index Crime | 20 | Offenses Against Family
24 | Non-Index Crime | 22 | Liquor License
25 | Non-Index Crime | 24 | Disorderly Conduct
26 | Non-Index Crime | 26 | Misc Non-Index Offense
27 | Violent Crime | 01A | Homicide 1st & 2nd Degree
28 | Violent Crime | 02 | Criminal Sexual Assault
29 | Violent Crime | 03 | Robbery
30 | Violent Crime | 04A | Aggravated Assault
31 | Violent Crime | 04B | Aggravated Battery
Let’s give sub categories the MERGE treatment too:
query = """
WITH {json} AS document
UNWIND document.categories AS category
UNWIND category.sub_categories AS subCategory
MERGE (c:CrimeCategory {name: category.name})
MERGE (sc:SubCategory {code: subCategory.code})
ON CREATE SET sc.description = subCategory.description
MERGE (c)-[:CHILD]->(sc)
"""
And finally let’s write a query to check what we’ve imported:
match (category:CrimeCategory)-[:CHILD]->(subCategory)
return *
I hadn’t realised before running this query is that some sub categories sit under multiple categories so that’s quite an interesting insight. The final Python script is available on github - any questions let me know.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.