neo4j/cypher: Finding football stadiums near a city using spatial
One of the things I wanted to add to my football graph was something location-related so that I could try out neo4j spatial. The easiest way to do that seemed to be to model the locations of football stadiums.
To start with, I needed to add spatial as a server plugin to my neo4j plugins folder, which involved the following:
$ git clone https://github.com/neo4j/spatial.git spatial
$ cd spatial
$ mvn clean package -Dmaven.test.skip=true install
$ unzip target/neo4j-spatial-0.11-SNAPSHOT-server-plugin.zip -d /path/to/neo4j-community-1.9.M04/plugins/
$ /path/to/neo4j-community-1.9.M04/bin/neo4j restart
If it’s installed correctly, you should see output like this when you 'curl' the server’s REST API:
$ curl -L http://localhost:7474/db/data
{
  "extensions" : {
    ...
    "SpatialPlugin" : {
      "addEditableLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addEditableLayer",
      "addCQLDynamicLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addCQLDynamicLayer",
      "findGeometriesWithinDistance" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance",
      "updateGeometryFromWKT" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/updateGeometryFromWKT",
      "addGeometryWKTToLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addGeometryWKTToLayer",
      "getLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/getLayer",
      "addSimplePointLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addSimplePointLayer",
      "findGeometriesInBBox" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/findGeometriesInBBox",
      "addNodeToLayer" : "http://localhost:7474/db/data/ext/SpatialPlugin/graphdb/addNodeToLayer"
    },
    ...
  },
  ...
  "neo4j_version" : "1.9.M04"
}
The next step was to create a spatial index containing the stadiums’ latitudes and longitudes.
There’s a good example in https://github.com/mneedham/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/IndexProviderTest.java#L251, which I was able to adapt to do what I wanted.
I got a list of stadiums along with their locations as a CSV from Chris Bell’s blog.
The file looks like this:
Name,Team,Capacity,Latitude,Longitude
"Adams Park","Wycombe Wanderers",10284,51.6306,-0.800299
"Almondvale Stadium","Livingston",10122,55.8864,-3.52207
"Amex Stadium","Brighton and Hove Albion",22374,50.8609,-0.08014
"Anfield","Liverpool",45522,53.4308,-2.96096
"Ashton Gate","Bristol City",21497,51.44,-2.62021
"B2net Stadium","Chesterfield",10400,53.2535,-1.4272
I ended up with the following code to create a node for each stadium and add it to the spatial index:
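// imports excluded
public class SampleSpatialGraph {
    public static void main(String[] args) throws IOException {
        // One row per stadium: Name,Team,Capacity,Latitude,Longitude
        // (readFile is assumed to return only the data rows, not the header line)
        List<String> lines = readFile("/path/to/stadiums.csv");
        EmbeddedGraphDatabase db = new EmbeddedGraphDatabase("/path/to/neo4j-community-1.9.M04/data/graph.db");
        Index<Node> index = createSpatialIndex(db, "stadiumsLocation");
        Transaction tx = db.beginTx();
        try {
            for (String stadium : lines) {
                // Naive split: fine for this file, but it keeps the surrounding quotes on the name
                String[] columns = stadium.split(",");
                Node stadiumNode = db.createNode();
                // WKT points are written as POINT(longitude latitude)
                stadiumNode.setProperty("wkt", String.format("POINT(%s %s)", columns[4], columns[3]));
                stadiumNode.setProperty("name", columns[0]);
                // The index is built from the node's wkt property; the key and value are placeholders
                index.add(stadiumNode, "dummy", "value");
            }
            tx.success();
        } finally {
            // Always close the transaction and release the store, even if a row fails
            tx.finish();
            db.shutdown();
        }
    }

    private static Index<Node> createSpatialIndex(EmbeddedGraphDatabase db, String indexName) {
        return db.index().forNodes(indexName, SpatialIndexProvider.SIMPLE_WKT_CONFIG);
    }

    // readFile function excluded
}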
The full code is on this gist if you’re interested.
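The names in the query results further down come back wrapped in an extra pair of quotes because the plain split(",") keeps the quotes from the CSV. As a rough sketch of one way to avoid that, the standalone class below parses a line while dropping the quotes and builds the WKT string. StadiumCsv, parseLine and toWkt are hypothetical names of mine rather than anything from the gist, and the parser does not handle escaped quotes inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original gist.
class StadiumCsv {

    /** Splits one CSV line, honouring double-quoted fields and dropping the quotes. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state, do not keep the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field has no trailing comma
        return fields;
    }

    /** WKT puts x (longitude) before y (latitude). */
    static String toWkt(String latitude, String longitude) {
        return String.format("POINT(%s %s)", longitude, latitude);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Name,Team,Capacity,Latitude,Longitude",
            "\"Adams Park\",\"Wycombe Wanderers\",10284,51.6306,-0.800299");
        // Skip the header row so it is not turned into a stadium.
        for (String line : lines.subList(1, lines.size())) {
            List<String> columns = parseLine(line);
            // prints: Adams Park -> POINT(-0.800299 51.6306)
            System.out.println(columns.get(0) + " -> " + toWkt(columns.get(3), columns.get(4)));
        }
    }
}
```

In the import code, the values from parseLine would take the place of the columns array, with everything else left as it is.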
We can now query the stadiums using cypher to find, say, the stadiums within 5 kilometres of Manchester. The arguments to withinDistance are latitude, longitude and distance in kilometres:
START n=node:stadiumsLocation('withinDistance:[53.489271, -2.246704, 5.0]')
RETURN n.name, n.wkt;
==> +------------------------------------------------+
==> | n.name             | n.wkt                     |
==> +------------------------------------------------+
==> | ""Etihad Stadium"" | "POINT(-2.20024 53.483)"  |
==> | ""Old Trafford""   | "POINT(-2.29139 53.4631)" |
==> +------------------------------------------------+
==> 2 rows
==> 214 ms
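To sanity check a result like this outside the database, the distance from the query point to each stadium can be recomputed by hand. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. That is my own choice for illustration and not necessarily the calculation neo4j spatial performs internally, so small differences near the 5 km boundary would not be surprising. StadiumDistance and haversineKm are hypothetical names.

```java
// Hypothetical helper for checking query results by hand.
class StadiumDistance {
    // Mean Earth radius in kilometres (an approximation; the Earth is not a perfect sphere).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometres between two latitude/longitude points (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // The point used in the withinDistance query: latitude first, then longitude.
        double queryLat = 53.489271;
        double queryLon = -2.246704;
        // WKT stores POINT(longitude latitude), so the values are swapped back here.
        System.out.printf("Etihad Stadium: %.2f km%n", haversineKm(queryLat, queryLon, 53.483, -2.20024));
        System.out.printf("Old Trafford: %.2f km%n", haversineKm(queryLat, queryLon, 53.4631, -2.29139));
    }
}
```

Both stadiums should come out comfortably under 5 km, which agrees with the two rows returned above.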
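Or we could use a bounding box query, which returns all the stadiums inside a rectangle defined by minimum and maximum coordinates. The arguments to bbox are minimum longitude, maximum longitude, minimum latitude and maximum latitude. For example, the following query returns all the stadiums inside a box drawn roughly around the M25: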
START n=node:stadiumsLocation('bbox:[-0.519104,0.22934, 51.279958,51.69299]')
RETURN n.name, n.wkt;
==> +----------------------------------------------------+
==> | n.name                | n.wkt                      |
==> +----------------------------------------------------+
==> | ""White Hart Lane""   | "POINT(-0.065684 51.6033)" |
==> | ""Wembley""           | "POINT(-0.279543 51.5559)" |
==> | ""Victoria Road""     | "POINT(0.159739 51.5478)"  |
==> | ""Vicarage Road""     | "POINT(-0.401569 51.6498)" |
==> | ""Underhill Stadium"" | "POINT(-0.191789 51.6464)" |
==> | ""The Valley""        | "POINT(0.036757 51.4865)"  |
==> | ""The Den""           | "POINT(-0.050743 51.4859)" |
==> | ""Stamford Bridge""   | "POINT(-0.191034 51.4816)" |
==> | ""Selhurst Park""     | "POINT(-0.085455 51.3983)" |
==> | ""Craven Cottage""    | "POINT(-0.221619 51.4749)" |
==> | ""Griffin Park""      | "POINT(-0.302621 51.4882)" |
==> | ""Loftus Road""       | "POINT(-0.232204 51.5093)" |
==> | ""Boleyn Ground""     | "POINT(0.039225 51.5321)"  |
==> | ""Emirates Stadium""  | "POINT(-0.108436 51.5549)" |
==> | ""Brisbane Road""     | "POINT(0.012551 51.5601)" |
==> +----------------------------------------------------+
==> 15 rows
==> 23 ms
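The bounding box test itself is just four comparisons. Here is a minimal sketch of it, assuming the argument order suggested by the values in the query (minimum longitude, maximum longitude, minimum latitude, maximum latitude). BoundingBox is a hypothetical class of mine and not a type from neo4j spatial.

```java
// Hypothetical illustration of a bounding box check, not a neo4j spatial type.
class BoundingBox {
    final double minLon;
    final double maxLon;
    final double minLat;
    final double maxLat;

    // Same order as the bbox:[...] query string: min lon, max lon, min lat, max lat.
    BoundingBox(double minLon, double maxLon, double minLat, double maxLat) {
        this.minLon = minLon;
        this.maxLon = maxLon;
        this.minLat = minLat;
        this.maxLat = maxLat;
    }

    /** True when the point lies inside the box or on its edge. */
    boolean contains(double lon, double lat) {
        return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }

    public static void main(String[] args) {
        BoundingBox m25 = new BoundingBox(-0.519104, 0.22934, 51.279958, 51.69299);
        System.out.println("Wembley: " + m25.contains(-0.279543, 51.5559)); // prints "Wembley: true"
        System.out.println("Anfield: " + m25.contains(-2.96096, 53.4308));  // prints "Anfield: false"
    }
}
```

Because the box is a rectangle, it can also include places that sit just outside the motorway itself, so this is an approximation of "inside the M25" rather than an exact test.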
Now I just need to wire the stadiums in with the rest of the graph, and then I’ll be able to write queries based on players’ performance in different parts of the country.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.