· neo4j

Neo4j: Cypher - Separation of concerns

While preparing my talk on building Neo4j backed applications with Clojure I realised that some of the queries I’d written were incredibly complicated and went against anything I’d learnt about separating different concerns.

One example of this was the query I used to generate the data for the following page of the meetup application I’ve been working on:

2014 06 27 08 19 34
2014 06 27 08 31 13

Depending on the selected tab you can choose to see the people signed up for the meetup and the date that they signed up or the topics that those people are interested in.

For reference, this is an outline of the schema of the graph behind the application:

2014 06 27 11 51 00

This was my initial query to get the data:

MATCH (event:Event {id: {eventId}})-[:HELD_AT]->(venue)
OPTIONAL MATCH (event)<-[:TO]-(rsvp)<-[:RSVPD]-(person)
OPTIONAL MATCH (person)-[:INTERESTED_IN]->(topic) WHERE ()-[:HAS_TOPIC]->(topic)
WITH event, venue, rsvp, person, COLLECT(topic) as topics ORDER BY rsvp.time
OPTIONAL MATCH (rsvp)<-[:NEXT]-(initial)
WITH event, venue, COLLECT({rsvp: rsvp, initial: initial, person: person, topics: topics}) AS responses
WITH event, venue,
    [response in responses WHERE response.initial is null AND response.rsvp.response = "yes"] as attendees,
    [response in responses WHERE NOT response.initial is null] as dropouts, responses
UNWIND([response in attendees | response.topics]) AS topics
UNWIND(topics) AS topic
WITH event, venue, attendees, dropouts, {id: topic.id, name:topic.name, freq:COUNT(*)} AS t
RETURN event, venue, attendees, dropouts, COLLECT(t) AS topics

The first two lines of the query works out which people have RSVP’d to a particular event, the 3rd line captures the topics they’re interested in as long as the topic is linked to at least one of the NoSQL London groups.

We then optionally capture their initial RSVP in case they’ve changed it before doing a bit of data manipulation to group everything together.

If we run a slight variation of that which only shows a few of the topics, attendees and dropouts this is the type of result we get:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| event.name                               | venue.name      | [a IN attendees[0..5] | a.person.name]                                 | [d in dropouts[0..5] | d.person.name]                              | topics[0..5]                                                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "Building Neo4j backed web applications" | "Skills Matter" | ["Mark Needham","Alistair Jones","Jim Webber","Axel Morgner","Ramesh"] | ["Frank Gibson","Keith Hinde","Richard Mason","Ollie Glass","Tom"] | [{id -> 10538, name -> "Business Intelligence", freq -> 3},{id -> 61680, name -> "HBase", freq -> 3},{id -> 61679, name -> "Hive", freq -> 2},{id -> 193021, name -> "Graph Databases", freq -> 12},{id -> 85951, name -> "JavaScript Frameworks", freq -> 10}] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The problem is we’ve mixed together two different concerns - the attendees to a meetup and the topics they’re interested in - which made the query quite hard to understand when I came back to it a couple of months later.

Instead what we can do is split the query in two and make two different calls to the server. We then end up with the following:

// Get the event + attendees + dropouts
MATCH (event:Event {id: {eventId}})-[:HELD_AT]->(venue)
OPTIONAL MATCH (event)<-[:TO]-(rsvp)<-[:RSVPD]-(person)
WITH event, venue, rsvp, person ORDER BY rsvp.time
OPTIONAL MATCH (rsvp)<-[:NEXT]-(initial)
WITH event, venue, COLLECT({rsvp: rsvp, initial: initial, person: person}) AS responses
WITH event, venue,
    [response in responses WHERE response.initial is null
                           AND response.rsvp.response = "yes"] as attendees,
    [response in responses WHERE NOT response.initial is null] as dropouts
RETURN event, venue, attendees, dropouts
// Get the topics the attendees are interested in
MATCH (event:Event {id: {eventId}})
MATCH (event)<-[:TO]-(rsvp {response: "yes"})<-[:RSVPD]-(person)-[:INTERESTED_IN]->(topic)
WHERE ()-[:HAS_TOPIC]->(topic)
RETURN topic.id AS id, topic.name AS name, COUNT(*) AS freq

The first query is still a bit complex but that’s because there’s a bit of tricky logic to distinguish people who signed up and dropped out. However, the second query is now quite easy to read and expresses it’s intent very clearly.

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket