28 Feb 2017

Neo4j: Graphing the 'My name is...I work' Twitter meme

Over the last few days I’ve been watching, with interest, the chain of 'My name is...' tweets kicked off by DHH. As I understand it, the idea is to show that setting riddles or hard tasks on a whiteboard in coding interviews is ridiculous.

Hello, my name is David. I would fail to write bubble sort on a whiteboard. I look code up on the internet all the time. I don't do riddles.
DHH (@dhh), February 21, 2017: https://twitter.com/dhh/status/834146806594433025

Other people quoted that tweet and added their own take, and yesterday Eduardo Hernacki suggested that traversing this chain of tweets seemed tailor-made for Neo4j:

@eduardohki is someone traversing all this stuff? #Neo4j

Michael was quickly on the scene and created a Cypher query that calls the Twitter API and creates a Neo4j graph from the resulting JSON response. The only tricky bit is creating a 'bearer token', but Jason Kotchoff has a helpful gist showing how to generate one from your Twitter consumer key and consumer secret.
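Jason's gist isn't reproduced here, but the exchange it relies on is Twitter's application-only OAuth 2 flow (the "client credentials" grant). The Python sketch below illustrates that flow under this assumption: it percent-encodes the consumer key and secret, joins them with a colon, Base64-encodes the pair, and builds a POST to `https://api.twitter.com/oauth2/token` with `grant_type=client_credentials`. The function names are hypothetical, and only `fetch_bearer_token` touches the network, so treat this as a starting point rather than a tested client.

```python
import base64
import json
import urllib.parse
import urllib.request

# Token endpoint for Twitter's application-only (OAuth 2 client credentials) auth.
TOKEN_URL = "https://api.twitter.com/oauth2/token"


def build_bearer_token_request(consumer_key: str, consumer_secret: str) -> urllib.request.Request:
    """Build (but do not send) the request that exchanges key + secret for a bearer token."""
    # Percent-encode each credential, join them with a colon, then Base64-encode the pair.
    key = urllib.parse.quote(consumer_key, safe="")
    secret = urllib.parse.quote(consumer_secret, safe="")
    basic = base64.b64encode(f"{key}:{secret}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        },
        method="POST",
    )


def fetch_bearer_token(consumer_key: str, consumer_secret: str) -> str:
    """Send the request and return the access_token field (performs a real network call)."""
    request = build_bearer_token_request(consumer_key, consumer_secret)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

The returned token string is the value you would paste into the `:param` command in the next step.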

Now that we’ve got our bearer token, let’s create a parameter to store it. Type the following in the Neo4j browser:

:param bearer: '<your-bearer-token-goes-here>'

Now we’re ready to query the Twitter API. We’ll start with the search API and find all tweets that contain the text '"my name" "I work"'. That returns a JSON response containing up to 100 tweets. For each returned tweet that quotes another tweet, we’ll create a node for the tweet, a node for the user who posted it, a node for the tweet it quotes, and relationships to glue them all together.
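To show how a plain-text search such as '"my name" "I work"' becomes the percent-encoded `q` parameter that the Cypher query passes to the search API, here is a small Python sketch. It is an illustration only, not part of the original workflow; `build_search_url` is a hypothetical helper, and it reuses the same fixed parameters (`count`, `result_type`, `lang`) as the query.

```python
import urllib.parse

# Search endpoint used by the Cypher query.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(text: str, count: int = 100) -> str:
    """Percent-encode `text` (quotes become %22, spaces %20) and append it as the q parameter."""
    params = {"count": count, "result_type": "recent", "lang": "en", "q": text}
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return SEARCH_URL + "?" + urllib.parse.urlencode(params, quote_via=urllib.parse.quote)


print(build_search_url('"my name" "I work"'))
# prints https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=%22my%20name%22%20%22I%20work%22
```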

We’re going to use the apoc.load.jsonParams procedure from the APOC library to help us import the data. If you want to follow along, you can use a Neo4j sandbox instance, which comes with APOC installed. If you’re using a local Neo4j installation, grab the APOC jar, put it into your plugins folder, and restart Neo4j.

This is the query in full:

WITH 'https://api.twitter.com/1.1/search/tweets.json?count=100&result_type=recent&lang=en&q=' as url, $bearer as bearer

CALL apoc.load.jsonParams(url + "%22my%20name%22%20%22I%20work%22",{Authorization:"Bearer "+bearer},null) yield value

UNWIND value.statuses as status
WITH status, status.user as u, status.entities as e
WHERE status.quoted_status_id is not null

// create a node for the original tweet
MERGE (t:Tweet {id:status.id})
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

// create a node for the author + a POSTED relationship from the author to the tweet
MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

// create a MENTIONED relationship from the tweet to any users mentioned in the tweet
FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

// create a node for the quoted tweet and create a QUOTED relationship from the original tweet to the quoted one
MERGE (q:Tweet {id:status.quoted_status_id})
MERGE (t)-[:QUOTED]->(q)

// repeat the above steps for the quoted tweet
WITH t as t0, status.quoted_status as status WHERE status is not null
WITH t0, status, status.user as u, status.entities as e

MERGE (t:Tweet {id:status.id})
ON CREATE SET t.text=status.text,t.created_at=status.created_at,t.retweet_count=status.retweet_count, t.favorite_count=status.favorite_count

MERGE (t0)-[:QUOTED]->(t)

MERGE (p:User {name:u.screen_name})
MERGE (p)-[:POSTED]->(t)

FOREACH (m IN e.user_mentions | MERGE (mu:User {name:m.screen_name}) MERGE (t)-[:MENTIONED]->(mu))

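// only link onwards if the quoted tweet itself quotes another tweet (MERGE can't use a null id)
FOREACH (qid IN CASE WHEN status.quoted_status_id IS NULL THEN [] ELSE [status.quoted_status_id] END |
  MERGE (q:Tweet {id:qid}) MERGE (t)-[:QUOTED]->(q));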
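To see what the query does without a database, the Python sketch below mirrors its logic on an already-parsed search response (a dict shaped like `value` above). It is a simplification under stated assumptions: Python sets stand in for MERGE, so re-adding an existing node or relationship is a no-op, and nodes are reduced to `("Tweet", id)` or `("User", screen_name)` tuples without their other properties. The function names are hypothetical.

```python
from typing import Any, Dict, Set, Tuple

Node = Tuple[str, Any]          # ("Tweet", id) or ("User", screen_name)
Rel = Tuple[Node, str, Node]    # (start, TYPE, end)


def merge_status(status: Dict[str, Any], nodes: Set[Node], rels: Set[Rel]) -> Node:
    """Mirror one block of the Cypher query: tweet, author, mentions and QUOTED link."""
    tweet = ("Tweet", status["id"])
    author = ("User", status["user"]["screen_name"])
    nodes.update({tweet, author})                        # MERGE (t:Tweet), MERGE (p:User)
    rels.add((author, "POSTED", tweet))                  # MERGE (p)-[:POSTED]->(t)
    for mention in status.get("entities", {}).get("user_mentions", []):
        user = ("User", mention["screen_name"])
        nodes.add(user)
        rels.add((tweet, "MENTIONED", user))             # MERGE (t)-[:MENTIONED]->(mu)
    quoted_id = status.get("quoted_status_id")
    if quoted_id is not None:                            # MERGE rejects a null id, so skip it
        quoted = ("Tweet", quoted_id)
        nodes.add(quoted)
        rels.add((tweet, "QUOTED", quoted))              # MERGE (t)-[:QUOTED]->(q)
    return tweet


def import_search_response(value: Dict[str, Any]) -> Tuple[Set[Node], Set[Rel]]:
    nodes: Set[Node] = set()
    rels: Set[Rel] = set()
    for status in value.get("statuses", []):
        if status.get("quoted_status_id") is None:       # WHERE status.quoted_status_id IS NOT NULL
            continue
        merge_status(status, nodes, rels)
        quoted_status = status.get("quoted_status")
        if quoted_status is not None:                    # repeat the steps for the quoted tweet
            merge_status(quoted_status, nodes, rels)
    return nodes, rels
```

Because the quoted tweet's `id` equals the outer tweet's `quoted_status_id`, the QUOTED relationship between them is only stored once, just as the second MERGE in the query matches the relationship created by the first.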

To see the resulting graph, run the following query, which returns up to 25 QUOTED paths:

MATCH p=()-[r:QUOTED]->() RETURN p LIMIT 25

A more interesting query finds the path from DHH’s tweet to Eduardo’s:

MATCH path = (dhh:Tweet {id: 834146806594433025})<-[:QUOTED*]-(eduardo:Tweet {id: 836400531983724545})
UNWIND nodes(path) AS tweet
MATCH (tweet)<-[:POSTED]-(user)
RETURN tweet, user

This query:

- starts from DHH’s tweet
- follows incoming QUOTED relationships until it reaches Eduardo’s tweet
- unwinds the tweets on that path and finds the author of each one
- returns each tweet along with its author
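The variable-length `[:QUOTED*]` pattern can be pictured as following "quotes" links one hop at a time. The Python sketch below does that walk by hand over hypothetical in-memory maps (tweet id to quoted tweet id, and tweet id to author), starting at the newer tweet and stopping at the original one. It assumes each tweet quotes at most one other tweet, which holds for quote tweets; Cypher's expansion does the same search for you and also copes with branching.

```python
from typing import Dict, List, Optional, Tuple


def quote_chain(start: int, target: int, quotes: Dict[int, int],
                authors: Dict[int, str]) -> Optional[List[Tuple[int, str]]]:
    """Follow QUOTED links from `start` until `target`.

    Returns (tweet, author) pairs ordered from `target` (the original tweet) to
    `start`, or None if the chain ends or loops before reaching `target`.
    """
    path = [start]
    seen = {start}
    current = start
    while current != target:
        current = quotes.get(current)
        if current is None or current in seen:   # dead end or cycle: target unreachable
            return None
        seen.add(current)
        path.append(current)
    path.reverse()                                # original tweet first, like nodes(path)
    return [(tweet, authors.get(tweet, "?")) for tweet in path]
```

For example, with `quotes = {4: 3, 3: 2, 2: 1}`, calling `quote_chain(4, 1, quotes, authors)` walks 4, 3, 2, 1 and returns the pairs starting from tweet 1.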

And this is the output:

I ran a couple of other queries against the Twitter API to hydrate some nodes whose properties we hadn’t set; you can see all the queries in this gist.
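The post doesn't say which endpoint those hydration queries used, so the following is only one plausible approach, sketched in Python: find Tweet nodes that were created by MERGE on an id alone (so they have no `text` yet), then batch their ids for Twitter's v1.1 `statuses/lookup` endpoint, which accepts up to 100 comma-separated ids per request. The function names and the dict-per-node representation are hypothetical.

```python
import urllib.parse
from typing import Any, Dict, Iterable, List

LOOKUP_URL = "https://api.twitter.com/1.1/statuses/lookup.json"
BATCH_SIZE = 100  # statuses/lookup accepts at most 100 ids per request


def ids_needing_hydration(tweets: Iterable[Dict[str, Any]]) -> List[int]:
    """Tweets created only by MERGE on an id have no text property yet."""
    return [t["id"] for t in tweets if t.get("text") is None]


def lookup_urls(ids: List[int], batch_size: int = BATCH_SIZE) -> List[str]:
    """Split ids into batches and build one lookup URL per batch."""
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ",".join(str(i) for i in ids[start:start + batch_size])
        urls.append(LOOKUP_URL + "?" + urllib.parse.urlencode({"id": batch}))
    return urls
```

Each URL could then be passed to apoc.load.jsonParams with the same Authorization header as before, and the returned statuses used to set the missing properties on the matching Tweet nodes.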

For the next couple of days I also have a sandbox running at https://10-0-1-157-32898.neo4jsandbox.com/browser/. You can log in using the credentials readonly/twitter.


If you have any questions/suggestions let me know in the comments, @markhneedham on twitter, or email the Neo4j DevRel team - devrel@neo4j.com.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.