CouchDB: Join-like behaviour with list functions
I've been playing around with the Twitter streaming API lately to see which links the people I follow post most frequently, storing the relevant tweets in CouchDB.
I recently came across a problem which I struggled to solve for quite a while.
I have a view defined by the following map function:
{
"_id" : "_design/query",
"views" : {
"by_link" : {
"map" : "function(doc){ emit(doc.actual_link, { user : doc.user.screen_name, text : doc.text })}"
}
}
}
Querying that view returns the following data set:
curl 'http://127.0.0.1:5984/twitter_links/_design/query/_view/by_link?limit=20'
{"total_rows":7035,"offset":0,"rows":[
{"id":"abf54db1d92bfe0e8aaaa9ec51f237bd","key":"http://2dboy.com/2011/02/08/ipad-launch/","value":{"user":"Nash","text":"World of Goo\u2019s iPad Launch http://instapaper.com/zzqrqw32e"}},
{"id":"b8911545ff45438671081260ae0d42b1","key":"http://3.bp.blogspot.com/_T6MpHfZv2qQ/SpKGGjsoQoI/AAAAAAAADIA/Jsa5JDqX9X0/s400/moleskine3.jpg","value":{"user":"oinonio","text":"@stephenfry a Babushka Little My? http://bit.ly/fjPg2a"}},
{"id":"be12d30d1c8b882d8ce0124585fabb19","key":"http://3.bp.blogspot.com/_UAzEooLfuI8/S7aOiCBdAzI/AAAAAAAAF8Y/5W61I9VHxPE/s1600-h/deforestation.jpg","value":{"user":"ironshay","text":"A big problem caused by deforestation http://bit.ly/9qArCg"}}
]}
What I want to do is go from…
- Link Url 1 -> Tweet 1
- Link Url 1 -> Tweet 2
- Link Url 2 -> Tweet 3
…to…
- Link Url 1 -> [Tweet 1, Tweet 2]
- Link Url 2 -> [Tweet 3]
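In plain JavaScript, outside CouchDB, the transformation I'm after could be sketched roughly like this. The `groupByLink` name and the sample rows are made up for illustration; the rows only mimic the `key`/`value` shape that the view returns.

```javascript
// Hypothetical helper: groups view-style rows by key,
// keeping keys in the order they are first seen.
function groupByLink(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.key)) {
      groups.set(row.key, []);
    }
    groups.get(row.key).push(row.value);
  }
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Made-up sample rows, not real data from the database.
const sampleRows = [
  { key: "Link Url 1", value: "Tweet 1" },
  { key: "Link Url 1", value: "Tweet 2" },
  { key: "Link Url 2", value: "Tweet 3" }
];

console.log(JSON.stringify(groupByLink(sampleRows)));
```

This version holds everything in memory, which is fine for a sketch but is not how CouchDB would stream rows back to a client.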
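I originally tried to use a reduce function, following Chris Chandler's blog post, but that resulted in a 'reduce_overflow_error'. CouchDB raises that error when a reduce function's output does not shrink quickly enough relative to its input, which is what happens when every tweet for a link is collected into an array.
Perryn pointed out that what I probably needed was a list function, and I came across Chris Strom's blog while trying to work out how to write one.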
{
"_id" : "_design/query",
"views" : {
"by_link" : {
"map" : "function(doc){ emit(doc.actual_link, { user : doc.user.screen_name, text : doc.text })}"
}
},
"lists" : {
"index_tweets" : "function(head, req) {
var row, last_key, tweets = [];
send('{\"rows\" : [');
while(row = getRow()) {
if(last_key !== row.key) {
if(last_key !== undefined) {
send(toJSON({key : last_key, values : tweets}));
send(',');
}
tweets = [];
last_key = row.key;
}
tweets.push(row.value);
}
if(last_key !== undefined) {
send(toJSON({key : last_key, values : tweets}));
}
send(']}');
}"
}
}
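One way to try the grouping logic without a running CouchDB is to stub out the functions that CouchDB provides to list functions. This is only a sketch under a few assumptions: `runList`, `indexTweets` and the sample rows are hypothetical, and `getRow`, `send` and `toJSON` are simple stand-ins for the real ones, passed in as arguments rather than being globals. It compares the last key with the value `undefined` so that nothing is emitted before the first group or for an empty view.

```javascript
// Hypothetical harness: runs a list-style function against in-memory rows.
// getRow, send and toJSON are stand-ins for the functions CouchDB provides.
function runList(listFn, rows) {
  const output = [];
  let index = 0;
  const getRow = () => (index < rows.length ? rows[index++] : null);
  const send = (chunk) => output.push(chunk);
  const toJSON = JSON.stringify;
  listFn(getRow, send, toJSON);
  return output.join("");
}

// The grouping logic of the list function, with its helpers passed in.
function indexTweets(getRow, send, toJSON) {
  let row, lastKey, tweets = [];
  send('{"rows" : [');
  while ((row = getRow())) {
    if (lastKey !== row.key) {
      // Flush the previous group before starting a new one.
      if (lastKey !== undefined) {
        send(toJSON({ key: lastKey, values: tweets }));
        send(",");
      }
      tweets = [];
      lastKey = row.key;
    }
    tweets.push(row.value);
  }
  // Flush the final group, if there was at least one row.
  if (lastKey !== undefined) {
    send(toJSON({ key: lastKey, values: tweets }));
  }
  send("]}");
}

// Made-up rows, already sorted by key as view rows would be.
const stubRows = [
  { key: "http://example.com/a", value: { user: "alice", text: "first" } },
  { key: "http://example.com/a", value: { user: "bob", text: "second" } },
  { key: "http://example.com/b", value: { user: "carol", text: "third" } }
];

const grouped = JSON.parse(runList(indexTweets, stubRows));
console.log(grouped.rows.length);
```

The approach only works because rows with the same key arrive next to each other, which is the case for view rows since they are sorted by key.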
We then call the list function against a view, following this URL pattern from CouchDB: The Definitive Guide:
/db/_design/foo/_list/list-name/view-name
curl http://127.0.0.1:5984/twitter_links/_design/query/_list/index_tweets/by_link
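To make the mapping from that pattern to the actual URL explicit, here is a small sketch that assembles it from its parts. `listUrl` is a hypothetical helper of my own, not part of any CouchDB client library.

```javascript
// Hypothetical helper: builds /db/_design/foo/_list/list-name/view-name
// on top of a base server URL.
function listUrl(base, db, designDoc, listName, viewName) {
  const parts = [db, "_design", designDoc, "_list", listName, viewName];
  // Strip trailing slashes from the base, then join the encoded segments.
  return base.replace(/\/+$/, "") + "/" + parts.map((p) => encodeURIComponent(p)).join("/");
}

const url = listUrl(
  "http://127.0.0.1:5984",
  "twitter_links",
  "query",
  "index_tweets",
  "by_link"
);
console.log(url);
```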
That returns the data in the required format:
{"rows" : [
{"key":"http://1.bp.blogspot.com/_XdP6Lp2ceqY/TU16NvdT-RI/AAAAAAAAlb8/7QtTN-XxBTM/s400/dcrHk.jpg","values":[{"user":"jhartikainen","text":"RT @codepo8: The dark secret of PacMan: http://bit.ly/exCBDy"}, {"user":"joedevon","text":"RT @codepo8: The dark secret of PacMan: http://bit.ly/exCBDy"}]},
{"key":"http://10poundpom.blogspot.com/","values":[{"user":"10poundpomCL","text":"@andy_murray Help my #£10aWeekCharityChallenge, all it takes is a RT. Read http://10poundpom.blogspot.com/ for more."}]},
{"key":"http://10rem.net/blog/2011/02/09/enhancing-the-wpf-screen-capture-program-with-window-borders","values":[{"user":"brian_henderson","text":"Enhancing the WPF Screen Capture Program with Window Borders: by @Pete_Brown: http://bit.ly/icmXG5 #wpf #win32"},{"user":"SittenSpynne","text":"RT @Pete_Brown: Blogged: Enhancing the WPF Screen Capture Program with Window Borders http://bit.ly/icmXG5 #wpf #win32"}]}]}
Maybe there’s an even better way to solve this problem that I don’t know about…let me know if there is!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.