R: dplyr - group_by dynamic or programmatic field / variable (Error: index out of bounds)
In my last blog post I showed how to group timestamp-based data by week, month and quarter. By the end we had the following code samples using dplyr and zoo:
library(RNeo4j)
library(zoo)
library(dplyr)
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01", tz = "GMT")
query = "MATCH (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
RETURN membership.joined AS joinTimestamp"
# `graph` is assumed to be an existing RNeo4j connection to the database
meetupMembers = cypher(graph, query)
meetupMembers$joinDate <- timestampToDate(meetupMembers$joinTimestamp)
meetupMembers$monthYear <- as.Date(as.yearmon(meetupMembers$joinDate))
meetupMembers$quarterYear <- as.Date(as.yearqtr(meetupMembers$joinDate))
# assumes meetupMembers already has a `week` column, built in the previous post
meetupMembers %.% group_by(week) %.% summarise(n = n())
meetupMembers %.% group_by(monthYear) %.% summarise(n = n())
meetupMembers %.% group_by(quarterYear) %.% summarise(n = n())
As you can see there’s quite a bit of duplication going on - the only thing that changes in the last 3 lines is the name of the field that we want to group by.
I wanted to pull this code out into a function and my first attempt was this:
groupMembersBy = function(field) {
  meetupMembers %.% group_by(field) %.% summarise(n = n())
}
And now if we try to group by week:
> groupMembersBy("week")
Error: index out of bounds
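My understanding of why this fails is that group_by captures the expression it is given rather than its value, so it goes looking for a column literally called field. Here's a rough base R sketch of that behaviour. showCaptured is a made-up helper for illustration, not something dplyr provides:

```r
# showCaptured is a hypothetical helper, not a dplyr function.
# substitute() hands back the expression the caller typed, unevaluated,
# and deparse() turns that expression into text.
showCaptured <- function(x) deparse(substitute(x))

field <- "week"

# The function sees the name of the variable, not the string stored in it
showCaptured(field)
```

The call returns the text field rather than week, which is the same confusion our function runs into.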
It turns out that to group by a column whose name is held in a variable, we want the regroup function rather than group_by:
groupMembersBy = function(field) {
  meetupMembers %.% regroup(list(field)) %.% summarise(n = n())
}
And now if we group by week:
> head(groupMembersBy("week"), 20)
Source: local data frame [20 x 2]
         week n
1  2011-06-02 8
2  2011-06-09 4
3  2011-06-16 1
4  2011-06-30 2
5  2011-07-14 1
6  2011-07-21 1
7  2011-08-18 1
8  2011-10-13 1
9  2011-11-24 2
10 2012-01-05 1
11 2012-01-12 3
12 2012-02-09 1
13 2012-02-16 2
14 2012-02-23 4
15 2012-03-01 2
16 2012-03-08 3
17 2012-03-15 5
18 2012-03-29 1
19 2012-04-05 2
20 2012-04-19 1
Much better!
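One caveat if you're on a newer dplyr: as far as I'm aware, regroup and the %.% operator were both deprecated and later removed. Below is a minimal sketch of the same idea, assuming dplyr 1.0 or later, using across(all_of(...)) inside group_by. The members data frame is a made-up stand-in for meetupMembers, and I've passed the data frame in as an argument rather than relying on a global:

```r
library(dplyr)

# Hypothetical stand-in for meetupMembers, just enough rows to group
members <- data.frame(
  week = as.Date(c("2011-06-02", "2011-06-02", "2011-06-09")),
  monthYear = as.Date(c("2011-06-01", "2011-06-01", "2011-06-01"))
)

# field is a string naming the column to group by;
# all_of() selects that column by name, across() passes it to group_by
groupMembersBy <- function(df, field) {
  df %>%
    group_by(across(all_of(field))) %>%
    summarise(n = n())
}

byWeek <- groupMembersBy(members, "week")
byMonth <- groupMembersBy(members, "monthYear")
```

The grouping column keeps its original name in the result, so the function works the same way for week, monthYear or quarterYear.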
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.