R: Grouping by week, month, quarter
In my continued playing around with R and meetup data I wanted to have a look at when people joined the London Neo4j group based on week, month or quarter of the year to see when they were most likely to do so.
I started with the following query to get back the join timestamps:
library(RNeo4j)
query = "MATCH (:Person)-[:HAS_MEETUP_PROFILE]->()-[:HAS_MEMBERSHIP]->(membership)-[:OF_GROUP]->(g:Group {name: \"Neo4j - London User Group\"})
RETURN membership.joined AS joinTimestamp"
meetupMembers = cypher(graph, query)
> head(meetupMembers)
joinTimestamp
1 1.376572e+12
2 1.379491e+12
3 1.349454e+12
4 1.383127e+12
5 1.372239e+12
6 1.330295e+12
The first step was to get joinDate into a nicer format that we can use in R more easily:
timestampToDate <- function(x) as.POSIXct(x / 1000, origin="1970-01-01", tz = "GMT")
meetupMembers$joinDate <- timestampToDate(meetupMembers$joinTimestamp)
> head(meetupMembers)
joinTimestamp joinDate
1 1.376572e+12 2013-08-15 13:13:40
2 1.379491e+12 2013-09-18 07:55:11
3 1.349454e+12 2012-10-05 16:28:04
4 1.383127e+12 2013-10-30 09:59:03
5 1.372239e+12 2013-06-26 09:27:40
6 1.330295e+12 2012-02-26 22:27:00
Much better!
I started off with grouping by month and quarter and came across the excellent zoo library which makes it really easy to transform dates:
library(zoo)
meetupMembers$monthYear <- as.Date(as.yearmon(meetupMembers$joinDate))
meetupMembers$quarterYear <- as.Date(as.yearqtr(meetupMembers$joinDate))
> head(meetupMembers)
joinTimestamp joinDate monthYear quarterYear
1 1.376572e+12 2013-08-15 13:13:40 2013-08-01 2013-07-01
2 1.379491e+12 2013-09-18 07:55:11 2013-09-01 2013-07-01
3 1.349454e+12 2012-10-05 16:28:04 2012-10-01 2012-10-01
4 1.383127e+12 2013-10-30 09:59:03 2013-10-01 2013-10-01
5 1.372239e+12 2013-06-26 09:27:40 2013-06-01 2013-04-01
6 1.330295e+12 2012-02-26 22:27:00 2012-02-01 2012-01-01
The next step was to create a new data frame which grouped the data by those fields. I’ve been learning dplyr as part of Udacity’s EDA course so I thought I’d try and use that:
> head(meetupMembers %.% group_by(monthYear) %.% summarise(n = n()), 20)
monthYear n
1 2011-06-01 13
2 2011-07-01 4
3 2011-08-01 1
4 2011-10-01 1
5 2011-11-01 2
6 2012-01-01 4
7 2012-02-01 7
8 2012-03-01 11
9 2012-04-01 3
10 2012-05-01 9
11 2012-06-01 5
12 2012-07-01 16
13 2012-08-01 32
14 2012-09-01 14
15 2012-10-01 28
16 2012-11-01 31
17 2012-12-01 7
18 2013-01-01 52
19 2013-02-01 49
20 2013-03-01 22
> head(meetupMembers %.% group_by(quarterYear) %.% summarise(n = n()), 20)
quarterYear n
1 2011-04-01 13
2 2011-07-01 5
3 2011-10-01 3
4 2012-01-01 22
5 2012-04-01 17
6 2012-07-01 62
7 2012-10-01 66
8 2013-01-01 123
9 2013-04-01 139
10 2013-07-01 117
11 2013-10-01 94
12 2014-01-01 266
13 2014-04-01 359
14 2014-07-01 216
Grouping by week number is a bit trickier but we can do it with a bit of transformation on our initial timestamp:
meetupMembers$week <- as.Date("1970-01-01")+7*trunc((meetupMembers$joinTimestamp / 1000)/(3600*24*7))
> head(meetupMembers %.% group_by(week) %.% summarise(n = n()), 20)
week n
1 2011-06-02 8
2 2011-06-09 4
3 2011-06-16 1
4 2011-06-30 2
5 2011-07-14 1
6 2011-07-21 1
7 2011-08-18 1
8 2011-10-13 1
9 2011-11-24 2
10 2012-01-05 1
11 2012-01-12 3
12 2012-02-09 1
13 2012-02-16 2
14 2012-02-23 4
15 2012-03-01 2
16 2012-03-08 3
17 2012-03-15 5
18 2012-03-29 1
19 2012-04-05 2
20 2012-04-19 1
We can then plug that data frame into ggplot if we want to track membership sign up over time at different levels of granularity and create some bar charts of scatter plots depending on what we feel like!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.