R: Refactoring to dplyr
I’ve been looking back over some of the early R code I wrote before I knew about the dplyr package, and thought it’d be an interesting exercise to refactor some of the snippets.
We’ll use the following data frame for each of the examples:
library(dplyr)
data = data.frame(
letter = sample(LETTERS, 50000, replace = TRUE),
number = sample(1:10, 50000, replace = TRUE)
)
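Because sample() draws at random, running the snippet above will give different letters and counts each time. If you want repeatable output while following along, a minimal sketch is to seed the random number generator first. The seed value 42 is an arbitrary assumption, so it will not reproduce the exact numbers printed in this post.

```r
# Fix the random number generator so sample() returns the same draws on every run.
set.seed(42)  # arbitrary seed, not taken from the original post

data <- data.frame(
  letter = sample(LETTERS, 50000, replace = TRUE),
  number = sample(1:10, 50000, replace = TRUE)
)

# Quick sanity check on the shape of the data frame.
dim(data)
```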
Take {n} rows
> data[1:5,]
letter number
1 R 7
2 Q 3
3 B 8
4 R 3
5 U 2
becomes:
> data %>% head(5)
letter number
1 R 7
2 Q 3
3 B 8
4 R 3
5 U 2
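head() is a base R function that happens to work in a pipeline. As a sketch of a dplyr-native alternative, and assuming dplyr 1.0.0 or later is installed, slice_head() does the same job. The small df below is a made-up stand-in for the post's data frame.

```r
library(dplyr)

# Illustrative stand-in data, not the 50,000-row frame from the post.
df <- data.frame(letter = LETTERS[1:8], number = 1:8)

# Take the first 3 rows; n must be passed by name.
first_three <- df %>% slice_head(n = 3)

first_three
```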
Order by numeric value descending
> data[order(-(data$number)),][1:5,]
letter number
14 H 10
17 G 10
63 L 10
66 W 10
73 R 10
becomes:
> data %>% arrange(desc(number)) %>% head(5)
letter number
1 H 10
2 G 10
3 L 10
4 W 10
5 R 10
Count number of items
> length(data[,1])
[1] 50000
becomes:
> data %>% count()
Source: local data frame [1 x 1]
n
1 50000
Filter by column value
> length(subset(data, number == 1)[, 1])
[1] 4928
becomes:
> data %>% filter(number == 1) %>% count()
Source: local data frame [1 x 1]
n
1 4928
Group by variable and count
> aggregate(data, by= list(data$number), function(x) length(x))
Group.1 letter number
1 1 4928 4928
2 2 5045 5045
3 3 5064 5064
4 4 4823 4823
5 5 5032 5032
6 6 5163 5163
7 7 4945 4945
8 8 5077 5077
9 9 5025 5025
10 10 4898 4898
becomes:
> data %>% count(number)
Source: local data frame [10 x 2]
number n
1 1 4928
2 2 5045
3 3 5064
4 4 4823
5 5 5032
6 6 5163
7 7 4945
8 8 5077
9 9 5025
10 10 4898
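For comparison, count(number) is shorthand for grouping by the column and then tallying each group with n(). The sketch below spells out the long form on a tiny made-up data frame (df is illustrative only) and also shows base R's table(), which gives the same counts in a different shape.

```r
library(dplyr)

# Illustrative stand-in data: one 1, two 2s, three 3s.
df <- data.frame(number = c(1, 2, 2, 3, 3, 3))

# Long form: group the rows, then count the rows in each group.
long_form <- df %>%
  group_by(number) %>%
  summarise(n = n())

# Short form used in the post.
short_form <- df %>% count(number)

# Base R alternative: returns a named table rather than a data frame.
base_counts <- table(df$number)

short_form
```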
Select a range of rows
> data[4:5,]
letter number
4 R 3
5 U 2
becomes:
> data %>% slice(4:5)
letter number
1 R 3
2 U 2
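One difference is visible in the outputs: base subsetting keeps the original row names (4 and 5), while slice() renumbers the result from 1. If the original position matters, one possible approach is to record it in a column before slicing. The column name row and the stand-in df are assumptions made for illustration.

```r
library(dplyr)

# Illustrative stand-in data, not the frame from the post.
df <- data.frame(
  letter = c("R", "Q", "B", "R", "U", "A"),
  number = c(7, 3, 8, 3, 2, 9)
)

# Store each row's original position, then slice;
# the row column survives even though the row names are reset.
picked <- df %>%
  mutate(row = row_number()) %>%
  slice(4:5)

picked
```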
There’s certainly more code in some of the dplyr examples, but I find it easier to remember how the dplyr code works when I come back to it, so I tend to favour that approach.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.