R: Apply a custom function across multiple lists
In my continued playing around with R I wanted to map a custom function over two lists, comparing each item with the corresponding item in the other list.
If we just want to use a built-in function such as subtraction between two lists, it’s quite easy to do:
> c(10,9,8,7,6,5,4,3,2,1) - c(5,4,3,4,3,2,2,1,2,1)
[1] 5 5 5 3 3 3 2 2 0 0
I wanted to do a slight variation on that: instead of returning the difference, return a text value representing it, e.g. '5 or more', '3 to 5' etc.
I spent a long time trying to figure out how to do that before finding an excellent blog post which describes all the different 'apply' functions available in R.
As far as I understand, the 'apply' family is roughly the equivalent of 'map' in Clojure or other functional languages.
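As a rough illustration of that analogy, here is a minimal sketch using base R only: sapply maps a function over one vector, while mapply walks several vectors in parallel. The helper names twice and add are made up for this example.

```r
# Hypothetical helpers, named only for this illustration
twice <- function(x) x * 2
add <- function(x, y) x + y

# sapply: map a one-argument function over a single vector
doubled <- sapply(c(1, 2, 3), twice)

# mapply: map a two-argument function over two vectors in parallel
sums <- mapply(add, c(1, 2, 3), c(10, 20, 30))

print(doubled)  # 2 4 6
print(sums)     # 11 22 33
```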
In this case we want the mapply variant (http://stat.ethz.ch/R-manual/R-patched/library/base/html/mapply.html), which we can use like so:
> mapply(function(x, y) {
  if ((x - y) >= 5) {
    "5 or more"
  } else if ((x - y) >= 3) {
    "3 to 5"
  } else {
    "less than 5"
  }
}, c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
[1] "5 or more" "5 or more" "5 or more" "3 to 5" "3 to 5" "3 to 5" "less than 5"
[8] "less than 5" "less than 5" "less than 5"
We could then pull that out into a function if we wanted:
summarisedDifference <- function(one, two) {
  mapply(function(x, y) {
    if ((x - y) >= 5) {
      "5 or more"
    } else if ((x - y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }
  }, one, two)
}
which we could call like so:
> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1),c(5,4,3,4,3,2,2,1,2,1))
[1] "5 or more" "5 or more" "5 or more" "3 to 5" "3 to 5" "3 to 5" "less than 5"
[8] "less than 5" "less than 5" "less than 5"
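Because subtraction is already vectorised, the same labels can also be produced without mapply. The sketch below swaps in base R's cut, which is a different technique from the one above. The variable names and break points are assumptions chosen to mirror the >= checks, and the labels are copied from the example.

```r
x <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)
y <- c(5, 4, 3, 4, 3, 2, 2, 1, 2, 1)

# Bin the differences into [-Inf, 3), [3, 5) and [5, Inf).
# right = FALSE makes each interval closed on the left, matching the >= checks.
labels <- cut(x - y,
              breaks = c(-Inf, 3, 5, Inf),
              labels = c("less than 5", "3 to 5", "5 or more"),
              right = FALSE)

# cut returns a factor, so convert it to plain strings
labels <- as.character(labels)
print(labels)
```

This avoids calling an R function once per pair of items, which may matter for long vectors, at the cost of being a little less explicit than the if/else version.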
I also wanted to be able to compare a list of items to a single item, which was much easier than I expected:
> summarisedDifference(c(10,9,8,7,6,5,4,3,2,1), 1)
[1] "5 or more" "5 or more" "5 or more" "5 or more" "5 or more" "3 to 5" "3 to 5"
[8] "less than 5" "less than 5" "less than 5"
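This works because mapply recycles shorter arguments to the length of the longest one, so the single value is paired with every item of the first vector. A minimal sketch of that behaviour, using a throwaway subtraction function and a made-up items vector; the MoreArgs form is an alternative way to pass a fixed value:

```r
items <- c(10, 9, 8)

# The length-1 second argument is recycled, so it is paired with every item
recycled <- mapply(function(x, y) x - y, items, 1)

# Alternative: pass the fixed value through MoreArgs instead of recycling
fixed <- mapply(function(x, y) x - y, items, MoreArgs = list(y = 1))

print(recycled)  # 9 8 7
print(fixed)     # 9 8 7
```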
If we wanted to get a summary of the differences between the lists we could plug them into ddply like so:
> library(plyr)
> df = data.frame(x=c(10,9,8,7,6,5,4,3,2,1), y=c(5,4,3,4,3,2,2,1,2,1))
> ddply(df, .(difference=summarisedDifference(x,y)), summarise, count=length(x))
   difference count
1      3 to 5     3
2   5 or more     3
3 less than 5     4
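If plyr is not available, a similar count can be sketched in base R with table. This is an alternative to the ddply call rather than what the post uses; summarisedDifference is repeated from above so the snippet runs on its own.

```r
summarisedDifference <- function(one, two) {
  mapply(function(x, y) {
    if ((x - y) >= 5) {
      "5 or more"
    } else if ((x - y) >= 3) {
      "3 to 5"
    } else {
      "less than 5"
    }
  }, one, two)
}

df <- data.frame(x = c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1),
                 y = c(5, 4, 3, 4, 3, 2, 2, 1, 2, 1))

# table counts how often each label occurs
counts <- table(summarisedDifference(df$x, df$y))
print(counts)
```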
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.