R: Removing for loops
In my last blog post I showed the translation of a likelihood function from Think Bayes into R and in my first attempt at this function I used a couple of nested for loops.
likelihoods = function(names, mixes, observations) {
scores = rep(1, length(names))
names(scores) = names
for(name in names) {
for(observation in observations) {
scores[name] = scores[name] * mixes[[name]][observation]
}
}
return(scores)
}
Names = c("Bowl 1", "Bowl 2")
bowl1Mix = c(0.75, 0.25)
names(bowl1Mix) = c("vanilla", "chocolate")
bowl2Mix = c(0.5, 0.5)
names(bowl2Mix) = c("vanilla", "chocolate")
Mixes = list("Bowl 1" = bowl1Mix, "Bowl 2" = bowl2Mix)
Mixes
Observations = c("vanilla", "vanilla", "vanilla", "chocolate")
l = likelihoods(Names, Mixes, Observations)
> l / sum(l)
Bowl 1 Bowl 2
0.627907 0.372093
We pass in a vector of bowls, a nested dictionary describing the mixes of cookies in each bowl and the observations that we’ve made. The function tells us that there’s an almost 2/3 probability of the cookies coming from Bowl 1 and just over 1/3 of being Bowl 2.
In this case there probably won’t be much of a performance improvement by getting rid of the loops but we should be able to write something that’s more concise and hopefully idiomatic.
Let’s start by getting rid of the inner for loop. That can be replace by a call to the http://adv-r.had.co.nz/Functionals.html function like so:
likelihoods2 = function(names, mixes, observations) {
scores = rep(0, length(names))
names(scores) = names
for(name in names) {
scores[name] = Reduce(function(acc, observation) acc * mixes[[name]][observation], Observations, 1)
}
return(scores)
}
l2 = likelihoods2(Names, Mixes, Observations)
> l2 / sum(l2)
Bowl 1 Bowl 2
0.627907 0.372093
So that’s good, we’ve still got the same probabilities as before. Now to get rid of the outer for loop. The http://adv-r.had.co.nz/Functionals.html function helps us out here:
likelihoods3 = function(names, mixes, observations) {
scores = rep(0, length(names))
names(scores) = names
scores = Map(function(name)
Reduce(function(acc, observation) acc * mixes[[name]][observation], Observations, 1),
names)
return(scores)
}
l3 = likelihoods3(Names, Mixes, Observations)
> l3
$`Bowl 1`
vanilla
0.1054688
$`Bowl 2`
vanilla
0.0625
We end up with a list instead of a vector which we need to fix by using the https://stat.ethz.ch/R-manual/R-devel/library/base/html/unlist.html function:
likelihoods3 = function(names, mixes, observations) {
scores = rep(0, length(names))
names(scores) = names
scores = Map(function(name)
Reduce(function(acc, observation) acc * mixes[[name]][observation], Observations, 1),
names)
return(unlist(scores))
}
l3 = likelihoods3(Names, Mixes, Observations)
> l3 / sum(l3)
Bowl 1.vanilla Bowl 2.vanilla
0.627907 0.372093
Now we just have this annoying 'vanilla' in the name. That’s fixed easily enough:
likelihoods3 = function(names, mixes, observations) {
scores = rep(0, length(names))
names(scores) = names
scores = Map(function(name)
Reduce(function(acc, observation) acc * mixes[[name]][observation], Observations, 1),
names)
result = unlist(scores)
names(result) = names
return(result)
}
l3 = likelihoods3(Names, Mixes, Observations)
> l3 / sum(l3)
Bowl 1 Bowl 2
0.627907 0.372093
A slightly cleaner alternative makes use of the https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html function:
likelihoods3 = function(names, mixes, observations) {
scores = rep(0, length(names))
names(scores) = names
scores = sapply(names, function(name)
Reduce(function(acc, observation) acc * mixes[[name]][observation], Observations, 1))
names(scores) = names
return(scores)
}
l3 = likelihoods3(Names, Mixes, Observations)
> l3 / sum(l3)
Bowl 1 Bowl 2
0.627907 0.372093
That’s the best I’ve got for now but I wonder if we could write a version of this using matrix operations some how - but that’s for next time!
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.