R: Modelling a conversion rate with a binomial distribution
As part of some work Sid and I were doing last week we wanted to simulate the conversion rate for an A/B testing we were planning.
We started with the following function which returns the simulated conversion rate for a given conversion rate of 12%:
generateConversionRates <- function(sampleSize) {
sample_a <- rbinom(seq(0, sampleSize), 1, 0.12)
conversion_a <- length(sample_a[sample_a == 1]) / sampleSize
sample_b <- rbinom(seq(0, sampleSize), 1, 0.12)
conversion_b <- length(sample_b[sample_b == 1]) / sampleSize
c(conversion_a, conversion_b)
}
If we call it:
> generateConversionRates(10000)
[1] 0.1230 0.1207
We have a 12.3% conversion rate on A and a 12.07% conversion rate on B based on 10,000 sample values.
We then wrote the following function to come up with 1000 versions of those conversion rates:
generateSample <- function(sampleSize) {
lapply(seq(1, 1000), function(x) generateConversionRates(sampleSize))
}
We can call that like this:
> getSample(10000)
[[998]]
[1] 0.1179 0.1216
[[999]]
[1] 0.1246 0.1211
[[1000]]
[1] 0.1248 0.1234
We were then using these conversion rates to try and work out how many samples we needed to include in an A/B test to have reasonable confidence that it represented the population.
We actually ended up abandoning that exercise but I thought I’d record the code because I thought it was pretty interesting.
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.