I am trying to do some statistical analysis of different A/B tests to see which alternative is better and have found conflicting information about this.

First, I am interested in several different kinds of tests:

  • Tests that measure success by counting events, such as conversions or emails sent
  • Tests that measure success by counting revenue
  • Tests that have only two alternatives (control and new)
  • Tests that have multiple alternatives (control and multiple new)

I was hoping to find a simple set of formulae or rules for doing this analysis but have found more questions than answers.

This site says that you can't compare multi-alternative tests; you can only do pairwise comparisons and do a chi-squared analysis to see if the whole test is statistically significant or not.

This site suggests a way to do A/B/C/D testing (starts on slide 74), analysing the results using the G-test (which it says is related to chi-squared), but isn't clear on the details of using a fudge factor. It also suggests that you can only use the A/B/C/D approach to eliminate alternatives until you end up with a clear winner in an A/B comparison.

This site gives an example of an A/B/C/D test (including control) and shows how to compare the conversion rates to determine a winner. Unlike the previous approach, it does not recommend eliminating alternatives but rather picks a winner right off the bat (assuming statistically significant results).

Perhaps I'm naive but I would think that by now a stats analysis library would exist to deal with this very problem. I would also appreciate more information about what algorithms/equations are needed to solve these problems. It's been a long time since my university Stats class.

+1  A: 

For the tests that count events, you could approach this using Beta distributions. Each alternative has some unobserved p, the probability of producing an event. If you observe X positive events out of N trials, then (starting from a uniform prior on p) your uncertainty about p can be modeled by Beta(X+1, N-X+1).
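As a minimal sketch of that step, the posterior parameters and their mean can be computed directly. The function names below are made up for illustration, and the uniform Beta(1, 1) prior is an assumption implied by the +1 terms.

```python
def beta_posterior(successes, trials):
    """Return (alpha, beta) for Beta(X+1, N-X+1), assuming a uniform prior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return successes + 1, trials - successes + 1


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


if __name__ == "__main__":
    # Hypothetical data: 30 conversions out of 1000 visitors.
    a, b = beta_posterior(30, 1000)
    print(a, b, beta_mean(a, b), beta_variance(a, b))
```

With few observations the posterior mean (X+1)/(N+2) sits noticeably closer to 0.5 than the raw rate X/N; the two converge as N grows.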

You can compare two alternatives by looking at P(pA > pB), where pA and pB are the unobserved event probabilities of the two alternatives, each with its own Beta distribution. Methods for computing that inequality probability can be found in this paper.

You can also compute E[pA-pB], the effect size, or compute interval bounds on that difference.
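One simple way to approximate these quantities is Monte Carlo sampling from the two Beta distributions. This is a stand-in technique, not the method from the paper mentioned above, and the function name, draw count, and seed are illustrative assumptions.

```python
import random


def compare_alternatives(x_a, n_a, x_b, n_b, draws=50_000, seed=0):
    """Estimate P(pA > pB), E[pA - pB] and a 95% interval by sampling.

    Assumes each alternative's rate follows Beta(X+1, N-X+1).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(x_a + 1, n_a - x_a + 1)
        p_b = rng.betavariate(x_b + 1, n_b - x_b + 1)
        diff = p_a - p_b
        diffs.append(diff)
        if diff > 0:
            wins += 1
    diffs.sort()
    return {
        "prob_a_better": wins / draws,
        "mean_diff": sum(diffs) / draws,
        # Empirical 2.5% and 97.5% quantiles of the sampled differences.
        "interval_95": (diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]),
    }


if __name__ == "__main__":
    # Hypothetical data: A converts 60/1000, B converts 30/1000.
    print(compare_alternatives(60, 1000, 30, 1000))
```

The estimates carry sampling noise that shrinks as `draws` grows. With more than two alternatives, the same loop can draw one sample per alternative and count how often each one comes out on top.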

thouis
Also, you might watch this blog (the next post is supposed to be on this subject): http://sirevanhaas.com/?p=30
thouis
And you might read chapter 37 of this book: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html Individual chapters are available here: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/
thouis
Second post is now up: http://sirevanhaas.com/?p=64
thouis