Let's say I have a simple ecommerce site that sells 100 different t-shirt designs, and I want to do some A/B testing to optimise my sales. Suppose I want to test two different "buy" buttons. Normally, I would use A/B testing to randomly assign each visitor to see button A or button B (and try to ensure that the user experience is consistent by storing that assignment in a session, cookies, etc.).

Would it be possible to take a different approach and instead randomly assign each of my 100 designs to use button A or B, and measure the conversion rate as (number of sales of design n) / (pageviews of design n)?
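Roughly what I have in mind, as a sketch (the design ids and zeroed counters below are just placeholders):

import random

# Illustrative only: assign each of the 100 designs to button A or B once,
# up front, so a given design page always renders the same button.
design_ids = list(range(1, 101))
assignment = {d: random.choice(["A", "B"]) for d in design_ids}

# Later, tally per-design pageviews and sales (zeros here are placeholders)
# and compute the conversion rate for each button group.
pageviews = {d: 0 for d in design_ids}
sales = {d: 0 for d in design_ids}

def conversion_rate(button):
    views = sum(pageviews[d] for d in design_ids if assignment[d] == button)
    buys = sum(sales[d] for d in design_ids if assignment[d] == button)
    return buys / views if views else 0.0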

This approach would seem to have some advantages: I would not have to worry about keeping the user experience consistent, since a given page (e.g. www.example.com/viewdesign?id=6) would always return the same HTML. If I were to test different prices, it would be far less distressing for the user to see different prices for different designs than different prices for the same design on different computers. I also wonder whether it might be better for SEO; my suspicion is that Google would "prefer" to always see the same HTML when crawling a page.

Obviously this approach would only be suitable for a limited number of sites; I was just wondering if anyone has tried it?

A: 

You can't.

Let's say 50 t-shirts have button A and the remaining 50 have button B. After your test, you find that the t-shirts with button A have a better conversion rate.

Now - was the conversion better because of button A, or was it better because the t-shirt designs were really cool and people liked them?

You can't answer that question objectively, so you can't do A/B testing in this manner.

sri
But isn't it pretty unlikely that all 50 t-shirts with button A are better designs, given that they have been randomly assigned?
mojones
Yes, but are you really going to be selling equal numbers of all 100 designs? Realistically, about 20% of your products will account for about 80% of your sales. What if your top three products all end up in the same group? I think you're trying to take a shortcut, and you're welcome to do so, but don't expect it to be "just as good" as doing it properly.
Iain Galloway
OK, but (not looking for an argument here, just trying to think it through)... won't the 80/20 problem be taken care of by measuring conversions relative to displays? The 20% of products that net 80% of sales will also net 80% of the page views, so the number of sales per page view will be more or less constant across both popular and unpopular products. Don't A/B test statistics take into account differences in the number of samples in the two groups?
mojones
I don't think you can safely assume that the 20% of products that net 80% of sales also net 80% of pageviews - although that's something that you can measure in advance. Tristan's idea of using the previous month's analytics to create pairs of neighbours is a good one to try to minimise the bias though.
Iain Galloway
A: 

The trouble with your approach is that you're testing two things at the same time.

Say design X uses button A and design Y uses button B. Design Y gets more sales and more conversions.

Is that because button B gives a better conversion rate than button A, or is it because design Y gives a better conversion rate than design X?

If your volume of designs is very high, your volume of users is very low, and your conversions are distributed evenly amongst your designs, I could see your approach being better than the normal one, because the risk that the "good" designs clump together and skew your result would be smaller than the risk that the "good" users do. However, in that case you won't have a particularly large sample of conversions to draw conclusions from; you need a sufficiently high volume of users for A/B testing to be worthwhile in the first place.
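If you want to get a feel for the size of that clumping risk, a rough simulation along these lines would show it (the skewed baseline rates below are made-up assumptions, not real data):

import random

random.seed(0)

# Made-up assumption: 100 designs with heavily skewed baseline conversion
# rates (a few hits, a long tail) and no button effect at all.
base_rates = [min(random.paretovariate(2) * 0.01, 0.2) for _ in range(100)]

gaps = []
for _ in range(1000):
    random.shuffle(base_rates)
    group_a, group_b = base_rates[:50], base_rates[50:]
    gaps.append(abs(sum(group_a) / 50 - sum(group_b) / 50))

# If this chance gap is comparable to the button effect you hope to detect,
# design-level randomization alone won't be able to separate the two.
print("median group gap with zero button effect:", sorted(gaps)[len(gaps) // 2])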

Iain Galloway
Agreed. I suppose the efficacy of this approach would depend on the number of designs - given a large enough number, and assuming random assignment, it shouldn't be possible for a single 'good' design to skew the results.
mojones
Well, that's the thing. Going by the 80/20 rule, a small imbalance in the distribution of "good" designs could significantly bias your conclusion. Check those "if"s: your number of users is going to be very high compared to your number of designs (otherwise A/B testing is going to be worthless in the first place), and your conversions are certainly not going to be evenly distributed amongst your designs. The recommendation is to vary by user for a reason.
Iain Galloway
A: 

Instead of changing the buy button for some pages, run all pages with button A for a week and then change to button B for another week. That should give you enough data to see whether sales change significantly between the two buttons.

A week should be short enough that seasonal/weather effects shouldn't apply.
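One way to check whether the week-on-week difference is bigger than chance is a pooled two-proportion z-test; a sketch, with placeholder counts standing in for your own analytics:

from math import sqrt

# Placeholder weekly totals -- substitute your own analytics figures.
views_a, sales_a = 12000, 240   # week running button A
views_b, sales_b = 11500, 276   # week running button B

p_a = sales_a / views_a
p_b = sales_b / views_b

# Pooled two-proportion z-test for the difference in conversion rates.
p_pool = (sales_a + sales_b) / (views_a + views_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
z = (p_b - p_a) / se

# |z| > 1.96 is roughly p < 0.05 two-sided, but note that the weeks
# themselves can differ (promotions, weather), which this test ignores.
print(f"conversion A={p_a:.3%}, B={p_b:.3%}, z={z:.2f}")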

Aaron Digulla
A: 

Your intuition is correct. In theory, randomizing by page will work fine. Both treatment groups will have balanced characteristics in expectation.

However, the sample size is quite small, so you need to be careful: simple randomization may create imbalance by chance. The standard solution is to block on pre-treatment characteristics of the shirts. The most important characteristic is your pre-treatment outcome, which I assume is the conversion rate.

There are many ways to create "balanced" randomized designs. For instance, you could create pairs using optimal matching and randomize within pairs. A rougher match could be found by ranking pages by their conversion rate in the previous week/month and then creating pairs of neighbors. Or you could combine blocked randomization with Aaron's suggestion: randomize within pairs and then flip the treatment each week.
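A minimal sketch of the rank-and-pair version, assuming you have last month's per-design conversion rates (the rates below are simulated stand-ins):

import random

random.seed(1)

# Stand-in for last month's per-design conversion rates, keyed by design id.
prior_rate = {d: random.uniform(0.005, 0.05) for d in range(1, 101)}

# Rank designs by prior conversion rate and pair up neighbors.
ranked = sorted(prior_rate, key=prior_rate.get)
pairs = [ranked[i:i + 2] for i in range(0, len(ranked), 2)]

# Within each pair, randomly give one design button A and the other button B,
# so both treatment groups start out balanced on prior conversion rate.
assignment = {}
for first, second in pairs:
    if random.random() < 0.5:
        assignment[first], assignment[second] = "A", "B"
    else:
        assignment[first], assignment[second] = "B", "A"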

A second, somewhat unrelated concern is interaction between treatments, which may be more problematic. It's possible that if a user sees one button on one page and then a different button on another page, the new button will have a particularly large effect. That is, can you really view the treatments as independent? Does the button on one page affect the likelihood of conversion on another? Unfortunately, it probably does, particularly because if you buy a t-shirt on one page, you're probably very unlikely to buy a t-shirt on another page. I'd worry about this more than about the randomization. The standard approach -- randomizing by unique user -- better mimics your final design.

You could always run an experiment to see if you get the same results using these two methods, and then proceed with the simpler one if you do.

Tristan