July 20, 2010

A/B Testing Gone Bad

For many of us, testing website strategies is a matter of running simple "A/B" tests.

This process seems easy enough. There are automated algorithms that compute how many conversions are required before one strategy can be labeled as the "winner" against another strategy.

Most of the time, this process is done on the basis of "conversions" ... did a customer buy something, did a customer subscribe to something ... you get the picture.

And most of the time, this strategy yields an outcome that is wrong.

Yup, you heard it here first. Your conversion-based A/B tests, while yielding a statistically significant outcome from a conversion rate standpoint, are yielding an "incorrect" outcome when evaluating what matters ... spend per visitor.

Here's why ... for those of us in e-commerce, we are measuring "conversion" when we should be measuring "dollars per visitor". And "dollars per visitor" is the product of two metrics.
  • Conversion Rate.
  • Dollars per Conversion.
When you multiply variability (conversion rate) by variability (dollars per conversion), you get ... VARIABILITY! Lots of variability!
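If you want to see that variability for yourself, here is a minimal sketch in Python. The 20% conversion rate and the handful of order sizes are made-up numbers, purely for illustration.

```python
# A minimal sketch (hypothetical numbers) of why dollars per visitor is noisy:
# it mixes two sources of variation -- whether the visitor converts at all,
# and how much a converter spends.
import random
import statistics

random.seed(42)

conversion_rate = 0.20                   # assumed conversion rate
order_values = [30, 70, 100, 130, 170]   # assumed spread of order sizes

# Spend per visitor: $0 for non-converters, a random order size for converters.
spend_per_visitor = [
    random.choice(order_values) if random.random() < conversion_rate else 0.0
    for _ in range(10_000)
]

print("mean dollars per visitor:  ", round(statistics.mean(spend_per_visitor), 2))
print("std dev, dollars per visitor:", round(statistics.stdev(spend_per_visitor), 2))
# With these inputs the standard deviation comes out well above the mean --
# far more relative noise than a simple 0/1 conversion outcome carries.
```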

Take a look at this example: a test with eleven customers in each group (for illustrative purposes only).

[Table: test group vs. control group, eleven customers each, showing whether each customer converted and how much was spent.]

The test group clearly outperforms the control group (7/11 conversions vs. 2/11 conversions).

Also notice that the average amount spent per conversion is $100, the same in each group.

However, the variability associated with spending amounts (they vary between $30 and $170 per order in both the test group and the control group) causes the t-test to yield a result that is not nearly as statistically significant as when measuring conversion rate alone.
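To make that concrete, here is a rough sketch in Python. The individual spend amounts are made up, chosen only to match the summary above (7 of 11 test conversions, 2 of 11 control conversions, orders between $30 and $170 averaging $100 per converter), and the specific tests (a two-proportion z-test and Welch's t-test) are my choices, not necessarily the ones your testing software uses.

```python
from math import sqrt
from scipy import stats

# Hypothetical spend values consistent with the example's summary statistics.
test_spend    = [170, 130, 100, 100, 100, 70, 30] + [0] * 4   # 7 of 11 convert
control_spend = [170, 30] + [0] * 9                            # 2 of 11 convert

# 1) Conversion-rate view: two-proportion z-test on 7/11 vs. 2/11.
p1, p2, n = 7 / 11, 2 / 11, 11
pooled = (7 + 2) / (n + n)
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (2 / n))
p_conversion = 2 * stats.norm.sf(abs(z))

# 2) Dollars-per-visitor view: Welch t-test on the full spend distributions,
#    zeros included for the visitors who did not convert.
_, p_dollars = stats.ttest_ind(test_spend, control_spend, equal_var=False)

print(f"conversion-rate p-value:     {p_conversion:.3f}")   # roughly 0.03
print(f"dollars-per-visitor p-value: {p_dollars:.3f}")      # roughly 0.07
```

With these made-up spend amounts, the conversion comparison clears the usual 0.05 bar while the dollars-per-visitor comparison does not ... same customers, same groups, different verdict.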

Well guess what? This happens ... all of the time! Every time you run one of those software-based, automated conversion rate A/B tests that measure responses until one outcome is statistically significant, you guarantee that the measurement based on dollars per visitor is not going to be significant ... in other words, every test you are running gives you a statistically insignificant outcome!

So do me a favor ... please, I beseech thee ... please evaluate your A/B tests on the statistical significance of dollars per visitor, not on the basis of conversion rate alone.

You will learn that you need 2x - 4x as many customers in your test, but your results will be valid. As executed today, too many of the tests out there are yielding garbage for results ... and this partially explains why a decade of conversion rate optimization has yielded websites that, by and large, convert fewer customers than in the past.
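For what it's worth, here is a back-of-the-envelope sketch of where a 2x - 4x multiplier can come from, using the standard normal-approximation sample-size formulas for 80% power at a 5% significance level. The baseline conversion rate, the size of the lift, and the order-value mean and standard deviation are all assumptions I made up for illustration; a skewed order-value distribution is what pushes the ratio into that range.

```python
# Back-of-the-envelope sample sizes (hypothetical inputs) for detecting the
# same relative lift via conversion rate vs. via dollars per visitor.
Z = (1.96 + 0.84) ** 2            # (z_alpha/2 + z_beta)^2 for 5% / 80% power

p1, p2 = 0.020, 0.024             # assumed baseline conversion and a 20% lift
aov_mean, aov_sd = 100.0, 150.0   # assumed skewed order-value distribution

# Visitors per arm to detect the conversion-rate difference.
n_conversion = Z * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

# Per-visitor spend is (order value) x (converted or not), so its variance is
# p * (aov_sd^2 + aov_mean^2) - (p * aov_mean)^2 in each arm.
def spend_var(p):
    return p * (aov_sd ** 2 + aov_mean ** 2) - (p * aov_mean) ** 2

delta = aov_mean * (p2 - p1)      # difference in mean dollars per visitor
n_dollars = Z * (spend_var(p1) + spend_var(p2)) / delta ** 2

print(f"visitors per arm, conversion-rate test:     {n_conversion:,.0f}")  # ~21,000
print(f"visitors per arm, dollars-per-visitor test: {n_dollars:,.0f}")     # ~70,000
print(f"ratio: {n_dollars / n_conversion:.1f}x")                           # roughly 3x
```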