January 03, 2008

Testing Issues

Recall that my focus in 2008 is on multichannel profitability.

Experimental design (aka 'tests') is one of the most useful tools available to help us understand multichannel profitability.

We run into a ton of problems when designing and analyzing 'tests'. Let's review some of the problems.

Problem #1: Statistical Significance

Anytime we want to execute a test, a statistician will want to analyze the test (remember, I have a statistics degree, so I want to analyze tests!).

In order to make sense of the conclusions, the statistician will introduce the concept of "statistical significance". In other words, the statistician will tell you if the difference between a 3.0% and 2.9% click-through rate is "meaningful". If, according to statistical equations, the difference is not deemed to be "meaningful", the statistician will tell you to ignore the difference, because the difference is not "statistically significant".
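Here is a minimal sketch of the kind of calculation the statistician runs, using a standard two-proportion z-test. The sample sizes (10,000 e-mails per version) are an assumption made up for illustration; only the 3.0% and 2.9% click-through rates come from the example above.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical volumes: 10,000 e-mails per version, 300 clicks vs. 290 clicks.
z, p_value = two_proportion_z_test(300, 10_000, 290, 10_000)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

At these assumed volumes the p-value lands far above 0.05, so the statistician would call the difference "not statistically significant". A much larger test could change that answer.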

Statisticians want you to be right 90% of the time, or 95% of the time, or 99% of the time.

We all agree that this is critical when measuring the effectiveness of a cure for AIDS. We should all agree that this isn't so important when measuring the effectiveness of putting the shopping cart in the upper right-hand corner of an e-mail campaign.

Business leaders are seldom given opportunities to capitalize on something that will work 95% of the time. Every day, business leaders make decisions based on instinct, on gut feel, not having any data to make a decision. Knowing that something will work 72% of the time is a blessing!

Even worse, statistical significance only holds if the conditions that existed at the time of the test are identical to the conditions that exist today. Business leaders know that this assumption can never be met.

Test often, and don't limit yourself to making decisions only when you're likely to be right 99% of the time. You'll find yourself never making meaningful decisions if you have to be right all the time.

Problem #2: Small Businesses

Large brands have testing advantages. A billion dollar business can afford to hold out 100,000 customers from a marketing activity. The billion dollar business gets to slice and dice this audience fifty different ways, feeling comfortable that the results will be consistent and reliable.

Small businesses are disadvantaged. If you have a housefile of 50,000 twelve-month customers, you cannot afford to hold out 10,000 from a catalog or e-mail campaign.

However, a small business can afford to hold out 1,500 twelve-month customers out of 50,000. The small business will not be able to slice and dice the data the way a large brand can. The small business will have to make compromises.

For instance, look at the variability associated with ten customers, four of whom spend money:
  • $0, $0, $0, $0, $0, $0, $50, $75, $150, $300.
    • Mean = $57.50.
    • Standard Deviation = $98.63.
    • Coefficient of Variation = $98.63 / $57.50 = 1.72.
Now look at the variability associated with measuring response (purchase = 1, no purchase = 0).
  • 0, 0, 0, 0, 0, 0, 1, 1, 1, 1
    • Mean = 0.40.
    • Standard Deviation = 0.516.
    • Coefficient of Variation = 0.516 / 0.40 = 1.29.
The small company can look at response, realizing that response is about twenty-five percent "less variable" than the amount of money a customer spent.
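The arithmetic above can be reproduced in a few lines. This sketch uses the sample standard deviation (dividing by n - 1), which is what matches the figures quoted; the function name is mine, not part of any standard toolkit.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# The ten customers from the example: six spend nothing, four spend money.
spend = [0, 0, 0, 0, 0, 0, 50, 75, 150, 300]
# Response is 1 if the customer purchased, 0 otherwise.
response = [1 if amount > 0 else 0 for amount in spend]

cv_spend = coefficient_of_variation(spend)
cv_response = coefficient_of_variation(response)
reduction = 1 - cv_response / cv_spend

print(f"CV of spend:    {cv_spend:.2f}")
print(f"CV of response: {cv_response:.2f}")
print(f"Response is {reduction:.0%} less variable than spend")
```

The same function works on any test or holdout group, so you can check whether response is the steadier metric in your own housefile before committing to it.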

Small companies need to analyze tests, sampling 2-4% of the housefile in a holdout group, focusing on response instead of spend. The small company realizes that statistical significance may not be achievable. The small company looks for "consistent" results across tests. The small company replicates the rapid test analysis document, using response instead of spend.
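Pulling the holdout group should be done at random, so the customers who are held out look like the customers who are mailed. A minimal sketch, assuming the housefile is nothing more than a list of customer IDs (the IDs, the function name, and the 3% rate are all placeholders for illustration):

```python
import random

def split_holdout(customer_ids, holdout_rate, seed=2008):
    """Randomly split customers into a mailed group and a holdout group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    holdout_size = round(len(customer_ids) * holdout_rate)
    holdout = set(rng.sample(customer_ids, holdout_size))
    mailed = [cid for cid in customer_ids if cid not in holdout]
    return mailed, sorted(holdout)

# Hypothetical housefile of 50,000 twelve-month customers.
housefile = list(range(1, 50_001))
mailed, holdout = split_holdout(housefile, holdout_rate=0.03)
print(f"Mailed: {len(mailed):,}  Holdout: {len(holdout):,}")
```

A 3% rate on 50,000 customers gives the 1,500-customer holdout mentioned earlier. Keeping the seed fixed means you can recreate the same groups when it is time to analyze the results.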

Problem #3: Timeliness

The internet changed our expectations for test results. Online, folks are testing strategies in real-time, adjusting landing page designs on Tuesday morning based on results from a test designed Monday morning, executed Monday afternoon.

In 1994, I executed a year-long test at Lands' End. I didn't share results with anybody for at least nine months. What a mistake. We had spirited discussions from month ten to month twelve that could have been avoided if communication started sooner.

Start analyzing the test right away. Share results with everybody who matters. Adjust your results as you obtain more information. It is OK that the results change from month two to month three to month twelve, as long as you tell leadership that results may change. Given that online marketers are making changes in real-time, you have to be more flexible.

Problem #4: Belief

You're going to obtain results that run contrary to popular belief.

You might find that your catalog drives less online business than matchback results suggest. You might find that advertising women's merchandise in an e-mail campaign causes customers to purchase cosmetics.

You might find that your leadership team dismisses your test results, because the results do not hold up to what leadership "knows" to be true.

Just remember that people once thought the world was flat, that the universe orbited Earth, and that subprime mortgages could be packaged with more stable financial instruments for the benefit of all. If unusual results can be replicated in subsequent tests, the results are not unusual.

Leadership folks aren't a bunch of rubes. They have been trained to think a certain way, based on the experiences they've accumulated over a lifetime. It will take time for those willing to learn to change their point of view. It does no good to beat them over the head with "facts".
