January 01, 2008

Rapid Test Results

In 2008, I'm going to focus my energy on discussing how test results and Multichannel Forensics increase profitability and, hopefully, decrease customer dissatisfaction. Today, we begin the discussion by exploring the concept behind a project I call "Rapid Test Results".

One of the easiest ways for multichannel catalogers, retailers and e-mail marketers to understand customer behavior is through the use of "A/B" tests.

In an "A/B" test, one representative group of customers receives a marketing activity, while another representative group of customers does not.

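The catalog industry uses matchback algorithms to understand multichannel behavior. As most of us understand, matchback algorithms overstate the effectiveness of marketing activities, because they credit the catalog with orders that mailed customers would have placed even without receiving the catalog.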

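Conversely, e-mail marketers understate the effectiveness of e-mail marketing activities when they rely on open rates, click-through rates, and conversion rates, because these metrics miss customers who read the e-mail and then buy without clicking through.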

Therefore, we need a better understanding of our marketing activities. One way to get it is to create and analyze more "A/B" tests, often called "mail/holdout" tests.
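
To make the contrast concrete, here is a minimal Python sketch of a mail/holdout readout. It assumes customers were split into the two groups at random and that `demand` captures orders from every channel during the test window. It also uses a deliberately simplified matchback that credits the catalog with every dollar from a mailed customer. The `GroupResult` and `read_mail_holdout` names and all of the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    customers: int   # customers in the group
    demand: float    # total demand from the group, all channels, during the test window


def read_mail_holdout(mail: GroupResult, holdout: GroupResult) -> dict:
    """Compare a mailed group with a randomly selected holdout group."""
    mail_per_customer = mail.demand / mail.customers
    holdout_per_customer = holdout.demand / holdout.customers
    # The holdout group shows what mailed customers would have spent anyway.
    incremental_per_customer = mail_per_customer - holdout_per_customer
    incremental_total = incremental_per_customer * mail.customers
    # Simplified matchback: every dollar from a mailed customer is credited to the catalog.
    matchback_credit = mail.demand
    # Guard against a zero or negative lift before computing the ratio.
    ratio = matchback_credit / incremental_total if incremental_total > 0 else float("inf")
    return {
        "incremental_per_customer": incremental_per_customer,
        "incremental_total": incremental_total,
        "matchback_credit": matchback_credit,
        "overstatement_ratio": ratio,
    }


# Hypothetical test: 50,000 customers in each group.
result = read_mail_holdout(GroupResult(50_000, 1_500_000.0),
                           GroupResult(50_000, 1_000_000.0))
print(result)
```

In this made-up example the mailed group spends $30 per customer and the holdout group $20, so the catalog drove $10 per customer, or $500,000 in total, while the simplified matchback would have credited it with $1,500,000, three times as much.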

It can be very easy to execute these tests.

However, we don't always have the resources necessary to analyze and understand the test results.

If you are an executive whose team lacks these resources, I have something for you. It is called "Rapid Test Results".

For my loyal blog readers, executives, and current customers, I have an inexpensive proposal just for you. The Rapid Test Results Analysis Document outlines a project that delivers results for the tests you executed within just a few days of your sending the data for analysis.

If there's one thing I learned in 2007, it is that e-mail and catalog teams are minimally staffed! And yet, the information that can be gleaned from tests executed by e-mail and catalog marketing teams can shape the future direction of your organization.

So if any of the following criteria are met by your organization, please consider a Rapid Test Results Project:
  • You are an e-mail marketer who believes your e-mail campaigns drive more sales and profit than you can measure via standard metrics like open rate, click-through rate, and conversion rate.
  • You are a catalog marketer who wants to truly understand whether multichannel customers respond to catalog marketing, and to learn the impact of catalog marketing on the online channel.
  • You are a catalog marketer who wants to reduce catalog marketing expense (and benefit the environment) by limiting contacts to internet customers.
  • You do not have the analytical resources to analyze test results quickly.
  • You do not have the systems support to measure test results by different customer segments, across different channels, or across different merchandise classifications.
  • Your executive team does not understand the constraints and limitations that prevent your team from analyzing all of your tests in a timely manner.
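
Measuring results by segment and by channel, the kind of systems support described in the list above, can be sketched in a few lines. This hypothetical `lift_by_segment_and_channel` helper assumes each customer record carries its test group (`"mail"` or `"holdout"`) and a segment label, and that every order carries a channel. The segment and channel names in the usage are illustrative only, and the same pattern extends to merchandise classifications by adding that field to each order.

```python
from collections import defaultdict


def lift_by_segment_and_channel(customers, orders):
    """Incremental demand per customer, by segment and channel.

    customers: {customer_id: (group, segment)}, where group is "mail" or "holdout"
    orders:    [(customer_id, channel, amount), ...] placed during the test window
    """
    n_customers = defaultdict(int)   # (group, segment) -> number of customers
    for group, segment in customers.values():
        n_customers[(group, segment)] += 1

    dollars = defaultdict(float)     # (group, segment, channel) -> total demand
    for customer_id, channel, amount in orders:
        group, segment = customers[customer_id]
        dollars[(group, segment, channel)] += amount

    segments = sorted({segment for _, segment in n_customers})
    channels = sorted({channel for _, _, channel in dollars})
    lift = {}
    for segment in segments:
        n_mail = n_customers.get(("mail", segment), 0)
        n_holdout = n_customers.get(("holdout", segment), 0)
        if n_mail == 0 or n_holdout == 0:
            continue  # no valid comparison unless both groups exist in the segment
        for channel in channels:
            # Divide by every customer in the group, buyers and non-buyers alike.
            mail_per = dollars.get(("mail", segment, channel), 0.0) / n_mail
            holdout_per = dollars.get(("holdout", segment, channel), 0.0) / n_holdout
            lift[(segment, channel)] = mail_per - holdout_per
    return lift


customers = {1: ("mail", "multichannel"), 2: ("mail", "multichannel"),
             3: ("holdout", "multichannel"), 4: ("holdout", "multichannel")}
orders = [(1, "online", 100.0), (2, "phone/mail", 50.0), (3, "online", 60.0)]
print(lift_by_segment_and_channel(customers, orders))
```

The toy data yields $20 of online lift and $25 of phone/mail lift per mailed customer, but four customers are far too few to support any conclusion; every split by segment, channel, or merchandise class needs its own adequately sized mail and holdout groups.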


  1. Anonymous 1:29 PM

    This is really cool. I have the following concerns with the test and the results:

    1. Order classification is a big problem when you have multiple advertising vehicles such as catalog, e-mail, Internet, magazine, etc. How does one account for the interaction between these vehicles? For example, customer A, who is in the test group, reads the catalog but eventually buys on the website for some other reason. How does one account for such interactions? (Even a source code collection mechanism will not tell whether the order was driven by the catalog or by the website.)
    2. If the bulk of the test group does not consist of break-even customers and infrequent buyers, how would one conclude, from a four-week test, that they should be cannibalized for the rest of the year? In this scenario, it may be a good idea to extend the test to, let's say, six months, collecting more observations over time to understand the seasonality of these customers' purchase behavior. This may also smooth out some market and economic fluctuations, such as new product launches.
    3. How would one account for the numerous changes that happen on the website and in e-mail during this catalog no-mail test? In general, AOV is smaller on the website than in the catalog, and product margins differ significantly between the website and the catalog. How does one account for these differences?
    4. When analyzing the test results by various segments, one should make sure one is not interpreting noise. In essence, one must have large enough sample sizes to interpret the observed results across numerous segments within some margin of error.
    5. How would you decide how much to cannibalize the catalog circulation based on the observations? (I think you mean using the ratio of dollars spent per customer in the mail group versus the no-mail group.)

    Thank you.

  2. Hello Anonymous.

    (1) Order classification is not a problem in the way you describe, because the rate at which folks buy "due to some reason" is assumed to be the same in each group. The issue you bring up becomes a problem only if the sample sizes in the mail and holdout groups are not large enough.

    (2) I have no problem with extending the test to six months; go right ahead and do that when you analyze your tests! Be sure, however, to give your CEO an early read, then fill her in on the results after six months.

    (3) You don't have to worry about the changes to the website and e-mail during the test, because the changes are identical for both the mailed group and holdout group. As a result, the changes "cancel each other out".

    (4) I agree 100% with you. Valid sample sizes are required to sub-divide mail/holdout groups.

    (5) If I were planning next year's catalog, using the example I provided, I would take my source code reporting for phone/mail, discount it by 25% to account for cannibalization, then increase that number by 33% to account for the volume driven online.

    Good questions!!!
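
The planning adjustment in answer (5) is simple arithmetic. Below is a minimal sketch that uses the 25% cannibalization discount and the 33% online increase from that answer as defaults; these come from the author's example, and your own test would supply different values. The function name and the $100,000 of phone/mail source code demand are hypothetical.

```python
def planned_catalog_demand(source_code_demand: float,
                           cannibalization: float = 0.25,
                           online_lift: float = 0.33) -> float:
    """Adjust phone/mail source code demand for next year's catalog plan."""
    # Step 1: discount source code demand to account for cannibalization.
    discounted = source_code_demand * (1 - cannibalization)
    # Step 2: increase the discounted figure to account for volume driven online.
    return discounted * (1 + online_lift)


plan = planned_catalog_demand(100_000.0)
print(round(plan, 2))  # 99750.0
```

Because 0.75 × 1.33 ≈ 0.9975, these particular adjustments nearly offset each other: the plan lands close to the raw source code figure, but part of that volume is now expected to arrive online rather than by phone or mail.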

  3. Anonymous 3:28 PM

    Thank you for the answers.

    Point 1 could eventually fall into circular reasoning. The basic assumption, that the rate at which customers across the housefile buy for a set of reasons (numerous advertising vehicles) is a stationary process, is a bold one. I wonder how one would go about validating such an assumption.

    Point 3 is still a problem if you want to use the test results to summarize the truly incremental sales generated by the catalog. The cancellation effect may hold if online and e-mail purchase behavior is uniformly distributed across the test and control groups, and perhaps only if one is conducting the test for a single cycle. Even then, e-mail frequency (the number of times customers in the test and control groups are contacted) and changes in online advertising can substantially cannibalize the catalog response. In such cases, the test results do not represent the truly incremental sales for the catalog, because those sales were already cannibalized.

    I still like your idea of measuring the interaction of advertising vehicles through a no-mail test; on average, it will show what is happening on the surface.

    Going back to the problem statement, if one wants to measure how much of catalog sales are truly incremental, enough care has to be taken in acting upon the test results. Otherwise, you risk taking a step that reduces your 12-month buyer file. Another problem I see is that revenue from certain product categories that are driven mostly by the catalog, due to visibility or skew in online advertising, drops drastically with cannibalization. This could be disastrous in the long run.

    Thank you again.

  4. OK, given your concerns, share with the audience what you would do instead of the methodology outlined in the document. Outline your strategy for overcoming these issues; if it's reasonable, I'll publish it as a response.

  5. Anonymous 5:50 PM

    There is nothing wrong with your methodology. I do not know if I have a better one. No test is free of such concerns; these are the questions I raise when I argue against myself. As you have pointed out a few times in your articles, it is a complex and dynamic ecosystem, and one must be very careful with observations from such purely reductionist methodologies.

    As an example, we tested at different times of the year over the last three years and saw significant variation in the observations. As one digs deeper into the analysis and the test results, one finds that all the assumptions of a typical test design are violated. Results from such tests can be interpreted only in the context of the dynamic ecosystem, and a long-term impact study of the system has to be taken into account before acting on them.

    Thank you.

  6. Agreed. We did a year-long test at Lands' End back in the early 90s, and the results from that test were reliable.

    When we got to Nordstrom, we used up to 10,000 customers in the mail and holdout groups. Results were all over the board.

    If we were only testing in the direct channel (catalog and online), we needed at least 50,000 customers in the mail and holdout groups to get reliable results. If we wanted the results to carry over to the retail channel, the sample sizes had to be 300,000 in the mail and holdout groups.

    When you have 8,000,000 12-month buyers, you can get away with 300,000 customers in a holdout group and still reliably measure performance by merchandise division.

    Everybody else should read your comments and concerns, balancing the methodology against what you're saying.

    Great feedback, thanks!!!
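
The sample-size point in this exchange can be made concrete with a standard two-sample margin of error for the difference in demand per customer between the mail and holdout groups. This is a minimal sketch using the normal approximation. The $100 standard deviation of demand per customer is a hypothetical placeholder (estimate your own from past data), the function names are illustrative, and the output is not a reconstruction of the Lands' End or Nordstrom tests.

```python
import math


def margin_of_error(sd_mail: float, sd_holdout: float,
                    n_mail: int, n_holdout: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for (mail - holdout) demand per customer."""
    standard_error = math.sqrt(sd_mail ** 2 / n_mail + sd_holdout ** 2 / n_holdout)
    return z * standard_error


def required_n_per_group(sd: float, detectable_lift: float,
                         z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Customers needed in each group to detect a given lift in demand per customer
    at roughly 95% confidence and 80% power, assuming equal group sizes and SDs."""
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / detectable_lift ** 2)


SD = 100.0  # hypothetical standard deviation of demand per customer, in dollars
for n in (10_000, 50_000, 300_000):
    print(n, round(margin_of_error(SD, SD, n, n), 2))
print(required_n_per_group(SD, detectable_lift=1.0))
```

Because the margin shrinks with the square root of the group size, growing the groups from 10,000 to 50,000 cuts it by a bit more than half, while every further split by segment or merchandise division shrinks the groups again. Demand per customer is also highly skewed (most customers buy nothing during a test window), which is another reason the normal approximation is only trustworthy with large groups.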

