## December 19, 2007

### Data Mining In Multichannel Cataloging

RFM was a brilliant step in the evolution of catalog circulation.

In so many companies, this simple methodology is still used today.

Then other techniques became popular.

In the 1980s, regression modeling became a worthy endeavor, thanks to the computing horsepower of mainframe computers.

In the 1990s, neural networks became popular, as desktop computing allowed folks to build these models. This methodology requires "trust", because one cannot easily see "what" makes a neural network tick. In part, this "trust" factor limited the adoption of neural network methodologies in catalog circulation.

RFM, regression, and neural networks are all designed to "predict" whether a customer will respond to a catalog or e-mail campaign, given a set of historical variables.

In 1994, an overly simple regression equation might look like this:

• Predicted Spend = \$3.00 - \$0.05 * (Months Since Last Purchase) + \$0.50 * (Lifetime Orders).

In this framework, each customer is scored on that customer's own traits:

• Recency = 3 Months, Lifetime Orders = 1, Score = \$3.00 - \$0.05 * 3 + \$0.50 * 1 = \$3.35.
• Recency = 24 Months, Lifetime Orders = 2, Score = \$3.00 - \$0.05 * 24 + \$0.50 * 2 = \$2.80.

In 1994, this methodology worked really well (OK, I know the folks who worked at Lands' End and L.L. Bean would be impacted by the theories about cannibalization brought forth by Bill End, but let's skip that for now).
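
For readers who like to see the arithmetic run, here is a minimal Python sketch of the scoring step above. The coefficients are the illustrative ones from the equation; the function name, field names, and customer ids are made up for this example.

```python
def predicted_spend(months_since_last_purchase, lifetime_orders):
    """Score one customer with the illustrative equation from the text."""
    return 3.00 - 0.05 * months_since_last_purchase + 0.50 * lifetime_orders


# Two hypothetical customers matching the worked examples in the text.
customers = [
    {"id": "A", "recency": 3, "orders": 1},
    {"id": "B", "recency": 24, "orders": 2},
]

# Round to cents so the scores read like the dollar figures in the text.
scores = {
    c["id"]: round(predicted_spend(c["recency"], c["orders"]), 2)
    for c in customers
}

for customer_id, score in scores.items():
    print(f"Customer {customer_id}: predicted spend ${score:.2f}")
```

In practice, the scored file would be sorted from highest to lowest predicted spend, and the cataloger would mail down to whatever depth the economics support.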

But now it is December 2007, and the entire world has been rocked by the introduction of e-commerce (and in the rest of this article, feel free to substitute the word "e-mail" for "catalog", as the concepts are identical).

You see, e-commerce crippled the analytical methods used by most catalogers. All of a sudden, customers were able to order from multiple channels, and not use the source codes or key codes that helped catalogers identify the source of more than eighty percent of orders.

Our vendor community listened to our concerns, offering us a "matchback" algorithm that identified every customer who received a catalog, then placed an order on the internet over the next "x" weeks. We're very thankful we were given this methodology.
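
Vendors each implement matchback in their own way, so treat the following as a rough sketch of the general idea, not any vendor's actual algorithm. It assumes a hypothetical lookup of mail dates by customer id, a list of online orders, and a window of "x" weeks (four here, chosen arbitrarily).

```python
from datetime import date, timedelta


def matchback(mailed, online_orders, window_weeks=4):
    """Return the online orders credited to the catalog.

    An order is credited when the customer received the catalog and
    ordered within `window_weeks` of the (assumed) mail date.
    """
    window = timedelta(weeks=window_weeks)
    credited = []
    for order in online_orders:
        mail_date = mailed.get(order["customer_id"])
        if mail_date is None:
            continue  # customer never received the catalog
        if mail_date <= order["order_date"] <= mail_date + window:
            credited.append(order)
    return credited


# Hypothetical data: customer id -> mail date, plus three online orders.
mailed = {101: date(2007, 11, 1), 102: date(2007, 11, 1)}
online_orders = [
    {"customer_id": 101, "order_date": date(2007, 11, 10), "amount": 80.0},  # inside window
    {"customer_id": 102, "order_date": date(2007, 12, 20), "amount": 45.0},  # outside window
    {"customer_id": 103, "order_date": date(2007, 11, 5), "amount": 60.0},   # never mailed
]

credited = matchback(mailed, online_orders, window_weeks=4)
matched_demand = sum(order["amount"] for order in credited)
print(f"Online demand credited to the catalog: ${matched_demand:.2f}")
```

Notice that the rule credits every in-window order from a mailed customer to the catalog, whether or not the catalog actually caused the order.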

Thankful, that is, unless your customer base starts ordering merchandise on its own, regardless of whether a catalog was mailed or not. When this happens (i.e., with "multichannel customers"), traditional analytical techniques are crippled.

There are several ways to deal with this challenge.

First, the cataloger must fully utilize mail and holdout groups. This is an essential part of multichannel cataloging that is sorely missing these days. The holdout group shows you all of the online orders that happen without catalog marketing.
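
Reading a mail/holdout test takes very little code. The sketch below uses made-up spend figures for eight customers per group, and it assumes customers were assigned to the two groups at random, so that the groups are alike in every way except the mailing.

```python
def demand_per_customer(spend_by_customer):
    """Average spend per customer in a test group, buyers and non-buyers alike."""
    return sum(spend_by_customer) / len(spend_by_customer)


# Hypothetical spend during the test window, one entry per customer.
mailed_spend = [0.0, 0.0, 50.0, 0.0, 30.0, 0.0, 0.0, 40.0]
holdout_spend = [0.0, 0.0, 0.0, 40.0, 0.0, 0.0, 0.0, 20.0]

mailed_avg = demand_per_customer(mailed_spend)
holdout_avg = demand_per_customer(holdout_spend)

# Demand that exists only because the catalog was mailed.
incremental = mailed_avg - holdout_avg

# Share of mailed-group demand that would have happened without the catalog.
organic_share = holdout_avg / mailed_avg

print(f"Mailed: ${mailed_avg:.2f}  Holdout: ${holdout_avg:.2f}")
print(f"Incremental: ${incremental:.2f}  Organic share: {organic_share:.0%}")
```

Real tests need far larger groups than this toy example before the difference can be trusted.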

Second, the statistical modeler has a series of choices.

Two regression models can be built, one for the mailed group, one for the holdout group. The difference between the two model scores is the "incremental" benefit of the catalog. While theoretically reasonable, there are a myriad of reasons why this methodology can fail ... in particular, discrepancies in the slope of the recency variable between the two models.
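
Here is a bare-bones sketch of the two-model idea, using a single predictor (recency) and ordinary least squares written out by hand. The data are invented and fall exactly on a line so the result is easy to check; a real model would use many predictors and a proper statistics package.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: months since last purchase and spend per customer.
recency = [1, 6, 12, 24]
mailed_spend = [5.80, 4.80, 3.60, 1.20]    # lies on 6.00 - 0.20 * recency
holdout_spend = [1.95, 1.70, 1.40, 0.80]   # lies on 2.00 - 0.05 * recency

mailed_model = fit_line(recency, mailed_spend)
holdout_model = fit_line(recency, holdout_spend)


def score(model, months):
    """Predicted spend from a fitted (intercept, slope) pair."""
    intercept, slope = model
    return intercept + slope * months


def incremental_benefit(months):
    """Mailed-model score minus holdout-model score for one customer."""
    return score(mailed_model, months) - score(holdout_model, months)


for months in (3, 24):
    print(f"Recency {months:>2}: incremental ${incremental_benefit(months):.2f}")
```

Because the two recency slopes differ, the incremental benefit shrinks as recency grows. If either slope is estimated poorly, the difference between the two scores can become unstable, which is the failure mode described above.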

Some folks build "interaction" models, which predict response, using interactions between "mailed and holdout groups" and key predictors. In other words, there might be a variable called "recency" (months since last purchase). There might be another variable called "mailed recency", which is equal to zero for all folks in the holdout group, and equal to "recency" for all folks in the mailed group. The coefficient on "mailed recency" is the difference between the mailed and holdout recency slopes, and so measures the impact of mailing a catalog, or sending an e-mail.
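
A small sketch of how the "mailed recency" variable might be built and used follows. It assumes one extra variable not mentioned above, a 0/1 "mailed" indicator, so that the two groups can also differ in their baseline. The data are invented, and the least-squares solver is hand-rolled only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical customers: (recency in months, mailed flag, spend).
customers = [
    (1, 1, 5.80), (6, 1, 4.80), (12, 1, 3.60), (24, 1, 1.20),   # mailed group
    (1, 0, 1.95), (6, 0, 1.70), (12, 0, 1.40), (24, 0, 0.80),   # holdout group
]

# Columns: intercept, recency, mailed, mailed_recency.
# mailed_recency is zero for the holdout group and equals recency when mailed.
rows = [[1.0, rec, mailed, mailed * rec] for rec, mailed, _ in customers]
spend = [s for _, _, s in customers]

b0, b_recency, b_mailed, b_mailed_recency = ols(rows, spend)


def incremental_benefit(months):
    """Predicted spend if mailed minus predicted spend if held out."""
    return b_mailed + b_mailed_recency * months


print(f"mailed_recency coefficient: {b_mailed_recency:.2f}")
print(f"Incremental at 3 months: ${incremental_benefit(3):.2f}")
```

With every predictor interacted, as here, a single model reproduces what two separate models would give. The practical appeal of the single model is that the modeler can interact only the predictors that seem to matter, which pools the data for everything else.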

This form of "interaction" modeling suffers when holdout groups are small, or when many statistical assumptions are violated.

In a perfect world, circulation decisions need to be made on the basis of the incremental difference between a group mailed a catalog, and a like group that is "held out" from the mailing. If you have the luxury of hiring statistical modelers, this represents the holy grail of catalog modeling ... being able to nail this "incremental difference". Six years at Nordstrom, having some of the best modelers in the world at my disposal, proved that doing this right is truly "the holy grail" of catalog modeling.

Many catalogers use modeling services from catalog vendors. If this is you, challenge your vendor to develop "incremental models" that predict the incremental impact between mailed and holdout groups.

For the majority of catalogers using RFM, I wouldn't worry a whole lot about this topic. I'd spend time running mail/holdout groups (especially among "multichannel customers"), and comparing the results to your matchback results.

And for e-mail marketers, this is a concept that might be foreign to your vendor community. Start the mail/holdout tests, and if the results are significantly different from your typical metrics, you know that "incremental" models make sense.