March 24, 2014

Why Models Don't Work Any Better Today Than In 1994

Most modelers over-emphasize "variables". I sat in a meeting last year where the vendor told my client that their strategy was "best" because they entered a thousand variables ... a thousand ... into their model. Statisticians, of course, are taught the concept of parsimony - a small number of variables that explain most of the variance is preferable to a large number of variables explaining a comparable amount of variance.

If you have a model with 10 variables, and a model with 100 variables, you essentially end up with the same ranking of customers, best to worst. Oh, there are differences, subtle differences. Let's say you had 10 variables, and a customer ranked #700,000 out of 1,000,000. Going from 10 variables to 100 variables might change the ranking of this one customer from #700,000 to #662,000, or to #738,000.

So what? What difference does that make?

It makes no difference.
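
To make that concrete, here is a minimal sketch in Python. The mail cutoff of 500,000 is a hypothetical number picked for illustration, not one taken from a client; the three ranks are the ones from the example above.

```python
# Hypothetical cutoff: assume the top 500,000 of 1,000,000 customers get the catalog.
MAIL_CUTOFF = 500_000


def gets_mailed(rank: int, cutoff: int = MAIL_CUTOFF) -> bool:
    """A customer is mailed only if their rank is at or better than the cutoff."""
    return rank <= cutoff


# Rank from the 10-variable model, plus the two possible ranks from the 100-variable model.
ranks = {
    "10 variables": 700_000,
    "100 variables (moved up)": 662_000,
    "100 variables (moved down)": 738_000,
}

decisions = {label: gets_mailed(rank) for label, rank in ranks.items()}
for label, mailed in decisions.items():
    print(f"{label}: rank {ranks[label]:,} -> {'mail' if mailed else 'do not mail'}")
```

Under that assumed cutoff, the decision is the same in all three cases. A shift in rank changes the outcome only when it carries a customer across the cutoff.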

I know, I know, there are a couple hundred statisticians or statistically trained individuals at leading vendors who are reading this right now, and are cursing my name.

Well, if these folks were right, and you've been using their models for twenty years (+/-), then why doesn't your business grow by leaps and bounds when you implement a new model? Did you ever notice that almost nothing changes when you implement a new model (from a demand standpoint)? Yup, almost nothing changes.

The secret to mailing smarter isn't having a thousand variables in a model.

The secret to mailing smarter is accounting for three key factors:
  1. An accurate ranking of customers from best to worst, without over-fitting a model with 100 or 1,000 variables.
  2. Capturing cannibalization between catalogs. When you mail a catalog in March, you cut off the tail of a catalog mailed in February; if you do not mail a catalog in March, you increase demand from the February catalog. Give Clario props ... they account for this, don't they?
  3. Identifying demand that would happen anyway if catalogs are not mailed - what I call the "organic percentage". This is where I make all of my clients a ton of profit!
All three factors above are really important. Because those concepts are ignored, the models you use make minimal difference.
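
Here is a rough sketch, in Python, of how factors 2 and 3 change the math. Every name and number in it (`incremental_demand`, `organic_pct`, `cannibalized_demand`, the dollar amounts) is made up for illustration. Subtracting both pieces from the demand credited to a catalog is a simplifying assumption, not a client result and not the formula any vendor uses.

```python
def incremental_demand(measured_demand: float,
                       organic_pct: float,
                       cannibalized_demand: float) -> float:
    """Estimate the demand a catalog truly adds (illustrative assumption only).

    measured_demand:     demand credited to the catalog
    organic_pct:         share of that demand that would happen anyway (0 to 1)
    cannibalized_demand: demand the mailing takes from the prior catalog's tail
    """
    if not 0.0 <= organic_pct <= 1.0:
        raise ValueError("organic_pct must be between 0 and 1")
    organic_demand = measured_demand * organic_pct
    return measured_demand - organic_demand - cannibalized_demand


# Made-up numbers: $4.00 per customer credited to the March catalog,
# 25% of it organic, and $0.50 taken from the February catalog's tail.
march = incremental_demand(4.00, 0.25, 0.50)
print(f"Incremental demand per customer: ${march:.2f}")
```

With these made-up inputs, the catalog adds less than the $4.00 it gets credit for, and that gap is what a ranking-only model never sees.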

