December 19, 2021

A Parsimonious Model

Several years ago, an employee of a vendor that many of you use (a vendor with many readers of this blog) said this about me:
  • "He's an idiot. His models are just too simplistic to matter. Don't work with him."

Those of you who earned a statistics degree understand the requirement for models to be parsimonious. The TL;DR version is that the modeler should use as few variables as possible to explain the data in the dataset. If you have a choice between a model with 4 variables and one with 1,400 variables, and both explain the data about equally well, you should strive to use 4 variables.

In marketing, model builders fit a model to a dataset and then rank-order either the same dataset or a comparable sample of data from best- to worst-performing customer. The modeler then sums total demand/sales by decile or percentile or whatever. The modeler looks to see if the addition of variables results in a better ranking of customers from best to worst.
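A minimal sketch of that ranking exercise in Python, assuming you already have a model score and actual demand for each customer. The function name `demand_by_bin` and the toy numbers are hypothetical, made up purely for illustration.

```python
def demand_by_bin(scores, demand, n_bins=10):
    """Rank customers best-to-worst by model score, then sum actual demand per bin.

    n_bins=10 gives deciles; n_bins=100 gives percentiles.
    """
    if len(scores) != len(demand):
        raise ValueError("scores and demand must have the same length")

    # Best-scoring customers first.
    ranked = sorted(zip(scores, demand), key=lambda pair: pair[0], reverse=True)

    n = len(ranked)
    totals = [0.0] * n_bins
    for position, (_, spend) in enumerate(ranked):
        # Map the customer's rank position to a bin (0 = best bin).
        totals[position * n_bins // n] += spend
    return totals


# Toy example (made-up numbers): four customers split into two halves.
example_scores = [0.9, 0.1, 0.5, 0.7]
example_demand = [100, 0, 20, 50]
print(demand_by_bin(example_scores, example_demand, n_bins=2))
```

If the model ranks customers well, demand piles up in the first bins and tapers off toward the last ones.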

In most cases, a very small number of variables, often under a dozen, explain the data just as well as a model with 200 variables or 600 variables or 1,400 variables. In those cases, you go with a dozen or fewer variables.
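One way to make that comparison concrete is sketched below. It assumes both models have scored the same comparable sample, and it uses the share of demand captured by the top-ranked customers as the yardstick. The function names and the 1-point tolerance are hypothetical choices for illustration, not a standard rule.

```python
def top_share(scores, demand, fraction=0.1):
    """Share of total demand captured by the top `fraction` of customers by score."""
    ranked = sorted(zip(scores, demand), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    total = sum(demand)
    if total == 0:
        return 0.0
    return sum(spend for _, spend in ranked[:cutoff]) / total


def prefer_parsimonious(small_scores, large_scores, demand,
                        fraction=0.1, tolerance=0.01):
    """Return "small" unless the large model ranks customers meaningfully better.

    `tolerance` is an assumed threshold: the large model must capture at least
    that much more of total demand in the top group to justify its extra variables.
    """
    small = top_share(small_scores, demand, fraction)
    large = top_share(large_scores, demand, fraction)
    return "small" if large - small <= tolerance else "large"
```

Usage would look like `prefer_parsimonious(scores_12_vars, scores_1400_vars, demand)`, where the two score lists are whatever your small and large models produced for the same customers.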

Now go back to the statement above.
  • "He's an idiot. His models are just too simplistic to matter. Don't work with him."

Does the statement align with the concept of parsimony?

Nope.

In other words, the critic is revealing his lack of knowledge of statistics.

One of the all-time worst meetings I've ever been in was at a major retail brand. A vendor (one many of you work with) built a model with more than a thousand variables for the retailer. The in-house statistician correctly pointed out to the vendor the folly of building a model with more than a thousand variables. 

Of course, the in-house statistician butchered the dependent variable ... he used total company sales instead of incremental sales generated by the marketing effort as measured via A/B tests. His model resulted in some of the worst targeting tactics I've seen ... regardless, he was arguing with the vendor rep who had a model with more than a thousand variables.

The two of them consumed 90 minutes of Executive Time arguing meaningless points. Both sides were pigheaded, unwilling to listen, unwilling to accept criticism, unwilling to change. 

Both sides were very willing to argue loudly.

If you ever want to test your vendor to see if they know what they are doing, ask them how many variables are in the model they built for you. If they tell you that they've got 449 variables in the model, go find a vendor who will build something more parsimonious for you.

How did the meeting end?

The CFO finally cleared the room and asked me to give him my unvarnished opinion of the two parties. I told the CFO that the in-house modeler didn't understand marketing incrementality, thereby costing the company a fortune ... and I told the CFO that the vendor didn't understand statistical models. I told the CFO not to trust either party.

That answer didn't go over well ... at that point I became a problem.

The CFO getting grumpy with me for sharing the "unvarnished truth" comes with the territory of being a consultant.

Parsimony and Incrementality are really important. You don't have to understand either in a lot of detail if you outsource your modeling efforts. You just need to be able to ask the modeler valid questions and then be ready to properly direct their efforts when they violate either concept.
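The incrementality question from the meeting story can be sketched in a few lines. This assumes a simple A/B test where one group received the marketing and a comparable holdout group did not. The function name and the sales figures are hypothetical.

```python
from statistics import mean


def incremental_sales_per_customer(marketed_sales, holdout_sales):
    """Average sales in the marketed group minus average sales in the holdout group.

    The holdout average estimates what customers would have spent anyway,
    so the difference is what the marketing effort actually added.
    """
    return mean(marketed_sales) - mean(holdout_sales)


# Made-up numbers: the marketed group averages 10 per customer,
# but the holdout group averages 8 without any marketing at all.
marketed = [10, 0, 20, 10]
holdout = [8, 0, 12, 12]

print("total sales per customer:", mean(marketed))
print("incremental sales per customer:",
      incremental_sales_per_customer(marketed, holdout))
```

A model built on total sales would credit the marketing with the full average, while the A/B test says most of that spend would have happened anyway. That gap is the question to ask your modeler about.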
