October 04, 2015

The Data Is Wrong!!

Did you see this article from Friday (click here)?

First, look at the ad next to an article about bad weather data:

[image: ad for a marketing technology conference, displayed beside the article]
Alanis Morissette would agree it is ironic that an article about how terrible initial data conditions lead to terrible forecasts is supported by an ad for a conference featuring technology that uses incomplete data to put ads in front of customers who never click on them; a technology you use to bid higher-than-necessary rates to put an ad in front of a customer that seven in ten thousand (i.e. nobody) will respond to.

When your co-op sales rep tells you that they have a new "coherence model" (or whatever name they call the next version of models that are cousins to the models used in 1995) and that they're seeing breakthrough results, you have to question the statement.
  • Is the data being used to build the model "right" in the first place? Ask your co-op sales rep to see the data being used as the initial starting point.
  • Are the models that sit on top of the data "right" in their assumptions?
  • Will the co-op rep share the model coefficients with you? It's your model, you are paying for it, so you should get to see what you are paying for, right?
Ask your co-op sales rep to answer each question. If you don't get a valid answer to any of the three questions (and you probably won't), then ask yourself why these folks are still considered trusted partners. Why would a trusted partner hide the truth from you? (And co-op folks, this works in reverse: if your clients won't share profit/loss data on the names you give them, are they really your trusted partners?)

Let me tell you a story. I was asked to visit a company and referee a discussion between vendor statisticians and in-house statisticians. What a mess!

The vendor used the same data that the in-house statisticians used, but each party transformed the data in different ways, so their starting points were fundamentally different. Different starting points mean different outcomes (as in the hurricane forecast example). And different starting points lead to different modeling approaches. The two parties used opposite tactics: the vendor put hundreds of variables in a model optimizing profit, while the in-house team used a half-dozen variables in a model optimizing sales.

Both tactics are wrong. Never use hundreds of variables in a model; beyond the first handful, the remaining variables have essentially no impact (the concept is called 'parsimony'). And modeling sales instead of profit means that you reward customers who return a lot of merchandise and/or purchase via discounts/promotions. Because both tactics are wrong, both tactics yield bad outcomes. And of course, both tactics are doomed from the start because they begin with bad data (i.e. bad variables).
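The parsimony point can be sketched with a small simulation. When only a handful of variables actually drive the outcome, a kitchen-sink model with hundreds of variables fits noise and predicts worse on fresh data than a model using just the few variables that matter. This is a minimal illustration with synthetic data and made-up variable counts, not the actual models from the story:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_vars = 500, 500, 200

# Hypothetical setup: only the first 3 of 200 variables actually
# drive the outcome; the other 197 are pure noise.
X = rng.normal(size=(n_train + n_test, n_vars))
true_coef = np.zeros(n_vars)
true_coef[:3] = [5.0, -3.0, 2.0]
y = X @ true_coef + rng.normal(scale=4.0, size=n_train + n_test)

X_tr, X_te = X[:n_train], X[n_train:]
y_tr, y_te = y[:n_train], y[n_train:]

def oos_rmse(cols):
    """Fit least squares on the chosen columns, score on held-out data."""
    beta, *_ = np.linalg.lstsq(X_tr[:, cols], y_tr, rcond=None)
    resid = y_te - X_te[:, cols] @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

kitchen_sink = oos_rmse(list(range(n_vars)))  # all 200 variables
parsimonious = oos_rmse([0, 1, 2])            # only the 3 that matter

print(f"200-variable model, test RMSE: {kitchen_sink:.2f}")
print(f"  3-variable model, test RMSE: {parsimonious:.2f}")
```

The 200-variable model fits the training data beautifully and the held-out data worse; the extra 197 coefficients are fit to noise. That is parsimony in one picture.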

But man, what an argument, with two parties who are both wrong yelling at each other about how wrong the other side is! It almost sounded like our political process.

If your in-house statistician won't share the details of the model building process with you, find a new in-house statistician.

If your vendor co-op or retargeter won't share the details of the model building process with you, find a new vendor co-op or retargeter.

If your attribution vendor won't share the intimate details of the modeling process used to dictate your advertising investment algorithm, then fire the attribution vendor and find a new one who is honest and transparent.

In weather forecasting, we can clearly see when a modeling process, #datadriven and all that, is still woefully wrong.

In marketing, do we know if our trusted analytics partners are right or wrong?