It was 1992. Toronto won the World Series. Alabama was the national champion in college football. Duke won the NCAA Men's Basketball Championship. Mark Wahlberg was a pop culture sensation. A Clinton was running for President. A generation later? Interesting.
Each Friday at 3:50pm, it was my duty to submit a job to our mainframe computer (#cloud - see, things really don't change much). I would ask that the customer file be scored with my statistical models. Once the file was scored, I would ask the mainframe computer to print the customer record for one out of every ten thousand customers. On Monday morning, I would have a stack of 400 pages of tractor paper, with one customer record printed on each side of the page. I'd go through each page, making sure that the customer metrics were right ... making sure that the multiplication of customer metrics by modeling coefficients was done correctly ... making sure that the end result was a correct score.
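The Monday-morning check described above boils down to three steps: score every customer, pull a systematic 1-in-N sample, and recompute each sampled score by hand. Here is a minimal Python sketch of that audit, assuming a simple linear model. The metric names, coefficients, `Customer` record, and `audit_sample` helper are all hypothetical stand-ins, not the actual Lands' End models.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical scoring model: score = intercept + sum(coefficient * metric).
# Metric names and coefficient values are illustrative assumptions only.
INTERCEPT = -2.0
COEFFICIENTS: Dict[str, float] = {
    "recency_months": -0.15,
    "frequency": 0.40,
    "monetary": 0.002,
}


@dataclass
class Customer:
    customer_id: int
    metrics: Dict[str, float]
    score: float  # the score written by the production scoring job


def expected_score(metrics: Dict[str, float]) -> float:
    """Recompute a score by hand: each metric times its coefficient, plus the intercept."""
    return INTERCEPT + sum(coef * metrics[name] for name, coef in COEFFICIENTS.items())


def audit_sample(customers: List[Customer], every_nth: int = 10_000) -> List[int]:
    """Pull one of every `every_nth` customers and return the IDs whose stored
    score disagrees with the recomputed score."""
    sample = customers[::every_nth]  # systematic 1-in-N sample
    return [
        c.customer_id
        for c in sample
        if not math.isclose(c.score, expected_score(c.metrics), abs_tol=1e-6)
    ]
```

On a 30,000-customer file, slicing with a step of 10,000 picks customers 0, 10,000 and 20,000, so if the production job mis-scored customer 10,000, `audit_sample` returns `[10000]`.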
Do you do that today, in your world of "BIG DATA"? Do you audit anything? More on that in a moment.
On one Friday at 3:50pm, I was in a hurry. Had to get home. Maybe it was because it was Cheese Days in Monroe, and I wanted to get there early, who knows? So instead of typing the number "10,000" for the sampling rate (1 in 10,000), I typed in the number "100", submitted the job, got in my brand new 1992 Mercury Topaz (#lemon), and off I went.
Sixty-four hours later, I waltzed up to my cubicle in Building Five at Lands' End and was greeted by seven boxes of printed customer records. Seems the folks in the Information Technology department take things literally: they actually printed one out of every one hundred customers, and they delighted in dumping the boxes off at my cube. I can only imagine how much they enjoyed sticking it to me. They had printed these records every three weeks for the prior seven years. They knew what was supposed to happen. They could have said something. They didn't say anything!!
I learned a valuable lesson - I needed to audit the document designed to help me audit the scoring of the customer file!
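Auditing the audit can be as simple as estimating the size of the printout before the job is submitted. The sketch below is one way to do that, assuming one customer record per side of a sheet. The file size of 8,000,000 customers is only inferred from the story (roughly 400 two-sided pages at 1 in 10,000), and the 500-page limit and both function names are hypothetical.

```python
def expected_pages(file_size: int, every_nth: int, records_per_page: int = 2) -> int:
    """Pages a 1-in-`every_nth` printout should produce.

    records_per_page=2 assumes one customer record on each side of a sheet.
    """
    # Ceiling division: the same record count as slicing a list with step every_nth.
    records = (file_size + every_nth - 1) // every_nth
    return (records + records_per_page - 1) // records_per_page


def check_sampling_rate(file_size: int, every_nth: int, max_pages: int) -> int:
    """Refuse to submit a job whose printout would exceed `max_pages`."""
    pages = expected_pages(file_size, every_nth)
    if pages > max_pages:
        raise ValueError(
            f"1-in-{every_nth:,} sample would print {pages:,} pages (limit {max_pages:,})"
        )
    return pages
```

With those assumptions, `check_sampling_rate(8_000_000, 10_000, max_pages=500)` returns 400, while typing 100 instead of 10,000 raises an error describing a 40,000-page job before anything reaches the printer.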
2015 has been a very odd year. I have received more bad data in 2015 than in the eight prior years of MineThatData combined. Columns don't align in datasets. Customers are missing. Demand is missing. Customer numbers are stored as numeric values in one file and as alphanumeric or character values in the file it must be joined to, making it hard to match records. Annual sales totals are not audited. Customer counts are not audited. Item numbers are recycled, making it impossible to analyze merchandise trends over time. Channel information (#omnichannel) is not standardized. Email clicks are not recorded. Payment amounts for each keyword are not recorded. Customer records are not de-duped.
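Several of the problems in that list can be caught with a handful of routine checks. Here is a minimal sketch covering only a few of them (duplicate customers, mixed key types, orders with no matching customer, missing demand, and an unreconciled sales total), assuming a hypothetical layout where each file is a list of dicts with `customer_id` and `demand` keys and the control total comes from an independent source such as finance. The function name and layout are illustrative, not any vendor's actual process.

```python
from collections import Counter
from typing import Any, Dict, List


def audit_customer_file(
    customers: List[Dict[str, Any]],
    orders: List[Dict[str, Any]],
    control_sales_total: float,
    tolerance: float = 0.01,
) -> List[str]:
    """Return human-readable audit findings; an empty list means every check passed."""
    findings: List[str] = []

    # 1. Customer records that were never de-duped.
    counts = Counter(c["customer_id"] for c in customers)
    dupes = sorted(str(k) for k, n in counts.items() if n > 1)
    if dupes:
        findings.append(f"duplicate customer_id values: {dupes}")

    # 2. Join keys stored as numbers in one place and text in another (1042 vs "1042").
    key_types = {type(r["customer_id"]).__name__ for r in customers + orders}
    if len(key_types) > 1:
        findings.append(f"customer_id stored as mixed types: {sorted(key_types)}")

    # 3. Orders whose customer is missing from the customer file.
    #    Keys are compared as strings so a type mismatch alone is not reported twice.
    known = {str(c["customer_id"]) for c in customers}
    orphans = sorted({str(o["customer_id"]) for o in orders} - known)
    if orphans:
        findings.append(f"orders with no matching customer: {orphans}")

    # 4. Orders with missing demand.
    missing = sum(1 for o in orders if o.get("demand") is None)
    if missing:
        findings.append(f"orders with missing demand: {missing}")

    # 5. Reconcile total demand against an independently reported annual sales total.
    total = sum(o["demand"] for o in orders if o.get("demand") is not None)
    if abs(total - control_sales_total) > tolerance:
        findings.append(
            f"demand total {total:,.2f} does not match control total {control_sales_total:,.2f}"
        )
    return findings
```

The point of returning a list of findings rather than raising on the first failure is that one run shows every problem at once, much like a stack of printouts on Monday morning.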
A generation ago, your data structure was "in-sourced". You had a team of 30 people working together to make sure that everything was right.
Today, your marketing team is "out-sourced". Your pay-per-click vendor may or may not have audits in place to make sure paid search is executed properly. Your email vendor may or may not have audits in place to make sure your campaigns are executed properly. Your housefile database vendor (#cloud) may or may not have audits in place to make sure your database is maintained properly. Your housefile scoring vendor may or may not bother to make sure that the equations are being applied properly. Your cloud-based social data vendor may or may not make sure that @gumbo72 is truly engaged with your brand.
Because everything is outsourced, you don't have any control over what happens when bad data is generated in one part of your ecosystem. When your co-op misinterprets something and determines that a customer in Montana is pregnant, well, that data is fed throughout the co-op ecosystem, causing the mortified 63-year-old in Bozeman to be pummeled with baby-centric marketing nonsense. But rest assured, this #datadriven action can be #optimized, making sure that the right ad gets placed in front of the right customer at the right time.
What a bunch of nonsense.
Say your co-op made an error that resulted in the 63-year-old in Bozeman being pummeled with baby-centric marketing nonsense ... and say that the co-op had to receive a printout of every single error they made ... all 80,000 a week ... and say that the co-op modeler had those records dumped in her office on Monday morning ... do you think the dump truck load of results would cause the modeler to think twice about putting proper audits in place?
If you have to, get in your 1992 Mercury Topaz, drive to your #cloud-based co-op vendor, and inspect their records by hand. Visit your pay-per-click vendor, and ask to see their data cleansing process, from A to Z. Visit your housefile database vendor, and analyze every single aspect of their data cleansing process. Make sure your trusted partners are doing their job properly.