### Holdout Cell Sizes - You Are Likely To Disagree With Me - Again!

Can I let you in on a little secret?

I once worked with a company that executed a lot of mail/holdout tests in direct mail / catalogs. A funny thing happened at this company. First, a ton of Executives didn't like testing, so they told the statistical guru that he couldn't choose more than 10,000 customers for a holdout group. Second, there was a lot of variance in the mail/holdout groups ... customers could spend \$25 or \$0 or \$1,250.

The statistical guru was caught between a rock and a hard place. Given the bind he was in, he should have stopped executing tests altogether.

He moved forward anyway.

Because of the small holdout segment, the statistical guru could not prove that the mailed segment outperformed the holdout segment. Failing to prove a lift is not the same as proving there is no lift, but the guru concluded anyway that there was no need to continue to mail the catalog or direct mail piece, since he couldn't statistically prove that it generated sales.

Had he been allowed to have 50,000 customers in his holdout segment, he would have proved that the mailing generated several million dollars of sales and a half-million in profit.

But he concluded that the mailing should be shut down.

This is what can happen if you let the data-driven crowd drive the car.
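A rough sketch of the math behind that story follows. Every number in it is hypothetical, not the company's actual data: it assumes a \$60 standard deviation in spend per customer, a true lift of \$1.00 per customer, and 500,000 mailed customers. The function names are made up for illustration. It uses the standard error of a difference between two means to show how holdout size alone can decide whether a lift clears the usual 1.96 threshold.

```python
import math

def lift_standard_error(sd_spend, n_mailed, n_holdout):
    """Standard error of (mailed mean spend - holdout mean spend),
    assuming both groups share the same per-customer standard deviation."""
    return sd_spend * math.sqrt(1.0 / n_mailed + 1.0 / n_holdout)

def lift_z_score(lift, sd_spend, n_mailed, n_holdout):
    """How many standard errors the observed lift sits above zero."""
    return lift / lift_standard_error(sd_spend, n_mailed, n_holdout)

# Hypothetical inputs, chosen only to illustrate the effect of holdout size.
SD_SPEND = 60.0      # assumed std dev of spend per customer, in dollars
TRUE_LIFT = 1.00     # assumed incremental spend per mailed customer
N_MAILED = 500_000   # assumed mailed quantity

z_small = lift_z_score(TRUE_LIFT, SD_SPEND, N_MAILED, 10_000)
z_large = lift_z_score(TRUE_LIFT, SD_SPEND, N_MAILED, 50_000)

print(f"holdout 10,000: z = {z_small:.2f}")
print(f"holdout 50,000: z = {z_large:.2f}")
```

Under these assumptions the 10,000-customer holdout falls short of 1.96 and the 50,000-customer holdout clears it comfortably. Same mailing, same lift, opposite conclusion.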

At Nordstrom, and this is way back in 2003, I instituted a rule ... we would make sure that our holdout groups would not be smaller than 50,000 customers ... and would be 10% of the mailed quantity where applicable. In other words, if we were mailing 2,000,000 customers, the holdout group would be 200,000.

Yes, 200,000.
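As a minimal sketch, the rule fits in a few lines. Reading "where applicable" as "whichever is larger" is an assumption here, and `holdout_size` is a made-up name.

```python
MIN_HOLDOUT = 50_000       # floor from the rule: never fewer than 50,000
HOLDOUT_SHARE_PCT = 10     # 10% of the mailed quantity

def holdout_size(mailed_quantity):
    """Return the larger of the 50,000 floor and 10% of the mailed quantity.
    Assumption: 'where applicable' means 'whichever is larger'."""
    ten_percent = mailed_quantity * HOLDOUT_SHARE_PCT // 100
    return max(MIN_HOLDOUT, ten_percent)

print(holdout_size(2_000_000))  # 200000
print(holdout_size(300_000))    # 50000
```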

Oh, I know, I can hear the vendor thought leadership (#vtl) crowd, screaming all the way from Chicago or New England, thousands of miles away. "He's an idiot!" "Don't listen to him." "You don't need a holdout group, our machine learning algorithms can figure out everything for you, please pay us \$100,000."

Take a moment, take a breath, and continue reading.

Because I had a large holdout group, I could learn more about the business than anybody competing with me. You like being smarter than everybody else in the industry, don't you?

I could measure item level performance in many cases. I knew if a top-selling item, AN ITEM, belonged in the catalog or not. If the top-selling item generated 1,000 units in the mailed group and 1,000 units in the holdout group, it was a top-selling item that had no place in the catalog. Yes, this happens. All the time!
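The item-level arithmetic can be sketched like this. The function name and the cell sizes are hypothetical. The 1,000-versus-1,000 comparison above only works as stated if the two cells are the same size (or the units are already scaled), so the sketch scales the holdout units to the mailed cell size first.

```python
def incremental_units(mailed_units, mailed_customers,
                      holdout_units, holdout_customers):
    """Units the mailing actually added for this item.

    The holdout cell shows what customers buy without the catalog, so its
    units are scaled up to the mailed cell size and then subtracted."""
    expected_without_mail = holdout_units * mailed_customers / holdout_customers
    return mailed_units - expected_without_mail

# The example from the text, assuming two equal-sized cells of 200,000.
print(incremental_units(1_000, 200_000, 1_000, 200_000))

# Hypothetical unequal cells: 1,000 units from 2,000,000 mailed customers,
# 100 units from a 200,000-customer holdout. Still nothing incremental.
print(incremental_units(1_000, 2_000_000, 100, 200_000))
```

Both cases come out to zero incremental units: a top-selling item that sells just as well without the catalog.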

I could always measure category-level performance. I knew that 24 pages of Mens generated no incremental sales, that the Womens pages drove customers online/in-store to buy Mens. With this knowledge, we pulled Mens from subsequent mailings. Tell me how your machine learning algorithm could figure that out without appropriate tests in place?

I could measure sub-segments. I knew the incremental gain online, and in stores, for recent buyers, frequent buyers, lapsed buyers, customers who lived 77 miles from a store, customers who lived 7 miles from a store, customers who lived in Denver, customers who lived in Dover. Now, the sample sizes started to get small ... so when you have small sample sizes, you measure response (0, 1, 0, 0, 1) instead of measuring sales (0, \$184, 0, 0, \$396), greatly reducing variance. Got that?
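To make the response-versus-sales point concrete, here is a small sketch using the five illustrative customers above. It compares the coefficient of variation (standard deviation divided by mean), which is one reasonable way to express "noise relative to signal". The choice of that yardstick is an assumption, not something the original test design specified.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation relative to the mean: lower means less noise
    per unit of signal, so fewer customers are needed to see a lift."""
    return pstdev(values) / mean(values)

# The same five customers, measured two ways.
response = [0, 1, 0, 0, 1]       # did the customer buy?
sales = [0, 184, 0, 0, 396]      # how much did the customer spend?

cv_response = coefficient_of_variation(response)
cv_sales = coefficient_of_variation(sales)

print(f"response CV: {cv_response:.2f}")
print(f"sales CV:    {cv_sales:.2f}")
```

Even with five customers, response is the less noisy measure (roughly 1.22 versus 1.35). The gap widens as order sizes vary more, because spend carries the buy/no-buy noise plus the noise in how much each buyer spends.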

Channel differences. You learn that catalogs have minimal/no impact on pure retail customers. You figure out other ways to build a relationship with retail shoppers. You learn that catalogs have marginal impact online. You learn that catalogs mean everything to rural customers. You learn what the organic percentage actually is. Do you know what your organic percentage is, by channel?
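A minimal sketch of the organic percentage calculation by channel follows. It assumes "organic percentage" means the share of mailed-group spend that the holdout group generates anyway, and every dollar figure is hypothetical, chosen only to show the shape of the calculation.

```python
def organic_percentage(holdout_spend_per_customer, mailed_spend_per_customer):
    """Share of mailed-group spend that happens without the mailing.
    Assumption: the holdout group's spend per customer is the organic baseline."""
    if mailed_spend_per_customer <= 0:
        raise ValueError("mailed spend per customer must be positive")
    return 100.0 * holdout_spend_per_customer / mailed_spend_per_customer

# Hypothetical spend per customer: (holdout, mailed) for each channel.
spend_by_channel = {
    "stores": (20.00, 20.40),
    "online": (9.00, 10.00),
}

for channel, (holdout, mailed) in spend_by_channel.items():
    pct = organic_percentage(holdout, mailed)
    print(f"{channel}: {pct:.1f}% organic, {100.0 - pct:.1f}% incremental")
```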

I had co-workers, Executives, who wanted my hide. One plucky Executive visited the Nordstrom Family, and told them that I was "breaking the rules". She asked the family to bring the hammer down on me.

We executed holdout groups for email marketing as well. Today, if I speak with 100 email marketers, 96 do not execute email marketing holdout groups. In other words, 96 out of 100 marketers truly have no idea what the incremental value of email marketing is. How could you ever know how much traffic email drove into a retail store without ample holdout groups? Be honest!

Catalogers do not like holdout tests. Catalogers like to mail stuff.

Email marketers almost universally detest holdout samples - heck, I've had email vendors tell my clients to stop working with me. I know I am on to something interesting and useful when vendors tell my clients that my clients should not work with me.

Executives do not like holdout tests ("how could you cost our business top-line sales, you idiot?").

Attribution experts should demand a healthy supply of holdout tests ... but for some reason, they don't always advocate the tests, trusting flawed models instead.

But you, you are smart, or you wouldn't be reading this. You know you need to be executing holdout tests. And you know you need to have large holdout cell sizes.

Without holdout tests, you're making countless ad investment mistakes.

Without large enough holdout panel cell sizes, you're also making ad investment mistakes.

Without large enough holdout panel cell sizes, you're not learning anything. Don't you want to learn? Don't you want to be smart? Don't you want to be an expert in your field? Don't you want to be eight steps ahead of professionals you compete against?

With appropriate holdout cell sizes, you have the knowledge necessary to push your business forward.

Ok, your turn. If you disagree, use the comments section to defend your case. Show math that defends your case.