Comments on Kevin Hillstrom: MineThatData: Testing Issues

MineThatData (January 3, 2008, 6:27 PM) — https://www.blogger.com/profile/14014200122021988374

Now you're doing better, thanks.

Anonymous (January 3, 2008, 5:17 PM)

Thank you.

In such cases, I usually play around with some parameters: the probability of observing x% of the change in effect I want to see, or the significance level. For example, suppose the test was designed at a 2% response rate (or at X1 dollars per book (DPB), with Y1 standard error of DPB), with a 10% change in effect at a 90% confidence level, and the required sample size came out to be N1. I then tinker with N1, the confidence level, and the observable % change in effect: I reduce N1 and recompute the probabilities of observing 50-60-70-80-99% of the 10% change in effect at various confidence levels. One can play with the difference (test vs. control) distributions, the sample size, and the confidence level to reduce the original sample size considerably.

For a small housefile (in fact, for any size of housefile), the economics should decide whether or not to run a test. Another approach is to split the housefile into segments and see how much is "there" to get measurable observations. It may be that the standard error in average order value is pushing toward large sample sizes. Generally, the smaller the change we want to detect and/or the lower our response rate, the larger the sample size we need. The sample size required for the segment of best customers will be much smaller than N1 above.
If flexibility in running a test is allowed, the economics of stratified samples can be worked out to see how much revenue would be lost or gained due to the testing.

It is very important to keep in mind how much change in effect we can observe at a given confidence level. It is not an all-or-nothing problem. Theoretically, one can interpret test results with any sample size, at some confidence level and some observable change in effect of response. The real question is whether we can move along those dimensions (confidence level, % change in effect) and still reach the same conclusions from the observations.

MineThatData (January 3, 2008, 4:05 PM)

Hello Anonymous.

Imagine all the folks who are being misled by SPSS's Explore procedure and Microsoft Excel's STDEV function, both of which provide the biased estimate you criticized.

Theoretically, you're 100% correct; my readers will appreciate the clarification you offered.

Now, what I need from you is a solution to a problem.

Take the catalog CEO of a small business. This CEO has a housefile of 20,000 customers she can mail catalogs to.

If she wants to follow your prescription, and use standard deviations to measure the change in response for test vs. control, she may get the answer that she needs to hold out 8,000 customers from an upcoming catalog mailing in order to learn something at a level of accuracy that pleases a statistician.

In other words, she needs to lose 40% of her revenue in order to learn what she needs to learn in a way that satisfies a statistician.

The CEO will not go for this.
Nor will she go for holding out 4,000 people, losing 20% of her revenue.

I bring this up because I run into this problem every day: I calculate the sample size needed to measure $/book, and the math doesn't work out favorably.

So, I need a solution from you. Tell this audience of catalog executives how you would solve this problem for this CEO.

Anonymous (January 3, 2008, 2:21 PM)

I think your standard deviation for the response data is the biased estimate sqrt(sum((x_i - x_mean)^2) / (n - 1)), which works out to 0.516, as opposed to the unbiased estimate sqrt(sum((x_i - x_mean)^2) / n), or equivalently sqrt(p * (1 - p)) where p is the response rate, which evaluates to 0.49.

I think letting the size of the housefile determine the sample size for the test is a heuristic. The sample size for the test should instead be based on how much variation is in the data (the underlying universe for the test), measured in standard deviations, and on how small a change in effect we want to detect (e.g., the change in response rate for test vs. control).

Thank you.
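[Editor's note] The estimator comparison above, and the sample-size tension running through this thread, can be sketched numerically. The data below is an assumption for illustration only: ten 0/1 responses with four orders (p = 0.4), chosen because it reproduces the 0.516 and 0.49 figures in the comment. The two-proportion sample-size formula is a textbook one, with illustrative z-values (two-sided 5% significance, 90% power), not necessarily the exact calculation either commenter ran.

```python
import math

# Ten binary responses with four orders (p = 0.4) -- assumed data that
# reproduces the 0.516 vs. 0.49 standard deviations from the comment.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
n = len(x)
p = sum(x) / n                            # response rate, 0.4

ss = sum((xi - p) ** 2 for xi in x)       # sum of squared deviations
sd_n_minus_1 = math.sqrt(ss / (n - 1))    # divide by n-1: ~0.516
sd_n = math.sqrt(ss / n)                  # divide by n:   ~0.490
sd_bernoulli = math.sqrt(p * (1 - p))     # sqrt(p(1-p)):  ~0.490, same as sd_n

print(round(sd_n_minus_1, 3), round(sd_n, 3), round(sd_bernoulli, 3))

# A standard two-proportion sample-size formula (per group) -- the kind of
# calculation behind "hold out 8,000 customers."
def n_per_group(p1, p2, z_alpha=1.96, z_beta=1.282):
    """Sample size per group to detect p1 vs. p2 at the given z-values."""
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

# Detecting a 10% relative lift on a 2% response rate (2.0% vs. 2.2%):
print(n_per_group(0.02, 0.022))
```

On these assumptions, detecting a 10% relative lift on a 2% response rate requires on the order of 100,000 names per group, which illustrates both points: why a 20,000-name housefile cannot satisfy the statistician, and why Anonymous recommends relaxing the confidence level or the detectable change in effect instead.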