January 05, 2011

Analytics Thursday: Guesstimates

For a while, we're going to talk about old-school issues that are still relevant in our modern digital world.  The series will be called "Analytics Thursday".

Old School:  Today, we go back to 1988.  I was a Statistical Analyst at the Garst Seed Company, in picturesque Slater, Iowa.  My job was to analyze corn and sorghum hybrid experiments.  Sometimes, our experiments yielded odd results.  Once, a deer "romped" through our experiment, ruining the results of a test.  Let's assume that we were analyzing four hybrids, and there were two sections in the field where we were testing.  Here are the results of the test:
  • East Side of Field, Hybrid #1 = 120 bushels.
  • East Side of Field, Hybrid #2 = 110 bushels.
  • East Side of Field, Hybrid #3 = Ruined by Deer.
  • East Side of Field, Hybrid #4 = 100 bushels.
  • West Side of Field, Hybrid #1 = 140 bushels.
  • West Side of Field, Hybrid #2 = Ruined by Deer.
  • West Side of Field, Hybrid #3 = 130 bushels.
  • West Side of Field, Hybrid #4 = 110 bushels.
My job was to determine the hybrid that had the best yield, in bushels.  We approached this with a regression: three dummy variables for Hybrid #1, Hybrid #2, and Hybrid #3 (each compared against our base, which is Hybrid #4), and one dummy variable for the East Side of the Field (compared against the West Side of the Field).  We predict Yield (in bushels) as a function of our dummy variables, fitting the model to the six plots the deer left alone:
  • Yield = 112.5 + 25.0*(Hybrid_1) + 12.5*(Hybrid_2) + 17.5*(Hybrid_3) + 0.0*(Hybrid_4) - 15.0*(East_Side_of_Field).
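If you want to check the coefficients yourself, here is a minimal sketch in plain Python.  It assumes ordinary least squares fitted to the six undamaged plots, solved through the normal equations; the function names (design_row, solve, fit) are my own for illustration, and in practice you would probably reach for a statistics package instead.

```python
# Observed plots only; the two deer-damaged plots are left out of the fit.
# Each row: (hybrid number, side of field, yield in bushels)
observed = [
    (1, "East", 120.0),
    (2, "East", 110.0),
    (4, "East", 100.0),
    (1, "West", 140.0),
    (3, "West", 130.0),
    (4, "West", 110.0),
]


def design_row(hybrid, side):
    """Intercept, dummies for Hybrids 1-3 (Hybrid 4 is the base), East dummy."""
    return [
        1.0,
        1.0 if hybrid == 1 else 0.0,
        1.0 if hybrid == 2 else 0.0,
        1.0 if hybrid == 3 else 0.0,
        1.0 if side == "East" else 0.0,
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [design_row(hybrid, side) for hybrid, side, _ in rows]
    y = [bushels for _, _, bushels in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


coefficients = fit(observed)
names = ["Intercept", "Hybrid_1", "Hybrid_2", "Hybrid_3", "East_Side_of_Field"]
for name, value in zip(names, coefficients):
    print(f"{name:>20}: {value:6.1f}")
```

The intercept is the predicted yield for the base case, Hybrid #4 on the West Side of the Field.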
With this equation, I can predict the yield in instances where our test was ruined by deer.  For example, East Side of Field, Hybrid #3 = 112.5 + 17.5 - 15.0 = 115 bushels.
  • East Side of Field, Hybrid #1 = 120 bushels.
  • East Side of Field, Hybrid #2 = 110 bushels.
  • East Side of Field, Hybrid #3 = 115 predicted bushels.
  • East Side of Field, Hybrid #4 = 100 bushels.
  • West Side of Field, Hybrid #1 = 140 bushels.
  • West Side of Field, Hybrid #2 = 125 predicted bushels.
  • West Side of Field, Hybrid #3 = 130 bushels.
  • West Side of Field, Hybrid #4 = 110 bushels.
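The fill-in step can be sketched in a few lines of Python.  The coefficients are copied from the equation in this post; the names (predict_yield, plots, filled) are just illustrative choices, and None marks a plot ruined by deer.

```python
# Coefficients copied from the post's equation.
# Hybrid #4 and the West Side of the Field are the baselines.
INTERCEPT = 112.5
HYBRID_EFFECT = {1: 25.0, 2: 12.5, 3: 17.5, 4: 0.0}
EAST_EFFECT = -15.0


def predict_yield(hybrid, side):
    """Predicted bushels for one hybrid on one side of the field."""
    east = EAST_EFFECT if side == "East" else 0.0
    return INTERCEPT + HYBRID_EFFECT[hybrid] + east


# Observed yields; None marks a plot ruined by deer.
plots = {
    (1, "East"): 120.0,
    (2, "East"): 110.0,
    (3, "East"): None,
    (4, "East"): 100.0,
    (1, "West"): 140.0,
    (2, "West"): None,
    (3, "West"): 130.0,
    (4, "West"): 110.0,
}

# Keep real observations as they are; predict only the ruined plots.
filled = {
    plot: (predict_yield(*plot) if bushels is None else bushels)
    for plot, bushels in plots.items()
}

for (hybrid, side), bushels in filled.items():
    print(f"{side} Side of Field, Hybrid #{hybrid} = {bushels:.0f} bushels")
```

Only the two damaged plots get a prediction; the six real measurements are left untouched.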
Now that we've corrected for each "bad data point", we can average the two sides of the field for each hybrid (a methodology called "least squares means").
  • Hybrid #1 = 130.0 bushels.
  • Hybrid #2 = 117.5 bushels.
  • Hybrid #3 = 122.5 bushels.
  • Hybrid #4 = 105.0 bushels.
We filled in the holes, providing guesstimates where deer ruined the test.  Hybrid #1 comes out on top.
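The averaging step, as a small Python sketch.  It starts from the filled-in table in this post (two of the eight values are predictions, not measurements); the variable names are my own.

```python
from statistics import mean

# Filled-in table from the post: (hybrid, side) -> bushels.
# East/Hybrid #3 and West/Hybrid #2 are predicted values.
filled = {
    (1, "East"): 120.0,
    (2, "East"): 110.0,
    (3, "East"): 115.0,
    (4, "East"): 100.0,
    (1, "West"): 140.0,
    (2, "West"): 125.0,
    (3, "West"): 130.0,
    (4, "West"): 110.0,
}

# Average the East and West values for each hybrid.
hybrid_means = {
    hybrid: mean(bushels for (h, _), bushels in filled.items() if h == hybrid)
    for hybrid in (1, 2, 3, 4)
}

best = max(hybrid_means, key=hybrid_means.get)

for hybrid, bushels in hybrid_means.items():
    print(f"Hybrid #{hybrid} = {bushels:.1f} bushels")
print(f"Best yield: Hybrid #{best}")
```

Without the fill-in step, Hybrid #2 would be judged only on the lower-yielding East Side and Hybrid #3 only on the higher-yielding West Side, which is not a fair comparison.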




Modern Application:  If you are a Savvy Web Analyst, you can use this methodology to predict what might have happened if you hadn't run a promotion on a certain day.  East/West portions of the field are like having a free shipping promotion, or not having a free shipping promotion.  Each hybrid is a day in your fiscal year.  Use "Conversion Rate" instead of "Yield".  You'll have days where you ran goofy promotions; those are the days that are like deer ruining our hybrid experiments.  This methodology allows you to make a reasonable guesstimate as to what might have happened if you ran normal promotions, or no promotion at all.

2 comments:

  1. Anonymous, 10:54 AM

    Kevin, you rock. It's going to take a while to get all this down. Is there any way to run this type of analysis on http://en.wikipedia.org/wiki/Messier_object ?

  2. There may be some way to do that, but it is far outside of my area of expertise!
