March 20, 2008

The MineThatData E-Mail Analytics And Data Mining Challenge

It is time to find a few smart individuals in the world of e-mail analytics and data mining! And honestly, what follows is a dataset that you can manipulate using Excel pivot tables, so you don't have to be a data mining wizard, just be clever!

Here is a link to the MineThatData E-Mail Analytics And Data Mining Challenge dataset: The dataset is in .csv format, and is about the size of a typical mp3 file. I recommend saving the file to disk, then opening it (read only) in the software tool of your choice.

This dataset contains 64,000 customers who last purchased within the past twelve months. The customers were involved in an e-mail test.
  • 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
  • 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
  • 1/3 were randomly chosen to not receive an e-mail campaign.
During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.

Historical customer attributes at your disposal include:
  • Recency: Months since last purchase.
  • History_Segment: Categorization of dollars spent in the past year.
  • History: Actual dollar value spent in the past year.
  • Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
  • Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
  • Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
  • Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
  • Channel: Describes the channels the customer purchased from in the past year.
Another variable describes the e-mail campaign the customer received:
  • Segment
    • Mens E-Mail
    • Womens E-Mail
    • No E-Mail
Finally, we have a series of variables describing activity in the two weeks following delivery of the e-mail campaign:
  • Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
  • Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
  • Spend: Actual dollars spent in the following two weeks.
Ok, that represents the basics.
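
If you would rather work outside of Excel, the variables above can be rolled up by Segment in a few lines of Python. What follows is only a minimal sketch, not an official solution. It assumes the column headers in the .csv match the attribute names listed above exactly (the real file may differ, for example in capitalization), the function name summarize_by_segment is made up for illustration, and the six sample rows are invented so the example runs without the real file.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only; NOT taken from the challenge file.
# Column names are assumed to match the attribute names in the post.
SAMPLE_CSV = """Recency,History_Segment,History,Mens,Womens,Zip_Code,Newbie,Channel,Segment,Visit,Conversion,Spend
2,Low,45.50,1,0,Urban,0,Web,Mens E-Mail,1,1,50.00
6,High,310.00,1,1,Rural,1,Phone,Mens E-Mail,1,0,0.00
1,Low,80.25,0,1,Suburban,0,Web,Womens E-Mail,1,0,0.00
9,High,500.00,0,1,Urban,1,Multichannel,Womens E-Mail,0,0,0.00
4,Low,29.99,1,0,Rural,0,Phone,No E-Mail,0,0,0.00
3,High,250.00,1,1,Suburban,1,Web,No E-Mail,1,1,20.00
"""


def summarize_by_segment(csv_file):
    """Return visit rate, conversion rate, and spend per customer for each Segment."""
    totals = defaultdict(
        lambda: {"customers": 0, "visits": 0, "conversions": 0, "spend": 0.0}
    )
    for row in csv.DictReader(csv_file):
        t = totals[row["Segment"]]
        t["customers"] += 1
        t["visits"] += int(row["Visit"])            # 1/0 indicator
        t["conversions"] += int(row["Conversion"])  # 1/0 indicator
        t["spend"] += float(row["Spend"])           # dollars in the two weeks
    return {
        segment: {
            "customers": t["customers"],
            "visit_rate": t["visits"] / t["customers"],
            "conversion_rate": t["conversions"] / t["customers"],
            "spend_per_customer": t["spend"] / t["customers"],
        }
        for segment, t in totals.items()
    }


summary = summarize_by_segment(io.StringIO(SAMPLE_CSV))
for segment, stats in summary.items():
    print(segment, stats)
```

To try this on the real data, pass an open file handle for wherever you saved the .csv instead of the in-memory sample, and adjust the column names if they differ.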

By April 30, you are encouraged to write a paper that answers the following questions. Winning submissions will receive a copy of my book, Hillstrom's Multichannel Forensics, currently available at ForBetterBooks and Amazon.com. There's nothing wrong with winning a book valued at $95, is there??

I will give away at least one book, and as many as three books, depending upon entries within the following categories:
  • The E-Mail Blogosphere: If we get enough entries, I will give away one book to the e-mail blogger who provides the most insightful answer.
  • The Direct Marketing Industry: The best answer among direct marketing and e-mail marketing professionals and e-mail marketing vendors will receive a book. In addition, I'll publish well-written and insightful answers received from any qualified e-mail marketing vendor. In other words, you'll earn an opportunity to advertise for free to the MineThatData community, a community of more than 1,200 subscribers and daily visitors.
  • The Data Mining Community: Data Mining professionals and University students are encouraged to send in entries, with the best-written and most insightful response receiving a free book.
Here are the questions you are encouraged to answer.
  • Which e-mail campaign performed the best, the Mens version, or the Womens version?
  • How much incremental sales per customer did the Mens version of the e-mail campaign drive? How much incremental sales per customer did the Womens version of the e-mail campaign drive?
  • If you could only send an e-mail campaign to the best 10,000 customers, which customers would receive the e-mail campaign? Why?
  • If you had to eliminate 10,000 customers from receiving an e-mail campaign, which customers would you suppress from the campaign? Why?
  • Did the Mens version of the e-mail campaign perform differently from the Womens version of the e-mail campaign, across various customer segments?
  • Did the campaigns perform differently when measured across different metrics, like Visitors, Conversion, and Total Spend?
  • Did you observe any anomalies, or odd findings?
  • Which audience would you target the Mens version to, and the Womens version to, given the results of the test? What data do you have to support your recommendation?
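
One way to approach the incremental sales questions is to treat the No E-Mail group as the control: incremental spend per customer is the average Spend in a mailed group minus the average Spend in the No E-Mail group. The sketch below shows that arithmetic, overall and within the values of one attribute (Newbie, in this case). It is one possible approach under stated assumptions, not the expected answer. The rows are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

CONTROL = "No E-Mail"

# Made-up rows for illustration only; NOT results from the challenge dataset.
ROWS = [
    {"Segment": "Mens E-Mail", "Newbie": 0, "Spend": 40.0},
    {"Segment": "Mens E-Mail", "Newbie": 0, "Spend": 0.0},
    {"Segment": "Mens E-Mail", "Newbie": 1, "Spend": 20.0},
    {"Segment": "Mens E-Mail", "Newbie": 1, "Spend": 0.0},
    {"Segment": "Womens E-Mail", "Newbie": 0, "Spend": 10.0},
    {"Segment": "Womens E-Mail", "Newbie": 0, "Spend": 0.0},
    {"Segment": "Womens E-Mail", "Newbie": 1, "Spend": 30.0},
    {"Segment": "Womens E-Mail", "Newbie": 1, "Spend": 0.0},
    {"Segment": "No E-Mail", "Newbie": 0, "Spend": 10.0},
    {"Segment": "No E-Mail", "Newbie": 0, "Spend": 0.0},
    {"Segment": "No E-Mail", "Newbie": 1, "Spend": 0.0},
    {"Segment": "No E-Mail", "Newbie": 1, "Spend": 10.0},
]


def spend_per_customer(rows):
    """Average Spend per customer for each Segment."""
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [total spend, customers]
    for row in rows:
        entry = totals[row["Segment"]]
        entry[0] += row["Spend"]
        entry[1] += 1
    return {segment: total / n for segment, (total, n) in totals.items()}


def incremental_spend(rows, control=CONTROL):
    """Mailed-group average Spend minus the control group's average Spend."""
    means = spend_per_customer(rows)
    baseline = means[control]
    return {
        segment: mean - baseline
        for segment, mean in means.items()
        if segment != control
    }


def incremental_by_attribute(rows, attribute):
    """Repeat the incremental calculation within each value of one attribute.

    Every attribute value must include control customers, or there is
    nothing to compare against for that value.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return {value: incremental_spend(group) for value, group in groups.items()}


overall = incremental_spend(ROWS)
by_newbie = incremental_by_attribute(ROWS, "Newbie")
print(overall)
print(by_newbie)
```

Comparing the lift across attribute values, rather than raw spend, is one way to decide which customers to mail or suppress. With only a handful of customers in a cell the differences will be noisy, so treat small-cell results with caution.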
E-mail your responses to me by 11:59pm on Wednesday, April 30, 2008. Good luck, and have fun analyzing the information! Dazzle our readers with your insights --- feel free to share your findings in the comments section of this post.

4 comments:

  1. Big thanks from snowy Russia for this data set!

  2. I was planning on doing this exercise and was curious to see how my answers fared against the winner's. Where is the winning paper posted?

  3. http://blog.minethatdata.com/2008/05/best-answer-e-mail-analytics-challenge.html

