Here is a link to the MineThatData E-Mail Analytics And Data Mining Challenge dataset: The dataset is in .csv format, and is about the size of a typical mp3 file. I recommend saving the file to disk, then opening it (read only) in the software tool of your choice.
This dataset contains 64,000 customers who last purchased within the past twelve months. The customers were involved in an e-mail test.
- 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
- 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
- 1/3 were randomly chosen to not receive an e-mail campaign.
Historical customer attributes at your disposal include:
- Recency: Months since last purchase.
- History_Segment: Categorization of dollars spent in the past year.
- History: Actual dollar value spent in the past year.
- Mens: 1/0 indicator, 1 = customer purchased Mens merchandise in the past year.
- Womens: 1/0 indicator, 1 = customer purchased Womens merchandise in the past year.
- Zip_Code: Classifies zip code as Urban, Suburban, or Rural.
- Newbie: 1/0 indicator, 1 = New customer in the past twelve months.
- Channel: Describes the channels the customer purchased from in the past year.
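Another variable describes the e-mail campaign the customer received:
- Mens E-Mail
- Womens E-Mail
- No E-Mail
A final set of variables describes customer activity in the two weeks following the e-mail campaign: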
- Visit: 1/0 indicator, 1 = Customer visited website in the following two weeks.
- Conversion: 1/0 indicator, 1 = Customer purchased merchandise in the following two weeks.
- Spend: Actual dollars spent in the following two weeks.
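If you want a starting point, here is a minimal sketch in Python of how the outcome variables could be summarized for each campaign group. The rows are made up for illustration and are not values from the real file. The header names are assumptions too: in particular, `Segment` is a hypothetical name for the column that holds the campaign, so check the header row of the actual .csv before using it.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; these are NOT values from the real file.
# "Segment" is an assumed header name for the campaign column.
SAMPLE_CSV = """Recency,History,Segment,Visit,Conversion,Spend
1,120.50,Mens E-Mail,1,1,40.00
6,45.00,Mens E-Mail,0,0,0.00
3,300.00,Womens E-Mail,1,0,0.00
9,29.99,Womens E-Mail,0,0,0.00
2,80.00,No E-Mail,1,0,0.00
11,60.00,No E-Mail,0,0,0.00
"""


def summarize_by_campaign(csv_file):
    """Return per-campaign customer counts, visit and conversion rates, and spend per customer."""
    totals = defaultdict(lambda: {"n": 0, "visit": 0, "conversion": 0, "spend": 0.0})
    for row in csv.DictReader(csv_file):
        t = totals[row["Segment"]]
        t["n"] += 1
        t["visit"] += int(row["Visit"])            # 1/0 indicator
        t["conversion"] += int(row["Conversion"])  # 1/0 indicator
        t["spend"] += float(row["Spend"])          # dollars
    return {
        campaign: {
            "customers": t["n"],
            "visit_rate": t["visit"] / t["n"],
            "conversion_rate": t["conversion"] / t["n"],
            "spend_per_customer": t["spend"] / t["n"],
        }
        for campaign, t in totals.items()
    }


summary = summarize_by_campaign(io.StringIO(SAMPLE_CSV))
for campaign, stats in summary.items():
    print(campaign, stats)
```

To run this against the saved file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")`, where `path` is wherever you saved the .csv.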
By April 30, you are encouraged to write a paper that answers the following questions. Winning submissions will receive a copy of my book, Hillstrom's Multichannel Forensics, currently available at ForBetterBooks and Amazon.com. There's nothing wrong with winning a book valued at $95, is there?
I will give away at least one book, and as many as three books, depending upon entries within the following categories:
- The E-Mail Blogosphere: If we get enough entries, I will give away one book to the e-mail blogger who provides the most insightful answer.
- The Direct Marketing Industry: The best answer among direct marketing and e-mail marketing professionals and e-mail marketing vendors will receive a book. In addition, I'll publish well-written and insightful answers received from any qualified e-mail marketing vendor. In other words, you'll earn an opportunity to advertise for free to the MineThatData community, a community of more than 1,200 subscribers and daily visitors.
- The Data Mining Community: Data mining professionals and university students are encouraged to send in entries, with the best-written and most insightful response receiving a free book.
Here are the questions to answer:
- Which e-mail campaign performed the best, the Mens version or the Womens version?
- How much incremental sales per customer did the Mens version of the e-mail campaign drive? How much incremental sales per customer did the Womens version of the e-mail campaign drive?
- If you could only send an e-mail campaign to the best 10,000 customers, which customers would receive the e-mail campaign? Why?
- If you had to eliminate 10,000 customers from receiving an e-mail campaign, which customers would you suppress from the campaign? Why?
- Did the Mens version of the e-mail campaign perform differently than the Womens version of the e-mail campaign, across various customer segments?
- Did the campaigns perform differently when measured across different metrics, like Visitors, Conversion, and Total Spend?
- Did you observe any anomalies, or odd findings?
- Which audience would you target the Mens version to, and the Womens version to, given the results of the test? What data do you have to support your recommendation?
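For the incremental sales question, one common way to read a randomized test like this is to compare spend per customer in an e-mailed group against spend per customer in the No E-Mail group. The sketch below shows that arithmetic in Python. It is one possible approach, not a required method. The `(campaign, spend)` pairs are invented for illustration, and the function name is hypothetical.

```python
def incremental_spend_per_customer(rows, treated, control="No E-Mail"):
    """Mean spend in the treated group minus mean spend in the control group."""

    def mean_spend(group):
        spends = [spend for campaign, spend in rows if campaign == group]
        if not spends:
            raise ValueError(f"no customers in group {group!r}")
        return sum(spends) / len(spends)

    return mean_spend(treated) - mean_spend(control)


# Made-up (campaign, spend) pairs for illustration; NOT taken from the dataset.
rows = [
    ("Mens E-Mail", 40.0), ("Mens E-Mail", 0.0),
    ("Womens E-Mail", 10.0), ("Womens E-Mail", 0.0),
    ("No E-Mail", 4.0), ("No E-Mail", 0.0),
]

mens_lift = incremental_spend_per_customer(rows, "Mens E-Mail")
womens_lift = incremental_spend_per_customer(rows, "Womens E-Mail")
print(f"Mens: {mens_lift:.2f}  Womens: {womens_lift:.2f}")
```

The same subtraction works within any customer segment: filter the rows to one segment first (for example, one Zip_Code class), then compare the e-mailed group with the No E-Mail group inside that segment.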