February 17, 2015

Half-Life of Web Visitation Data in a Catalog Setting

I want to clear up some misinformation that is circulating out there in the vendor community about my use of web visitation data in determining who should receive a catalog. 

By the end of this post, every vendor in our industry will be able to run the analysis themselves, will be able to make profit for their clients, and will be able to generate profit for themselves. With a Statistics degree and twenty-seven (27) years of modeling experience in a professional marketing setting, I will fully explain the methodology to all vendors and clients, so that the information can be used to generate profit for all.

Sound good?

There is a very interesting dynamic that occurs when you analyze web visitation data. Here is a simple segmentation of monthly direct channel demand, by recency (days since last purchase, past two years), and whether the customer has visited the website in the past two years.


The column "mean" represents how much the customer spent in the next thirty days. What do you see? Well, if a customer had any website visit in the past two years (visit flag = 1.00), the customer is worth twice as much as a customer without a website visit.

That's a good thing!
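If you want to build that first table yourself, a minimal Python sketch of the segmentation follows. The field names (`days_since_purchase`, `visited`, `next30_spend`), the recency bucket edges, and the toy rows are all hypothetical; substitute your own customer file.

```python
from collections import defaultdict

def segment_means(customers, recency_edges=(30, 90, 180, 365, 730)):
    """Mean next-30-day spend by (recency bucket, visited-in-past-two-years flag).

    Each customer is a dict with hypothetical keys:
      'days_since_purchase', 'visited' (0 or 1), 'next30_spend'.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c in customers:
        # Smallest bucket edge that covers this recency; None if beyond two years.
        bucket = next((e for e in recency_edges if c["days_since_purchase"] <= e), None)
        if bucket is None:
            continue  # outside the two-year window, so not segmented
        key = (bucket, c["visited"])
        totals[key] += c["next30_spend"]
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}

# Hypothetical toy rows, for illustration only.
toy = [
    {"days_since_purchase": 20, "visited": 1, "next30_spend": 40.0},
    {"days_since_purchase": 25, "visited": 0, "next30_spend": 20.0},
    {"days_since_purchase": 200, "visited": 1, "next30_spend": 10.0},
    {"days_since_purchase": 300, "visited": 1, "next30_spend": 0.0},
    {"days_since_purchase": 800, "visited": 0, "next30_spend": 5.0},
]
print(segment_means(toy))
```

A table built this way shows only raw means. It does not control for how much each customer spent in the past.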

At this point, one might conclude that you have to, absolutely have to, use web visitation data in catalog circulation models.

But if you stop your analysis here, you miss out on all of the delicious goodness that comes from understanding how web visitation data truly impacts future purchases after controlling for historical recency and historical spend.

Let's run a logistic regression on next month's online demand (1 = respond, 0 = no purchase). As predictors, we'll use demand spent (demand / 100), summed over five windows: 0-30 days ago, 31-90 days ago, 91-180 days ago, 181-365 days ago, and 366-730 days ago. Then we'll add the number of website visits in six windows: 0-7 days ago, 8-30 days ago, 31-90 days ago, 91-180 days ago, 181-365 days ago, and 366-730 days ago. Here's the first run, a forward entry logistic regression.
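The predictors described above can be assembled per customer along the following lines. This is a sketch under assumptions:

- purchases arrive as `(days_ago, demand_dollars)` pairs;
- visits arrive as a list of `days_ago` values, one per visit;
- the function and feature names are hypothetical.

The resulting dictionary, plus a 1/0 response flag for next month, is the kind of row a logistic regression routine would take as input.

```python
def build_features(purchases, visit_days):
    """Windowed predictors for one customer (hypothetical helper).

    purchases:  list of (days_ago, demand_dollars)
    visit_days: list of days_ago, one entry per website visit
    """
    demand_windows = [(0, 30), (31, 90), (91, 180), (181, 365), (366, 730)]
    visit_windows = [(0, 7), (8, 30), (31, 90), (91, 180), (181, 365), (366, 730)]
    feats = {}
    for lo, hi in demand_windows:
        total = sum(amt for d, amt in purchases if lo <= d <= hi)
        feats[f"demand_{lo}_{hi}"] = total / 100.0  # demand scaled as demand / 100
    for lo, hi in visit_windows:
        feats[f"visits_{lo}_{hi}"] = sum(1 for d in visit_days if lo <= d <= hi)
    return feats

f = build_features(purchases=[(10, 150.0), (45, 80.0), (400, 200.0)],
                   visit_days=[2, 5, 12, 100])
print(f)
```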


Look at Step 11. Look at the VISIT variables. What do you see?
  • Visits 0-7 days ago are important (coefficient = 0.252).
  • Visits 8-30 days ago are marginally important (coefficient = 0.080).
  • Visits 31-90 days ago are essentially irrelevant (coefficient = 0.004).
  • Visits 91-180 days ago have a negative coefficient (meaning they hurt future response).
  • Visits 181-365 days ago have a negative coefficient (meaning they hurt future response).
  • Visits 366-730 days ago have a negative coefficient (meaning they hurt future response).
Uh oh.

In other words, in the first table, where it looked like website visits doubled the future value of the customer, we were misreading the information. It turns out that historical website visits are highly correlated with historical demand spent. That makes common sense, right? A customer who spent a lot in the past visited the website a lot in the past, right? The variables are correlated - they essentially represent the same thing. And in a proper modeling environment, where we control for historical demand and historical recency, website visits beyond the past 30 days have no impact whatsoever.
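A small simulation shows how this confounding plays out. The toy world below is built on assumed rates, not the data from this post:

- response depends only on whether a customer has high historical demand;
- high-demand customers simply visit more often.

The sketch controls for demand by stratifying, which is a simpler stand-in for the logistic regression used above.

```python
import random

def simulate(n=200_000, seed=7):
    """Toy world (all rates are assumptions): response depends ONLY on past demand.
    Visits are more common among high-demand customers but have no effect of their own."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high = rng.random() < 0.30                         # high historical demand
        visited = rng.random() < (0.90 if high else 0.30)  # visits track demand
        respond = rng.random() < (0.20 if high else 0.05)  # visits play no role here
        rows.append((high, visited, respond))
    return rows

def rate(rows):
    """Share of rows that responded."""
    return sum(r for _, _, r in rows) / len(rows)

rows = simulate()
visitors = [r for r in rows if r[1]]
non_visitors = [r for r in rows if not r[1]]

# Raw comparison: visitors look far more valuable.
print("raw:", round(rate(visitors), 3), "vs", round(rate(non_visitors), 3))

# Within each demand level, the visitor advantage essentially disappears.
for level in (True, False):
    v = [r for r in visitors if r[0] == level]
    nv = [r for r in non_visitors if r[0] == level]
    print("high demand" if level else "low demand", round(rate(v), 3), "vs", round(rate(nv), 3))
```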

If I remove the negative visit variables (which I probably shouldn't do, but I will do for the sake of clarity), the model looks like this:


Now we can evaluate the value of web visit data.
  • Visits 0-7 Days Ago = Coefficient of 0.235.
  • Visits 8-30 Days Ago = Coefficient of 0.065.
  • Visits 31+ Days Ago = Coefficient of 0.000.
  • Value of Visits 8-30 Days Ago = 0.065 / 0.235 = 28% of the Value of Visits 0-7 Days Ago.
  • Value of Visits 31+ Days Ago = 0.000 / 0.235 = 0% of the Value of Visits 0-7 Days Ago.
Old web visitation data offers no value whatsoever, after controlling for recency / historical demand (demand / 100).

Web visitation data from 8-30 days ago has 28% of the value of web visitation data from 0-7 days ago.
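The ratios above are simple arithmetic on the coefficients. To put a rough number on the "half-life" in the title, the sketch below makes two illustrative assumptions that are not part of the analysis above:

- the value of a visit decays exponentially with age;
- each window is represented by its midpoint (3.5 days for 0-7, 19 days for 8-30).

Treat the result as a ballpark figure under those assumptions only.

```python
import math

# Visit coefficients from the reduced logistic model in this post.
coef = {"0-7": 0.235, "8-30": 0.065, "31+": 0.000}

# Value of each window relative to visits 0-7 days ago.
relative = {k: v / coef["0-7"] for k, v in coef.items()}
print({k: f"{v:.0%}" for k, v in relative.items()})  # {'0-7': '100%', '8-30': '28%', '31+': '0%'}

# ASSUMPTION: exponential decay between window midpoints (illustrative only).
mid_recent, mid_older = 3.5, 19.0
half_life = (mid_older - mid_recent) * math.log(2) / math.log(1 / relative["8-30"])
print(round(half_life, 1))  # roughly 8.4 days under these assumptions
```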

Web visitation data from 0-7 days ago is important, no doubt about it. Comparing coefficients, just one web visit from 0-7 days ago is worth about $170 of demand spent 31-90 days ago.
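Here is one way to compute that dollar equivalence. Demand enters the model as demand / 100. So the visit coefficient, divided by the 31-90 day demand coefficient and multiplied by 100, gives the dollars of past demand that have the same effect on the log-odds of response as one visit. The demand coefficient is not printed in the post, so the sketch backs one out from the $170 figure; treat that value as approximate.

```python
def dollar_equivalent(visit_coef, demand_coef):
    """Dollars of past demand (modeled as demand / 100) with the same log-odds effect as one visit."""
    return 100.0 * visit_coef / demand_coef

# ASSUMPTION: the 31-90 day demand coefficient is not shown in the post; it is
# backed out here from the "$170 per visit" statement, so it is only approximate.
implied_demand_coef = 100.0 * 0.235 / 170.0   # about 0.138

print(round(dollar_equivalent(0.235, implied_demand_coef)))  # 170
print(round(dollar_equivalent(0.065, implied_demand_coef)))  # 47
```

The 8-30 day visit comes out near $47. That is consistent with the "about $48" quoted later in the post, given that the $170 figure is itself rounded.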

This is why I clearly stated, a few weeks ago, that large catalogers use web visitation data in the past seven days as an indicator to kick out a hotline catalog ... they do this to capitalize on the very, very short half-life of web visitation data in a catalog environment.
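A hotline trigger of this kind is easy to express. This minimal sketch assumes a hypothetical `last_visit` mapping of customer id to most recent visit date. It also assumes, which the post does not state, that customers already in the current mailing are skipped.

```python
from datetime import date

def hotline_candidates(last_visit, already_mailed, today, window_days=7):
    """Customers with a site visit in the past `window_days` days who are not already mailed.

    last_visit: {customer_id: date of most recent website visit} (hypothetical input)
    """
    return sorted(
        cid for cid, visit in last_visit.items()
        if (today - visit).days <= window_days and cid not in already_mailed
    )

last_visit = {"A100": date(2015, 2, 15), "A200": date(2015, 2, 1), "A300": date(2015, 2, 12)}
print(hotline_candidates(last_visit, already_mailed={"A300"}, today=date(2015, 2, 17)))  # ['A100']
```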

Ok, somebody might say that visitation data has minimal impact on response, but it probably has an impact on spend if a customer responds. Here's the ordinary least squares regression equation to measure spend.


This is a stepwise regression equation (demand variables = demand / 100). Notice that at each step, a demand variable enters the equation, and enters significantly. In step six, the only visit variable to enter the equation appears ... it's visits 0-7 days ago, and the coefficient is negative ... meaning that each website visit in the past 7 days cost this company $9.95 of average order value. Nice.

In this case, I'd revert to step 5, which excludes the negative visit variable, and conclude that visitation data does not impact spend if a customer repurchases next month.
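The spend model is estimated only among customers who purchase next month. The sketch below shows that conditioning step with a single predictor and hand-rolled least squares. The rows are made up, and this is not the stepwise procedure used above.

```python
def ols_slope_intercept(x, y):
    """Simple least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical toy rows: (responded, visits_0_7, next-month spend in dollars).
rows = [(1, 0, 100.0), (1, 1, 104.0), (1, 2, 98.0), (1, 3, 102.0), (0, 3, 0.0), (0, 0, 0.0)]

# Spend is modeled only among purchasers; non-responders are excluded.
responders = [r for r in rows if r[0] == 1]
slope, intercept = ols_slope_intercept([r[1] for r in responders], [r[2] for r in responders])
print(slope, intercept)  # 0.0 101.0
```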

We now know the story of web visitation data.
  1. Each web visit in the past 7 days is worth about $170 of demand spent 31-90 days ago ... meaning that each web visit in the past 7 days is very important. This is a good thing!
  2. Each web visit 8-30 days ago is worth about $48 of demand spent 31-90 days ago ... meaning that each web visit 8-30 days ago has value. This is still a good thing, but the value of website visits is quickly degrading.
  3. Each web visit beyond 30 days contributes nothing toward future response. Nothing. These visits have no value, after controlling for recency and after controlling for historical demand.
Remember the table at the top of the post?


This table makes web visitation data look terribly important. The problem is that web visitation data is highly correlated with past purchases ... and this makes sense ... you have to visit the website to purchase, so the two variables are measuring essentially the same thing. When you control for historical recency / demand, we see that web visitation data has an amazingly short half-life, with visits > 30 days ago having no bearing on future response.

Of course, your mileage will vary. But in this analysis, on real data, the conclusions are self-evident. When I run this analysis, the vast majority of the time, I get a result similar to this one. That's why I mentioned the outcome in a prior blog post. It is always a possibility that you will learn something different. That's why, in the prior blog post, I asked the readers to run the models for themselves, to determine half-life for themselves. Do the work yourself - your mileage will vary!

Again, catalog vendors, I just showed you how to do the work, for free. You may now apply the methodology to clients and prospects. How many of your competitors help you out like that?

If I were one of the co-ops, I'd jump all over this - it would be a great way to assist catalogers who are struggling with co-op response rates - just merge web visitation data with purchase data, and you're set.
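The merge itself is a join on customer id. This minimal sketch assumes that purchase-history features and windowed visit counts are already keyed by a shared customer id; the function and field names are hypothetical. Filling unmatched customers with zero visits is also an assumption. A customer with no matched web data may simply not be identified online.

```python
def merge_visits(purchase_features, visit_counts):
    """Left-join visit counts onto purchase-history features by customer id.

    purchase_features: {customer_id: {feature_name: value}}
    visit_counts:      {customer_id: {visit_window_name: count}}
    Customers in visit_counts with no purchase history are dropped (left join).
    """
    windows = sorted({w for counts in visit_counts.values() for w in counts})
    merged = {}
    for cid, feats in purchase_features.items():
        row = dict(feats)
        visits = visit_counts.get(cid, {})
        for w in windows:
            row[w] = visits.get(w, 0)  # ASSUMPTION: unmatched means zero visits
        merged[cid] = row
    return merged

purchases = {"C1": {"demand_0_30": 1.2}, "C2": {"demand_0_30": 0.0}}
visits = {"C1": {"visits_0_7": 3}, "C9": {"visits_8_30": 1}}
print(merge_visits(purchases, visits))
```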

If you have additional questions about the methodology, or wish for a deeper tutorial, please send me an email message (kevinh@minethatdata.com), and I will be happy to help you.