February 01, 2015

Half-Life of Browsing Data

Go run this analysis, right now:

  1. Variable = Sum Daily Visits, Next 30 Days (i.e. January).
  2. Variable = Sum Daily Visits, Past Week (last week of December).
  3. Variable = Sum Daily Visits, 8-30 Days Ago (first three weeks of December).
  4. Variable = Sum Daily Visits, 31-90 Days Ago (September - November).
  5. Variable = Sum Daily Visits, 91-365 Days Ago (January - August).
Once you create the variables, run a regression model ... January is your dependent variable, the rest are your independent variables.
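Here is a minimal sketch of those steps in Python, assuming you have a customers-by-days table of daily visit counts. The names `build_variables`, `fit_ols`, `WINDOWS`, and `score_day` are made up for illustration, and ordinary least squares via NumPy stands in for whatever regression tool you normally use.

```python
import numpy as np

# Lookback windows in days before the scoring date (inclusive), per the list above.
# The key names are hypothetical labels.
WINDOWS = {
    "past_week": (1, 7),
    "days_8_30": (8, 30),
    "days_31_90": (31, 90),
    "days_91_365": (91, 365),
}

def build_variables(daily_visits, score_day):
    """Sum daily visits into the dependent and independent variables.

    daily_visits: array of shape (customers, days).
    score_day: column index of the first day of the 30-day outcome window;
               needs at least 365 columns of history before it.
    """
    # Dependent variable: visits in the next 30 days.
    y = daily_visits[:, score_day:score_day + 30].sum(axis=1)
    # Independent variables: visits summed within each lookback window.
    X = np.column_stack([
        daily_visits[:, score_day - hi:score_day - lo + 1].sum(axis=1)
        for lo, hi in WINDOWS.values()
    ])
    return X, y

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (intercept, coefficients)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

if __name__ == "__main__":
    # Simulated visit counts, only to show the calls; substitute your own data.
    rng = np.random.default_rng(0)
    visits = rng.poisson(0.1, size=(500, 395))
    X, y = build_variables(visits, score_day=365)
    intercept, coefs = fit_ols(X, y)
    for name, c in zip(WINDOWS, coefs):
        print(f"{name}: {c:.3f}")
```

The simulated visits are random noise, so the printed coefficients mean nothing; the point is only the shape of the calculation.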

Look at the coefficients of the variables. Their relative sizes show how quickly visit data loses predictive value, which is what we mean by the half-life of visit data. For instance, your coefficients might look like this:
  • Last Week of December = 2.493.
  • First Three Weeks of December = 0.577.
  • September - November = 0.106.
  • January - August = 0.006.
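Divide each coefficient by 2.493, the coefficient for the most recent week. This expresses each time period as a fraction of the past week's weight, and tells us the half-life of browsing data.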
  • Last Week of December = 1.000.
  • First Three Weeks of December = 0.231.
  • September - November = 0.043.
  • January - August = 0.002.
In our example, browsing data eight to thirty days old is only worth 23% of the weight of browsing data from the past week.

In our example, browsing data thirty-one to ninety days old is only worth 4% of the weight of browsing data from the past week.

In our example, browsing data ninety-one to three-hundred-sixty-five days old is only worth 0.2% of the weight of browsing data from the past week.

In other words, you'd only care about data from the past week, which is more than four times as important as data from eight-to-thirty days ago. Older data is essentially irrelevant.
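The arithmetic behind those relative weights can be checked with a few lines of Python. This is only a sketch using the example coefficients from this post; the function name `relative_weights` is made up for illustration.

```python
# Example coefficients from the post, keyed by time period.
coefficients = {
    "Last Week of December": 2.493,
    "First Three Weeks of December": 0.577,
    "September - November": 0.106,
    "January - August": 0.006,
}

def relative_weights(coefs, baseline):
    """Divide every coefficient by the baseline period's coefficient."""
    base = coefs[baseline]
    return {name: round(value / base, 3) for name, value in coefs.items()}

weights = relative_weights(coefficients, "Last Week of December")
for name, weight in weights.items():
    print(f"{name} = {weight:.3f}")
```

Swap in your own coefficients and baseline period to see how fast your browsing data decays.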

You see this all the time in catalog models - browsing data only has value on two fronts ... first, older browsing data is a negative indicator of catalog responsiveness ... and second, data from the past week is generally a positive indicator of catalog responsiveness ... but the value of those visits quickly dies off ... and in fact, it dies off so fast that it is hard to act upon the data outside of a hotline program.

Larger catalogers use various website visitation attributes to fire off hotline catalogs to inactive customers - this is not necessarily a real-time situation, of course, but the data has to be updated daily to have any impact on catalog segmentation.

The half-life of browsing data is very important in email marketing - many large catalogers send 5-10 versions of emails multiple times a week - they use yesterday's browsing activity to shift the segment the customer belongs to, thereby assigning the customer to one of the 5-10 versions being delivered in the next campaign.

Browsing data is only important if you accurately calculate the half-life of the data. Otherwise, you've got problems.