September 08, 2010

Digital Profiles Part A: Necessary Data

The first part of the Digital Profile process is data acquisition. Think about data in terms of a spreadsheet.
  • Each row in the spreadsheet represents a unique customer.
  • Each column in the spreadsheet represents customer behavior during a finite period of time.
Of course, I do the majority of my work in e-commerce, retail, and cataloging. For these businesses, an appropriate period of time is one year. You really don't want irrelevant data from four years ago clouding how customers behave today, this results in an uninteresting segmentation scheme that is ultimately reduced to "RFM" attributes.

Instead, you want a short period of time. Maybe your business is different, and if so, that's fine, just take the concepts I explore here and apply them to different timeframes.

In my example, I pulled data from the customer database I am analyzing ... using 2009.05.01 to 2009.04.30 as my one year time interval.

Next, I calculated a series of "KPIs", to use the parlance of the day, metrics that illustrate traditional customer behavior during this one-year period of time. Each metric is represented as a column in the spreadsheet.
  • Demand: Amount spent by the customer in the past twelve months.
  • Frequency: Number of orders in the past twelve months.
  • Items: Number of items purchased in the past twelve months.
These three metrics allow me to create another series of metrics, metrics that are actually quite important in the development of a Digital Profile:
  • Items per Order: Divide total items by frequency.
  • Price per Item Purchased: Divide demand by number of items.
At this point, we have six columns in our "spreadsheet" (household_id, demand, frequency, items, items per order, price per item purchased).

I've found, via trial and error, that the remaining variables are best categorized as "1" or "0" indicators.

The business I am analyzing has nine merchandise divisions (tabs that run across the top of the homepage on the website). I create nine columns in the spreadsheet. If a customer purchased from a merchandise division in the past year, I record a value of "1". If a customer did not purchase from a merchandise division in the past year, I record a value of "0".
  • Merchandise Division #1: 1 = Yes, 0 = No.
  • Merchandise Division #2: 1 = Yes, 0 = No.
  • Merchandise Division #3: 1 = Yes, 0 = No.
  • Merchandise Division #4: 1 = Yes, 0 = No.
  • Merchandise Division #5: 1 = Yes, 0 = No.
  • Merchandise Division #6: 1 = Yes, 0 = No.
  • Merchandise Division #7: 1 = Yes, 0 = No.
  • Merchandise Division #8: 1 = Yes, 0 = No.
  • Merchandise Division #9: 1 = Yes, 0 = No.
Ok, that takes care of merchandise. Now, I want to create a series of 1/0 variables for each physical/advertising channel utilized by the company. You can choose to use first-touch, last-touch, sophisticated allocation, whatever methodology you want ... I've found that the methodology doesn't make much difference.
  • Telephone Buyers: 1 = Yes, 0 = No.
  • Website Buyers: 1 = Yes, 0 = No (true catalogers would split this into two variables, orders that are purely online and orders matched-back to a catalog).
  • E-Mail Buyers: 1 = Yes, 0 = No.
  • Search Buyers: 1 = Yes, 0 = No.
  • Social Media Buyers: 1 = Yes, 0 = No.
  • Mobile Buyers: 1 = Yes, 0 = No.
  • Retail Store Buyers: 1 = Yes, 0 = No.
Each channel has been accounted for, each merchandise division has been accounted for, and each KPI (demand, frequency, items, items per order, price per item purchased) has been accounted for. If you don't like what I've come up with, create your own variables, the sky's the limit!

Finally, I want to create columns in the spreadsheet for zip-based variables. These are also 1/0 variables.
  • Customer Lives in Zip Not in a Store Trade Area: 1 = Yes, 0 = No.
  • Customer Lives in Zip Claimed by 1 Store as a Trade Area: 1 = Yes, 0 = No.
  • Customer Lives in Zip Claimed by 2 Stores as a Trade Area: 1= Yes, 0 = No.
  • Customer Lives in Zip Claimed by 3+ Stores as a Trade Area: 1= Yes, 0 = No.
  • Customer Lives In Zip Populated By Great Catalog Shoppers: 1= Yes, 0 = No.
  • Customer Lives In Zip Populated By Average Catalog Shoppers: 1 = Yes, 0 = No.
  • Customer Lives In Zip Populated By Poor Catalog Shoppers: 1 = Yes, 0 = No.
  • Customer Lives In Zip Populated By Great Online Shoppers: 1 = Yes, 0 = No.
  • Customer Lives In Zip Populated By Average Online Shoppers: 1 = Yes, 0 = No.
  • Customer Lives In Zip Populated By Poor Online Shoppers: 1 = Yes, 0 = No.
Finally, I add two variables to the spreadsheet.
  • First Time Buyer In Past Year: 1 = Yes, 0 = No.
  • Multi-Channel Buyer (Phone, All Online Channels, Store): 1 = At Least 2 Of Those Channels, 0 = 1 Or Fewer Of Those Channels.
That's my dataset! The dataset is a spreadsheet with one row per customer, each column representing the attributes mentioned above.

If you are a product manager at Unica, Webtrends, Coremetrics or Omniture, pay attention to this series, as Digital Profiles represent an outstanding way to unlock customer behavioral secrets embedded in your applications.

Up Next: A discussion of the methodology used to create each of the sixteen Digital Profiles.

1 comment:

  1. SQIAR (http://www.sqiar.com/solutions/technology/tableau) is a leading global consultancy which provides innovative business intelligence services to small and medium size (SMEs) businesses. Our agile approach provides organizations with breakthrough insights and powerful data visualizations to rapidly analyse multiple aspects of their business in perspectives that matter most.

    ReplyDelete