February 13, 2009

Business Intelligence Quiz

Here's a problem your CMO wants you to solve.
  • Find all customers who have an e-mail address, and have purchased at least one time from one of your stores in 2007. Calculate the probability that this customer visited your website during 2008, calculate the average number of website visits, and the average number of purchases during 2008. If the probability of this customer segment visiting the website is greater than 70%, pull all e-mail addresses for an e-mail campaign.
Use the comments section to describe the software tools you would use to solve this problem, and why you would use the tools you recommend to our readers. List the pros and cons of the various tools you recommend. Here is a list of possible tools:
  • Coremetrics
  • Omniture
  • Google Analytics
  • SPSS
  • SAS
  • Microsoft BI Suite / SQL
  • Oracle or SAP or Siebel or other large enterprise tools.
  • Business Objects
  • MicroStrategy
  • Microsoft Access and Microsoft Excel
  • Tableau Software
  • R
  • Cognos
  • Unica
  • Any other tool not mentioned here.


  1. Somehow there would need to be an email cookie that would tie a certain # of emails to store purchases. I believe CM and Omniture have "email address" cookies, so compare vs the emails of customers who bought. (Not sure about the other analytics providers). Then discount some for cookie deletion. I'd combine Excel and analytics package, which now shows why some analytics providers are worth more than others.

    Please critique my quick response.

  2. I think that approach is fine. What analytics package would you choose, and why?

  3. To answer your question, whichever package you have, Since Omniture has taken over the "paid " market imho, that's what I'd go with if given the choice, though my former expertise was with CM (a fine tool) and WebSideStory. With my current budget GA and IndexTools, it would be harder, then again my org isn't large enough for a CMO. I could be wrong, GA may have the functionality we need, but since my current model is B2B2C, I'm able to get via SQL.

    I thought about the original question some more, and there may be a way to match back orders with cookie sessions without email addresses, but that smells like privacy violation.

    What are some other ideas I may be missing beyond parameter value tags on campaign URLs and then extrapolation/estimation?

  4. I think you answered my question ... you are focused on Web Analytics tools and not Business Intelligence tools, and that is what I was looking for, thanks!

    There is absolutely no right or wrong answer to this question. Whatever set of tools somebody can use to make things work is all that matters.

  5. Anonymous6:20 PM

    I think it's better to deconstruct the question first to really understand the scope of the problem you're trying to solve. It actually breaks down into five components:

    1) Find all customers who have an e-mail address, and have purchased at least one time from one of your stores in 2007.
    2) Calculate the probability that this customer visited your website during 2008,
    3) calculate the average number of website visits,
    4) and the average number of purchases during 2008.
    5) If the probability of this customer segment visiting the website is greater than 70%, pull all e-mail addresses

    1a) Find all customers who have an e-mail address:

    This depends on whether your site stores user profiles or identifying information. If this is the case, than the user may have entered an email address that required them to validate it. If that's the case, hope that your development team has somehow tied this information with customer purchases (or that your business intelligence solution can report on these two data sets. Your web analytics package may also store this information (Omniture with VisualSite would likely be able to collect this information, but you'd really need to talk to your local Omniture rep for this).

    1b) and have purchased at least one time from one of your stores in 2007

    Well, this totally changes things. Although a user may not have an email address in your site itself (say, for personalization purposes), they may have given you an email address during product checkout. In that case .. the logical choice would be to query your ecommerce database (Magento, Shopify, etc) or your payment systems (Paypal, Google Checkout, Visa, etc) to figure out who ordered something in 2007. Then, remove all records that don't contain an email address. Just remember though, if your site allows anonymous purchases (and doesn't require the use of valid email addresses), then you may be neglecting a large portion of your user base.

    Calculate the probability that this customer visited your site during 2008:

    This is harder, and you may not have that data readily available yet. In order to do this, you need to have a reliable metric that you can track that tells you "John came back, the very next day". In the case of a site that requires login (like facebook), this can be fairly trivial - if John logged in once in 2008, then he qualifies. Again, this statistic is more than likely readily available in the backend and is very easy to track.

    It becomes more complex, but still feasible by using cookies or IP & Geo information. In this case, you should segmenting ased on "purchases in 2007" and "return visit in 2008". If you don't have this data in place already, than you may be able to instrument this in the back end, or ask your web analytics or BI provider to help you figure this out. You can come to an answer in many ways - by combining disparate sources of information in a BI tool or by relying on an enterprise web analytics provider to help you with page tags so that you can achieve this.

    Calculate the average number of website visits:

    You're getting very close to something called visitor loyalty. Calculating this value isn't that hard if you have the data. You just need to add the number of visits per person and divide by the number of people.

    Calculate the average number of purchases during 2008:

    Again, this is fairly trivial so long as you've got the data either from the back end or in your web analytics package. Add the number of purchases, divide by number of users.

    If the probability of this customer segment visiting the website is greater than 70%, pull all e-mail addresses for an e-mail campaign:

    You can find that metric based on the first two in the problem (1 & 2). The next metrics (3 & 4) are useless in finding the answer .. which makes me wonder why you even asked them in the first place.

    And like you said in the comments, there is no right or wrong answer. That's because there are many, many ways to arrive at the same answer. Web analytics may do it. Business Intelligence may do it. The back end may do it.

    So .. yeah, that's my two cents worth. Reach out to me on twitter (@seanpower) if you have any questions.

  6. Yup, Sean, that explanation works.

    There are going to be some folks at the traditional BI shops who are going to be disappointed with these approaches, since they don't utilize traditional BI tools.

    And that's just fine --- all that matters is an efficient and accurate path to the final answer.

  7. Anonymous11:43 AM

    I wouldn't necessarily say that the approach doesn't use traditional BI tools. You can pull this off with MicroStrategy, Cognos, etc - so long as the data is IN the BI tool.

    If it's not .. they're out of luck.

    If it is, then the process will likely be much faster.

  8. Well, frankly, most web analytics tools are out of luck too. How many web analytics tools can you count that can incorporate whether a customer has purchased "in any store" in 2007?

    I can count very few:
    1 The old VS now owned by Omniture
    2 Unica's NetInsight
    3 Maybe SiteCatalyst including when including its warehouse module, not sure.

    At least #1 and #2 are capable of segmenting online visitation in 2008 by offline customer details (i.e. has email address and made a purchase in 2007)

    But in order to put this capability to use there is data integration work to be done:

    * map online identity to offline identity
    * tap both data sources
    * load into the system
    * configure dimensions and metrics

    This questions sounds like a typical Forrensics question though. So I bet Kevin has gained some experience doing this analysis for your clients and working with whatever tools they had available.


  9. I typically take all of the data from various tables, import it from the client to my PC, and run my highly customized SPSS code. I am not a fan of BI solutions and web analytics segmentation --- for my clients, I need to write customized code to answer questions on the fly.

    Now all that being said, any tool that works against tables integrated across channels works just fine ... I am absolutely software-agnostic.

    I suppose the point of this article was to get people to think --- that is most important.


Note: Only a member of this blog may post a comment.

I Just Don't Understand It

E-commerce pros ... I'll address you tomorrow. Can I share something with the rest of you? I'm as busy right now as I've been in...