I have a lot of opinions here - but I'll bite my tongue and let you read the article. You draw your own conclusions, ok?
P.S.: The only thing I will say is that if you are going to publish conclusions this sweeping, you had best have a sample size greater than 8,600 customers to draw them from. At Nordstrom in 2005-2007, our control groups were 200,000 customers, and the results were still noisy. We got to 200,000 because the statistician who built sample sizes before my team used 5,000 to 10,000 customers per control group and spent years trying to make sense of gibberish. He made horrible decisions based on that gibberish. Only at 200,000 customers did we get reliable results, the kind of results you'd trust publishing under the good name of a major university. Other statisticians will disagree with this statement - vet your heroes.
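For anybody who wants to see the arithmetic behind the sample size point, here is a rough sketch. It uses the textbook normal approximation for the difference between two response rates and assumes two equal-sized groups. The 2% response rate (`BASE_RATE`) and the function name `lift_margin_of_error` are made up for illustration; they are not from the article or from any Nordstrom test.

```python
import math

def lift_margin_of_error(n_per_group, base_rate, z=1.96):
    """Approximate 95% margin of error on lift, as a fraction of base_rate."""
    # Standard error of the difference between two independent response rates,
    # treating both groups as having roughly the same rate and the same size.
    se_diff = math.sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    return z * se_diff / base_rate

BASE_RATE = 0.02  # hypothetical response rate, an assumption for illustration

for n in (5_000, 10_000, 200_000):
    moe = lift_margin_of_error(n, BASE_RATE)
    print(f"{n:>7,} per group: lift is known to about +/- {moe:.0%}")
```

Under these assumptions the margin shrinks with the square root of the group size, so 40 times as many customers only buys you a margin about 6 times tighter. Your own response rates will move the numbers around.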
P.P.S.: There is a way to make the results of tests with 8,600 customers credible. You repeat the test a couple dozen times and evaluate the variability of your results. I have many clients who do this ... and a -8% lift and a +24% lift get averaged to a +8% lift over time. But at least you know it is +8% and not +24% or -8%.
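The repeat-the-test idea can be sketched as a small simulation. Everything here is an assumption for illustration: the 5% control response rate, the true +8% lift, the even split of 8,600 customers into 4,300 per group, and the helper names `simulate_lift` and `summarize`. It is a toy, not anybody's actual testing methodology.

```python
import math
import random
import statistics

def simulate_lift(n_per_group, control_rate, true_lift, rng):
    """Run one simulated test and return the measured lift (test vs. control)."""
    test_rate = control_rate * (1 + true_lift)
    control = sum(rng.random() < control_rate for _ in range(n_per_group))
    test = sum(rng.random() < test_rate for _ in range(n_per_group))
    if control == 0:
        raise ValueError("no control responders; lift is undefined")
    return test / control - 1

def summarize(lifts):
    """Return (mean lift, std dev across tests, std error of the mean)."""
    mean = statistics.mean(lifts)
    spread = statistics.stdev(lifts)
    return mean, spread, spread / math.sqrt(len(lifts))

rng = random.Random(2024)  # fixed seed so the run is repeatable
# Hypothetical setup: 24 repeats, 4,300 per group, 5% base rate, true lift +8%.
lifts = [simulate_lift(4_300, 0.05, 0.08, rng) for _ in range(24)]
mean, spread, stderr = summarize(lifts)

print(f"single tests ranged from {min(lifts):+.0%} to {max(lifts):+.0%}")
print(f"average lift {mean:+.1%}, std dev {spread:.1%}, std error {stderr:.1%}")
```

The standard deviation tells you how much any one small test bounces around. The standard error of the average tells you how well a couple dozen repeats pin down the lift.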