Back in January, I wrote a post about the dangers of rushing to judgment based on small numbers. I recommended that even if Google Website Optimizer (GWO) has declared a winner, it’s best to let an experiment continue to run for at least two weeks and gather at least 100 conversions per version. Today I’m going to play devil’s advocate and argue that in some cases, you might just want to pull the plug earlier!
We’re currently running an A/B test for a lead-generation website. We believed that the original version of the page had lots of room for improvement, so we were pretty confident we could boost conversions.
GWO very quickly confirmed our suspicions: four days into the test, it declared a winner, with our version B outperforming the original by 139%.
We urged the client to keep the test running, for the reasons discussed in my earlier post. They agreed, and the test is still running. Over the past few days, the observed improvement has fluctuated, but it’s clear the new page is better; the only question is how much better.
My normal inclination would be to keep the test running until the numbers settle down. But there is a serious potential downside to doing so: by continuing to show the losing version to half of our client’s visitors, we are potentially costing the client sales. The longer the test runs, the more the client potentially loses!
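To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Python of what a continued 50/50 split can cost per day while the losing version is still being served. Every number in it (traffic, conversion rates, lead value) is a made-up placeholder, not data from our actual test:

```python
# Rough estimate of the daily opportunity cost of keeping a 50/50 A/B test
# running once one version appears to be clearly better.
# All inputs below are hypothetical placeholders, not real client data.

daily_visitors = 1000        # total visitors per day to the test page
rate_a = 0.02                # conversion rate of the original (losing) page
rate_b = 0.048               # conversion rate of the new page (~139% better)
value_per_conversion = 50.0  # estimated value of one lead, in dollars

# With a 50/50 split, half of the visitors still see the weaker page.
visitors_on_losing_page = daily_visitors * 0.5

# Conversions (and dollars) given up each day the test keeps running,
# compared with sending everyone to the better page.
lost_conversions_per_day = visitors_on_losing_page * (rate_b - rate_a)
lost_value_per_day = lost_conversions_per_day * value_per_conversion

print(f"Conversions forgone per day: {lost_conversions_per_day:.1f}")
print(f"Estimated cost per day:      ${lost_value_per_day:,.2f}")
```

With these invented figures, the split costs roughly 14 conversions (about $700) per day, which is the kind of number worth weighing against the value of more precise test results.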
So… do we keep the test running until we get more precise numbers? Or do we stop the test now, take full advantage of the improved performance of the new page, and move on to the next test?
I’d like to suggest some guidelines for when it’s better to end an experiment earlier than normally recommended. If all of the following criteria are met, perhaps it’s better to stop the experiment:
- GWO has declared a winner;
- The results, though early, indicate a very large difference in performance between the pages (one quick way to sanity-check such a difference is sketched after this list);
- There is no reason to doubt the early results (i.e. the large performance difference is not unexpected);
- There is no reason to expect that seasonal, day-of-week, or day-of-month factors have skewed the results; and
- Each conversion has a substantial monetary value (i.e. there’s a good chance that keeping the experiment running is costing the client money).
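As a rough way of checking the second criterion, here is a minimal sketch of a standard two-proportion z-test on early results. The visitor and conversion counts are invented for illustration, not the numbers from our test, and GWO’s own “chance to beat original” figure remains the number to trust:

```python
# Quick sanity check: is an early, very large difference in conversion rate
# statistically meaningful? Uses a standard two-proportion z-test.
# The counts below are hypothetical, for illustration only.
from math import sqrt, erfc

def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    return erfc(abs(z) / sqrt(2))

# Hypothetical early results: a big lift, but only a few days of data.
p = two_proportion_p_value(conv_a=18, visitors_a=900, conv_b=43, visitors_b=900)
print(f"Two-sided p-value: {p:.4f}")  # a very small value suggests the lift is real
```

The point is simply that a lift as large as the one we’re seeing can be statistically convincing even on modest traffic, which is what makes an early stop worth considering at all.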
The bottom line, it seems to me, is this: though GWO uses the scientific methodology of A/B and multivariate testing, its purpose is marketing, not pure science. We need to know which page performs better, but we don’t necessarily need to know precisely how much better.
Keeping an experiment running can be costly. Sometimes it’s better just to pull the plug early and move on to the next test.