Welcome back to our two-parter, full of more top-notch tips for great AB testing! In [Part One], we defined A/B testing, talked about its value for your business, and gave you some ideas for how to figure out what to test.
Now you have a lot of ideas for tests to run, so we’ll help you prioritize them. Then, we’ll give you the skinny on some AB testing statistics, so you can run a more informed test (and more fully understand your analytics).
Prioritizing Ideas for AB Testing
To know which tests to run first, you can use a variety of frameworks. Here are a few we recommend:
- ICE (Impact, Confidence, and Ease): Each of the three factors receives a 1-10 score. Impact is how much the results will affect your business outcomes, Confidence is how much you trust the data behind the idea, and Ease is how easy it is for you to run the test yourself (the easier, the higher the score). Because the scores are judgment calls, this framework is easier to use the fewer people you have on your team: more people = more subjectivity.
- PIE (Potential, Importance, and Ease): Just like with the ICE framework, each factor receives a 1-10 score. It is just as subjective as ICE: everyone has a different idea about what’s important, easy, or has potential. But if your team agrees on each definition up front, both ICE and PIE can become more objective.
- PXL: PXL asks a series of Y/N questions: yes gets 1, no gets 0. The higher the score, the more likely this test will be relevant to your brand. You can download the spreadsheet for this test and check it out.
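As a rough illustration of how ICE scores can turn into a ranked to-do list, here is a minimal Python sketch. The `Idea` class, the example test ideas, and their scores are all hypothetical, and averaging the three scores is an assumption on our part (some teams multiply them instead).

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """A hypothetical test idea with 1-10 ICE scores."""
    name: str
    impact: int
    confidence: int
    ease: int

    def ice_score(self) -> float:
        # Assumption: combine the three factors by averaging them.
        return (self.impact + self.confidence + self.ease) / 3


def prioritize(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score(), reverse=True)


# Made-up ideas and scores, purely for illustration.
ideas = [
    Idea("Shorter checkout form", impact=8, confidence=6, ease=4),
    Idea("New CTA button copy", impact=5, confidence=7, ease=9),
    Idea("Redesigned pricing page", impact=9, confidence=5, ease=2),
]

ranked = prioritize(ideas)
for idea in ranked:
    print(f"{idea.name}: {idea.ice_score():.1f}")
```

The same pattern works for PIE: swap the three field names and keep the scoring rule your team agreed on.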
Now that you know your priorities, you’re ready to run your test. Let’s go over some of the statistics involved in AB testing, so you’re not left scratching your head.
AB Testing Statistics Cheat Sheet
- External Validity Threats: External factors that threaten the validity/reliability of an AB test, such as:
  - Holidays (where e-commerce traffic might be heavier)
  - Good (or bad) press for your company, industry, or type of product or service
  - Major ad campaign launch dates
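- Mean (Average): The sum of your values divided by how many there are. When AB testing, you can’t measure every visitor, so you want a sample whose mean represents the mean of the whole population.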
- Regression to the Mean: the principle that if something is extreme when first measured, it will likely be closer to the average at the next measurement.
- Statistical Power: Statistical Power answers the question: Assuming a difference between A and B, how often will you see the effect? The higher the statistical power level, the easier it will be to discern the winner of your AB test.
- Sampling/Sample Size: Your sample size is the number of people you test. Sampling refers to the fact that this group of people represents a sample of the whole population. The larger the sample size, the more representative of the larger population your test results will be.
- Statistical Significance: Statistical Significance answers this question: If there’s no difference between A and B, how often will you see the effect just by chance? The higher your statistical significance, the less likely it is that your results are down to chance alone. Your significance figures will also be more trustworthy if you run your test for the right amount of time. We suggest 2-4 weeks.
- Variance: A measure of variability: how much individual values differ, on average, from the mean (formally, the average squared distance from the mean).
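To make sample size, statistical power, and statistical significance a little more concrete, here is a hedged Python sketch using only the standard library. It assumes your metric is a conversion rate, uses a pooled two-proportion z-test (one common approach, not the only one), and defaults to a 5% significance level and 80% power, which are widespread conventions rather than figures from this article. The function names and the traffic numbers are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates.

    Answers: if A and B were really the same, how often would we see a
    gap at least this large just by chance?
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sample_size_per_variant(baseline, expected, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH variant to detect the lift.

    Assumption: normal approximation for two proportions, two-sided test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return ceil((z_alpha + z_power) ** 2 * variance / (expected - baseline) ** 2)


# Made-up numbers: A converts 200 of 2,000 visitors, B converts 250 of 2,000.
p_value = two_proportion_p_value(200, 2000, 250, 2000)
print(f"p-value: {p_value:.4f}")

# Visitors per variant to detect a lift from a 10% to a 12% conversion rate.
needed = sample_size_per_variant(baseline=0.10, expected=0.12)
print(f"visitors needed per variant: {needed}")
```

A p-value below your chosen threshold (0.05 here) is what most tools report as a statistically significant result. Note how quickly the required sample grows as the lift you want to detect shrinks, which is one reason a test may need a few weeks of traffic.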
Now What?
Now that you know what you’re testing, and what you’re looking for when it comes to collecting data, you need to form a hypothesis for your test. We like Craig Sullivan’s formula, which uses this template:
- Because I saw [insert data/feedback from research]
- I expect that [change you’re testing] will cause [impact you anticipate]
- I’ll measure this using [data metric]
Now you’re ready to choose a testing platform (like Google Optimize, VWO, or Optimizely) and run your test. We hope our resources set you up for success!