Many people define a hypothesis as an “educated guess”.
To be more precise, a properly constructed hypothesis predicts a possible outcome of an experiment or test in which one variable (the independent variable) is deliberately changed and the impact is measured through the change in behavior of another variable (the dependent variable).
A hypothesis should be specific (it should clearly define what is being altered and what the expected impact is), data-driven (the changes made to the independent variable should be based on historical data or theories that have been proven in the past), and testable (it should be possible to conduct the proposed test in a controlled environment to establish the relationship between the variables involved, and to disprove the hypothesis should it be untrue).
According to an analysis of over 28,000 tests run using the Convert Experiences platform, only 1 in 5 tests proves to be statistically significant.
While there is growing debate around sticking to the concept of 95% statistical significance, it is still a valid rule of thumb for optimizers who do not want to get into the fray of peeking vs. no peeking and custom stopping rules for experiments.
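To make that 95% threshold concrete, here is a minimal sketch of the two-sided, two-proportion z-test that "statistical significance" usually refers to in A/B testing. The visitor and conversion counts below are illustrative assumptions, not data from any real test.

```python
# Minimal sketch of a two-sided, two-proportion z-test for conversion rates.
# The visitor and conversion counts are illustrative assumptions.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return the z statistic and two-sided p-value for two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # significant at 95% only if p < 0.05
```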
There might be a multitude of reasons why a test does not reach statistical significance. But framing a tenable hypothesis that already proves itself logistically feasible on paper is a better starting point than a hastily assembled assumption.
Moreover, while the aim of an A/B test may be to extract a learning, some learnings come with heavy costs: a 26% decrease in conversion rates, to be specific.
A robust hypothesis may not be the answer to all testing woes, but it does help prioritize possible solutions and leads testing teams to pick the low-hanging fruit first.
An A/B test should be treated with the same rigor as tests conducted in laboratories. That is an easy way to guarantee better hypotheses, more relevant experiments, and ultimately more profitable optimization programs.
The focus of an A/B test should be on first extracting a learning, and then monetizing it in the form of increased registration completions, better cart conversions and more revenue.
If that is true, then an A/B test hypothesis is not very different from a regular scientific hypothesis, with a couple of interesting points to note:
A robust A/B testing hypothesis should be assembled from 5 key parts:
The first part, OBSERVATION, is a clear outline of the problem (the unexplained phenomenon) observed and what it entails. This section should be completely free of conjecture and rely solely on good-quality data (qualitative, quantitative, or both) to bring a potential area of improvement to light. It should also mention how the data was collected.
Proper observation ensures a credible hypothesis that is easy to “defend” later down the line.
The second part, EXECUTION, is the where, what, and who of the A/B test. It specifies the change(s) you will make to site element(s) in an attempt to solve the problem outlined under OBSERVATION. It also clearly defines the segment of site traffic that will be exposed to the experiment.
Proper execution guidelines set the rhythm for the A/B test. They define how easy or difficult it will be to deploy the test and thus aid hypothesis prioritization.
The third part, OUTCOME, is where you make your educated guess or informed prediction. Based on a diligently identified OBSERVATION and EXECUTION guidelines that can actually be deployed, your OUTCOME should clearly mention two things:
In general, most A/B tests have one primary KPI and a couple of secondary KPIs or ways to measure impact. This ensures that external influences do not skew A/B test results: even if the primary KPI is compromised in some way, the secondary KPIs indicate that the change is indeed due to the implementation of the EXECUTION guidelines and not the result of unmonitored external factors.
The fourth part, LOGISTICS, is an important piece of hypothesis formulation: it covers what it will take to collect enough clean data to draw a reliable conclusion. How many unique visitors need to be tested? What statistical significance is desired? How many conversions are enough, and how long should the A/B test run? Each question merits a blog post or a lesson of its own, but for the sake of convenience, Convert has created a Free Sample Size & A/B/N Test Duration Calculator.
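For a rough sense of what those numbers look like, here is a minimal sketch (not Convert's calculator) of the classic two-proportion sample-size formula that underpins most test duration estimates. The baseline conversion rate, expected lift, and daily traffic figures are illustrative assumptions.

```python
# Minimal sketch of the two-proportion sample-size formula behind most
# A/B test duration estimates. All input figures are illustrative assumptions.
from statistics import NormalDist

def visitors_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed per variation for a two-sided test at the given
    significance level (alpha) and statistical power."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% significance
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    pooled = (p1 + p2) / 2
    n = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
         + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p2 - p1) ** 2
    return int(n) + 1

# Example: 3% baseline conversion rate, hoping to detect a 15% relative lift
n = visitors_per_variation(0.03, 0.15)   # roughly 24,000 visitors per variation
days = 2 * n / 4000                      # assuming 4,000 tested visitors per day
print(f"{n} visitors per variation, roughly {days:.0f} days")
```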
Set the right logistical expectations so that you can prioritize your hypotheses for maximum impact and minimum effort.
The fifth part is a nod in the direction of ethics in A/B testing and marketing: experiments involve humans, and optimizers should be aware of the possible impact on their behavior.
Often, a thorough analysis at this stage can change the way impact is measured or the way an experiment is conducted; at the very least, Convert certainly hopes that this will be the case in the future. Here’s why ethics do matter in testing.