Beyond the Data: The Mental Models That Drive Experimental Success

By Marcella Sullivan · October 2, 2024

This article explores ten mental models driving successful experimentation, drawing on insights from 64 leading CROs.

Key points:

  • Mental models, like the scientific method, feedback loops, and probabilistic thinking, are essential parts of the experimenter’s toolkit.
  • The importance of a scientific approach to learning, including well-crafted hypotheses, is universally acknowledged as the foundation of effective experimentation.
  • Industry leaders share a common trait: an insatiable curiosity that fuels their pursuit of improvement.
  • The best experimenters constantly evolve their strategies, treating new data as an opportunity for iteration.

In experimentation, data is often seen as the key to success: collect enough of it, run enough tests, and you’ll find the answers you’re looking for. While this is certainly important, this view overlooks the real differentiator: mental models.

These frameworks shape how leading CROs think, make decisions, and interpret results. Simply gathering more data and sharpening analytical techniques won’t automatically lead to better outcomes. Mental models aren’t abstract concepts; they are practical tools that can mean the difference between average and exceptional results.

In this article, I pull insights from 64 interviews in Convert’s ‘Think Like a CRO’ series with top experts in the field, paired with real-world case studies to show how these thinking tools can transform your approach to experimentation.

1. The Scientific Method: Structuring Success

The scientific method is the cornerstone of all rigorous experimentation. It provides a structured, systematic approach that ensures objectivity, repeatability, and a clear path from hypothesis to conclusion. David Stepien describes it as “the trunk of your knowledge tree”, ensuring the leaves (the details) have a sturdy foundation to cling to. May Chin aptly calls optimisation “innovation driven by the scientific method”.

This mental model thrives on the capacity to be wrong, a critical part of the learning process. Richard Feynman, a renowned physicist, outlined it this way:

First we guess it; then we compute the consequences of the guess, then we compare the result with [the] experiment… If it disagrees with the experiment, it is wrong.

Kyle Hearnshaw praises it as a tool that humbles egos by constantly proving that we can’t predict results:

The mixture of strategic thinking and creative problem-solving, with the scientific method humbling any egos, was a perfect blend.

Importantly, Kyle mentions that the scientific method provides a framework for consistently achieving success, not by avoiding errors but by learning from them.

At its core, the scientific method is a disciplined approach that transforms curiosity and hypothesis into validated knowledge. It embraces the possibility of being wrong as a pathway to discovering what is right. Every failure is just a stepping stone towards greater understanding.
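
To make Feynman’s guess-compute-compare loop concrete, here is a minimal sketch in Python with invented numbers (not drawn from any of the interviews): state the guess as a testable hypothesis, work out how surprising the data would be if there were no real effect, and then compare that expectation against what the experiment actually produced.

```python
# A minimal sketch of the guess -> compute -> compare loop for an A/B test.
# All numbers are invented for illustration.
from statistics import NormalDist

# 1. Guess: "the new checkout flow lifts the conversion rate."
control_conversions, control_visitors = 480, 10_000
variant_conversions, variant_visitors = 540, 10_000

# 2. Compute the consequences: under the null of "no difference", how surprising
#    would the observed gap be? (pooled two-proportion z-test)
p_control = control_conversions / control_visitors
p_variant = variant_conversions / variant_visitors
p_pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
se = (p_pooled * (1 - p_pooled) * (1 / control_visitors + 1 / variant_visitors)) ** 0.5
z = (p_variant - p_control) / se

# 3. Compare with the experiment: a large p-value means the guessed lift is not
#    yet supported; "if it disagrees with the experiment, it is wrong".
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"lift: {p_variant - p_control:+.4f}, z = {z:.2f}, p = {p_value:.3f}")
```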

2. Feedback Loops: An Engine of Continuous Improvement

This essential mechanism transforms insights into action and creates a cycle of perpetual learning and iterative enhancement.

Data from experiments, team reflections, and process evaluations need to be captured, reviewed, and used to inform future work. At Workato, Alex Birkett’s team exemplifies this approach, with lessons from each experiment feeding directly into the next:

Once the experiment is finished, it is analysed and the experiment document is updated with conclusions and learnings. [This] is available to anyone in the company, and we also do a weekly experiment review meeting.

The real power of feedback loops isn’t just in recording wins and losses but in understanding the ‘why’ behind those outcomes, as Nils Koppelmann advises:

Optimisation efforts should not aim to prove you are right or wrong, but to determine why—in either case. There is no point in optimising anything if you don’t understand how you got there and how to replicate it.

The best optimisers also tap into real user feedback to get smarter with each iteration. Georgiana Hunter-Cozens highlights a key skill of a good experimenter—transforming user feedback into actionable insights. The ability to see tangible improvements resulting from the effective application of feedback is a rewarding experience for any optimiser.

Feedback loops can also accelerate the pace of improvement by integrating real-time data into the process. Ruben de Boer even sent out a great article on feedback adaptation in his newsletter some months back. The article, written by Lotte Cornelissen, highlights the importance of real-time adaptation:

By embedding a feedback loop in every sprint, teams can quickly adjust their actions based on real-time data, enhancing both the impact and speed of their work.

This approach allows for a deeper understanding of both the internal dynamics of the organisation and the external realities of user behaviour, making the path to optimisation both structured and responsive.

Case Study: Meta-Experiments at Booking.com

Booking.com wrote an article on how they apply meta-experiments—tests on their experimentation framework itself—to refine their methodology. For example, they might experiment with different sample sizes or statistical thresholds to understand how those variables affect the reliability of their tests. By doing so, they are not just focused on optimising their product but also on optimising the way they run experiments.

Feedback loops are about continuous improvement based on past outcomes. Here, Booking.com creates an iterative process where learnings from each experiment feed back into improving the framework for future experiments. The system becomes self-improving. Meta-experiments ensure that their experimentation process is always evolving, minimising errors and enhancing efficiency. This constant refinement makes their overall approach more robust and adaptable, one of the many reasons Booking.com is highly respected in the field of experimentation.
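
Booking.com’s article doesn’t publish its internal tooling, but the idea behind a meta-experiment is easy to sketch: run many simulated A/A tests (where there is no real difference between arms) at different sample sizes and significance thresholds, and watch how often the framework declares a false winner. Everything below is an illustrative assumption, not their actual setup.

```python
# Illustrative meta-experiment: how do sample size and significance threshold
# affect the false-positive rate of a simple two-proportion test?
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_per_arm, alpha, base_rate=0.05, runs=500):
    """Simulate A/A tests (no true effect) and count spurious 'winners'."""
    false_positives = 0
    for _ in range(runs):
        conv_a = sum(random.random() < base_rate for _ in range(n_per_arm))
        conv_b = sum(random.random() < base_rate for _ in range(n_per_arm))
        if two_proportion_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha:
            false_positives += 1
    return false_positives / runs

for n_per_arm in (500, 2_000):
    for alpha in (0.05, 0.01):
        rate = false_positive_rate(n_per_arm, alpha)
        print(f"n={n_per_arm}, alpha={alpha}: false-positive rate ≈ {rate:.1%}")
```

The specific numbers don’t matter; the point is that the experimentation process itself is something you can experiment on.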

3. First Principles Thinking: Breaking Down Problems

This powerful mental model involves deconstructing complex problems into their most basic elements and building solutions from the ground up.

Unlike reasoning by analogy, which relies on established patterns and assumptions, first principles thinking requires questioning every assumption and understanding the fundamental truths that underlie a problem.

Asmir Muminovic captures this approach succinctly when describing experimenters as detectives:

We are detectives… we uncover small pieces of this big puzzle and form a big picture.

This perfectly encapsulates first principles thinking—identifying the most basic elements of a problem before assembling them into a coherent solution.

Lenny Rachitsky, in his exploration of first principles thinking, emphasises that

Thinking from first principles is about challenging assumptions and going directly to the source to figure out what’s actually true.

In A/B testing, first principles thinking might lead experimenters to question whether small changes like button colours are truly the most effective optimisations. Instead, they might ask: ‘What fundamental user problems are we trying to solve?’ This can uncover more impactful, structural changes in user flow or content strategy.

Case Study: Airbnb’s Use of First Principles Thinking in Experimental Design

Airbnb transitioned from A/B testing to interleaving for ranking algorithms. In A/B tests, users are split into groups, each seeing one version of a ranking algorithm. However, this method required large sample sizes and time to produce statistically significant results. With interleaving, Airbnb could show users mixed search results from different algorithms in real-time, allowing them to more quickly and directly measure the performance of each algorithm.

Instead of continuing to rely on the widely accepted method, Airbnb went back to the core of its challenge: ‘How can we compare ranking algorithms quickly and effectively?’ They stepped away from assumptions about testing methods and rebuilt their approach from the ground up. This break from conventional thinking enabled them to speed up insights without compromising on validity.
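
Airbnb’s write-up covers far more statistical machinery, but the core mechanic of interleaving can be sketched in a few lines: merge results from two rankers into one list, remember which ranker contributed each item, and credit clicks back to the contributing ranker. The scheme and data below are a simplified illustration, not Airbnb’s exact method.

```python
# Simplified interleaving sketch: merge two rankings into a single result list,
# track which algorithm supplied each item, and credit clicks accordingly.
from itertools import zip_longest

def interleave(ranking_a, ranking_b):
    """Alternate items from two rankings (skipping duplicates) with attribution."""
    merged, seen = [], set()
    for item_a, item_b in zip_longest(ranking_a, ranking_b):
        for item, source in ((item_a, "A"), (item_b, "B")):
            if item is not None and item not in seen:
                merged.append((item, source))
                seen.add(item)
    return merged

def credit_clicks(merged, clicked_items):
    """Count how many clicked results each ranker contributed."""
    credit = {"A": 0, "B": 0}
    for item, source in merged:
        if item in clicked_items:
            credit[source] += 1
    return credit

ranking_a = ["listing_3", "listing_1", "listing_7"]  # algorithm A's ordering
ranking_b = ["listing_1", "listing_5", "listing_3"]  # algorithm B's ordering
merged = interleave(ranking_a, ranking_b)
print(merged)
print(credit_clicks(merged, clicked_items={"listing_5"}))  # algorithm B gets the credit
```

Because every user sees a blend of both algorithms, each session yields a direct head-to-head comparison, which is why interleaving tends to need far less traffic than a conventional split.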

Case Study: Booking.com and First Principles of User Needs

In a talk I saw from Erin Weigel, former Principal Designer at Booking.com and author of ‘Design for Impact: Your Guide to Designing Effective Product Experiments’, she discussed a test that reshaped how the platform approached user experience. The team was tackling the challenge of making it easier for users to compare different listings. Designer Catalin Bridinel proposed a solution: opening each listing in a new tab.

They had observed that power users were already doing this on their own, but less confident users didn’t seem to know how. By automating the action for them, Booking.com simplified the comparison process, making the experience more intuitive for all users.

This process demonstrates First Principles Thinking. Instead of iterating on conventional designs such as saved lists, the team reconsidered the core problem: what do users ultimately want when browsing listings? The answer is simple: to easily compare different listings.

By going back to the fundamental needs of users, Booking.com developed an elegant solution rooted in actual user behaviour. This test became highly influential in the industry, demonstrating the power of first principles in optimising user experience.

4. Probabilistic Thinking: Navigating Uncertainty

This mental model involves assessing the likelihood of different outcomes and making decisions based on probabilities. It’s essential for managing risk and interpreting data in a way that acknowledges the inherent uncertainty of experimental results.

Bhavik Patel, who knows more than most about interpreting data, highlights the importance of this model when he advises:

Recognise the biases in your experiments, separate the signal from the noise in your results.

This mindset encourages considering the full range of possibilities, especially when unexpected events or outliers could significantly impact results. In A/B testing, this means being prepared for scenarios where the data might not align with initial expectations and being flexible enough to adapt the testing strategy accordingly.

Practitioners of this approach constantly ask questions like ‘What else might happen?’ and ‘What if we’re wrong?’ instead of assuming things will go as planned. Maria Luiza de Lange distils this mindset by describing optimisation as ‘being able to reduce risk’, highlighting the importance of addressing uncertainty as a fundamental part of the experimentation process. Marc Uitterhoeve further emphasises that one of the top priorities for optimisers is reducing the risk of making the wrong changes, rather than just focusing on finding a winner.
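
One concrete way to practise probabilistic thinking is to report a test as a probability rather than a binary verdict. The sketch below is my own illustration (not from any of the interviews): it uses Beta posteriors and Monte Carlo sampling to estimate the probability that the variant genuinely beats control.

```python
# Illustrative probabilistic read-out of an A/B test: instead of a yes/no verdict,
# estimate P(variant beats control) by sampling from Beta posteriors.
import random

def prob_variant_beats_control(conv_a, n_a, conv_b, n_b, draws=100_000):
    """Monte Carlo estimate using Beta(successes + 1, failures + 1) posteriors."""
    wins = 0
    for _ in range(draws):
        sample_a = random.betavariate(conv_a + 1, n_a - conv_a + 1)
        sample_b = random.betavariate(conv_b + 1, n_b - conv_b + 1)
        if sample_b > sample_a:
            wins += 1
    return wins / draws

# Invented numbers: 480/10,000 conversions for control vs 540/10,000 for the variant.
p = prob_variant_beats_control(480, 10_000, 540, 10_000)
print(f"P(variant > control) ≈ {p:.1%}")
```

A read-out like this invites a conversation about risk and the cost of being wrong, rather than a simple win/lose call.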

5. Occam’s Razor: Simplifying Complexity

Occam’s Razor suggests the simplest explanation—requiring the fewest assumptions—is often the correct one. This mental model helps reduce unnecessary complexity, making hypotheses easier to test and results more reliable.

The power of Occam’s Razor lies in its ability to cut through the noise; “I firmly believe that the simplest solutions often provide the clearest insights”, says Christoph Böcker.

Jon Yablonski suggests that in any problem-solving scenario, including web design and experimentation, it’s easy to overcomplicate solutions with details that might seem beneficial but can actually distract from the core objectives. He notes that ‘complicating the simple is commonplace’, and applying Occam’s Razor helps maintain focus on what truly matters by avoiding unnecessary elements that do not add value.

Lorik Mullaademi’s interview introduced me to a methodology that fits this mental model perfectly — KISS (Keep It Simple, Stupid) — a “principle [that] advocates for simplicity in design and systems, suggesting that complexity should be minimised to enhance user acceptance and interaction”.

And yet Occam’s Razor is not just about choosing the simplest solution; it’s also about ensuring that every element or assumption in an experiment has a purpose.

When designing an A/B test, for instance, this principle guides experimenters to evaluate each hypothesis critically—ensuring that only the most essential and impactful variables are included. This not only streamlines the testing process but also makes the results more actionable.

However, it’s crucial to apply this principle with care. Occam’s Razor should not be used to oversimplify complex issues where complexity is necessary to capture the full scope of the problem.

As noted in discussions on its application in science, the goal is to simplify only to the extent that it improves clarity and understanding without sacrificing the accuracy of the model or hypothesis.

6. Inversion: Flipping the Problem on Its Head

Inversion is a mental model that involves thinking about problems in reverse—starting with what could go wrong and working backwards to avoid them.

While it may not be the cheeriest of mental exercises, it has been used since the Stoics, who believed the practice not only lessened their fear of what could go wrong but also helped them prepare for it. This approach is invaluable for identifying risks and ensuring that strategies are aligned with goals. As Bhavik Patel said in his interview, “It’s all too easy to make decisions that hurt rather than help“.

Alex Birkett provides a clear example of inversion in action:

Optimisation isn’t always the answer to your business problems, and knowing when it isn’t is a big strategic advantage.

Sometimes, the best move is to recognise that not every issue can be solved through optimisation alone.

By inverting their thinking, experimenters can step back and assess whether optimisation is the right approach, or if it might lead to diminishing returns or unintended consequences.

Applying the Stoic practice of premeditatio malorum, or the ‘premeditation of evils’, allows experimenters to ask critical questions like, ‘What could derail the validity of our test results?’

This line of questioning helps identify factors that could compromise the integrity of a test, such as incorrect implementation of variations, confounding variables, or data collection errors. Daniel Jones captures this mindset perfectly, saying he has learned to “treat all data as if it’s wrong” until proven otherwise.

This anticipatory questioning not only safeguards against challenges but also fosters the development of contingency plans and alternative strategies, which are essential for effective and resilient experimentation. As Oliver Kenyon puts it, “optimising is as much about learning what doesn’t work as it is about confirming what does“.
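
One practical way to ‘treat all data as if it’s wrong’ is to run integrity checks before reading the results at all. A common one is a sample ratio mismatch (SRM) check, sketched below with invented traffic numbers; it isn’t a method any of the interviewees prescribed, just a generic guard against broken implementation or data collection.

```python
# Illustrative sample ratio mismatch (SRM) check: if a test that was meant to
# split traffic 50/50 arrives at a noticeably different ratio, something in the
# implementation or data pipeline is suspect and the results can't be trusted yet.
from statistics import NormalDist

def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Two-sided p-value for the observed split against the intended split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    se = (total * expected_share_a * (1 - expected_share_a)) ** 0.5
    z = (visitors_a - expected_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented counts for a test intended to split 50/50.
p = srm_p_value(visitors_a=10_000, visitors_b=10_600)
if p < 0.001:  # deliberately strict threshold for an integrity check
    print(f"Possible SRM (p = {p:.5f}): investigate before trusting the results.")
else:
    print(f"No evidence of SRM (p = {p:.5f}).")
```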

7. Idealised Models: Simplifying the Complex

Idealised models involve deliberately simplifying or distorting aspects of a system to make it more understandable and manageable.

In experimentation, it often feels like there are more models than practitioners, with new frameworks constantly emerging and the validity of their predecessors perpetually being questioned. Discussing the models available to experimenters is an article (or book…) in itself, so I won’t get into that here.

Jarrah Hemmant discussed the wealth of models available to experimenters in her interview:

CRO excels as a field because it has so many varied experts and different opinions, strategies, and tactics that you can explore and grow with. I would always suggest exploring a wide variety of resources that are out there, and keeping your critical thinking hat on to draw your own conclusions.

When it comes to models, as Jarrah touches on, the challenge isn’t finding one; it’s choosing the right one for your specific context and actually using it. A model is only as good as its application.

These models don’t represent reality perfectly, but they make complex systems more workable.

Deborah O’Malley advocates for a ‘smart hypothesis’ approach in her interview: one that “clearly identif[ies] the target audience, conversion problem, and suggested solution, states the anticipated outcome, and of course, defines the test conversion goal.” Simplification with purpose.

Asmir Muminovic discusses the necessity of such simplifications, noting, “You need to make assumptions about which variables are most important“​. Daniel Jones adds that while these simplifications are useful, they must align with broader strategic goals: “Get clear on your company’s strategy and goals before optimising“​. Together, these insights show how idealised models can be both powerful and practical tools in experimentation.

8. Second-Order Thinking: Considering Consequences

Second-order thinking looks past the immediate effects of an action to consider its long-term consequences.

While first-order thinking often focuses on solving the problem at hand, second-order thinking requires a deeper analysis of how today’s decisions will impact future outcomes.

Callum Dreniw underscores the importance of this approach, urging experimenters to “embrace losing experiments by having a testing mentality… [so] you’re able to take those learnings and iterate accordingly.” This perspective highlights that the immediate outcome of an experiment—whether success or failure—is only the beginning. By considering not just the direct results but also the second-order effects, experimenters can refine their strategies, turning short-term losses into long-term gains.

Oliver Kenyon complements this view by pointing out that “Big results don’t come from one test; they come from constant compounding winning tests.” It’s the accumulation of small, iterative improvements that creates significant, lasting change.

But beware of unintended consequences. Marc Uitterhoeve shares a cautionary tale:

We’ve had clients that implemented their own winning A/B tests, but after we analysed their data it seemed that indeed their AOV had increased, but their conversion rate dropped way more.

Thinking through these potential second-order effects, experimenters can better anticipate and mitigate risks.

Furthermore, second-order thinking encourages a holistic view of experimentation. As Sergio Simarro Villalba points out, “Your work impacts and needs consistency with other areas such as traffic acquisition, design, UX, CRM, development, etc“. Sergio calls this the digital ecosystem, a great way of describing it. Harmony in this ecosystem will not only produce better work but also expand individual team members’ knowledge.

In essence, second-order thinking transforms experimentation from a series of isolated tests into a cohesive, forward-looking strategy. It pushes experimenters to look past immediate outcomes, focusing instead on how today’s decisions will impact future results, ensuring that each action contributes to long-term success.

Case Study: Netflix’s Use of Proxy Metrics

The Netflix tech blog is putting out some of the best large-scale experimentation content right now. In one of their recent articles, they detailed how the company streamlined long-term experiments by identifying short-term metrics, or proxy metrics, that correlated with long-term success. Instead of waiting months to gauge retention rates, Netflix began using immediate engagement data, like how often users interacted with new content, as a reliable indicator of future behaviour. This enabled faster iteration and quicker decision-making.

This is a strong example of Second-Order Thinking. Netflix knew that relying solely on first-order results, like long-term retention, would slow down their experimentation process. By digging deeper, they identified metrics that reliably predicted long-term success. This allowed them to move faster without sacrificing accuracy. Instead of being trapped in short-term thinking, Netflix considered the downstream effects of their decisions, ensuring that early proxy metrics aligned with their long-term business goals.
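
Netflix’s approach involves far more statistical machinery than this, but the underlying question, ‘does the early metric actually predict the long-term one?’, can be sketched with a simple check across past experiments using Python 3.10+’s statistics module. The experiment data below is invented purely for illustration.

```python
# Illustrative proxy-metric check: across past experiments, does the short-term
# lift (early engagement) track the long-term lift (retention) well enough to act on?
# All numbers are invented for illustration.
from statistics import correlation, linear_regression

# One (proxy_lift, long_term_lift) pair per historical experiment, in percent.
past_experiments = [
    (1.2, 0.9), (0.4, 0.3), (-0.8, -0.5), (2.1, 1.6),
    (0.0, 0.1), (-1.5, -1.1), (0.9, 0.6), (3.0, 2.2),
]
proxy_lifts = [proxy for proxy, _ in past_experiments]
long_term_lifts = [long_term for _, long_term in past_experiments]

r = correlation(proxy_lifts, long_term_lifts)
fit = linear_regression(proxy_lifts, long_term_lifts)
print(f"correlation r = {r:.2f}")
print(f"expected long-term lift ≈ {fit.slope:.2f} × proxy lift {fit.intercept:+.2f}")
```

Only once a proxy has earned that kind of trust does it make sense to let it drive faster decisions.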

Case Study: Conversion and the Value of Losing A/B Tests

At Conversion, an analysis of over 10,000 experiments revealed something surprising: losing A/B tests can be more valuable than winning ones. By comparing ‘first tests’ on specific levers, they found that when the first test was a winner, future tests had a 35% win rate. But when the first test was a loser, the win rate increased to 41%. Even more impressive, the uplift from subsequent tests after a losing experiment was double that of those following a win.

This taps into Second-Order Thinking. Rather than discarding losing tests, Conversion used them to learn and pivot.

9. Falsifiability: Testing Hypotheses

Falsifiability is the idea that a hypothesis must be testable and capable of being proven false. Without this, experiments cannot produce meaningful or reliable results.

This came up in so many of the interviews and it’s easy to understand why. Deborah O’Malley stresses the importance of this model:

Your hypothesis is the backbone of a solid A/B test.

This sentiment is reinforced by Bhavik Patel, who warns against manipulating data to fit preconceived ideas, advising instead to ensure that hypotheses are rigorously tested before being accepted​.

Falsifiability ensures that experimentation remains grounded in objective reality, where every hypothesis is subject to scrutiny and potential disproof.

I’ll finish the section on falsifiability with my two favourite quotes on hypotheses from the interview series:

Start with a solid hypothesis and stick with it throughout. Don’t tweak it just to make the experiment suddenly win.” – Shirley Lee

Take hypotheses seriously. It sounds basic but I see a lot of tests without hypotheses or supporting data. A properly formulated hypothesis statement sets the stage for what the report will tell you at the end of a test. If you don’t take the time up-front to define this, you will have no idea what to look for in your test results or why you ran the test in the first place.” – Tracy Laranjo
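
As a practical aid, a hypothesis statement can be captured as a small structured record so that nothing stays implicit when it’s time to read the results. The fields below loosely follow the ‘smart hypothesis’ elements Deborah O’Malley describes (audience, problem, solution, anticipated outcome, conversion goal); the class itself is just my own illustrative sketch.

```python
# Illustrative structure for a hypothesis statement: if any field is hard to fill
# in, the hypothesis probably isn't falsifiable yet.
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    audience: str             # who the change targets
    problem: str              # the conversion problem, backed by data
    solution: str             # the proposed change
    anticipated_outcome: str  # what we expect to happen, and why
    conversion_goal: str      # the single metric that decides the test

checkout_hypothesis = Hypothesis(
    audience="Mobile visitors who reach the checkout page",
    problem="High drop-off at the payment step, seen in funnel analysis",
    solution="Show accepted payment methods above the payment form",
    anticipated_outcome="Lower drop-off because payment uncertainty is removed",
    conversion_goal="Completed purchases per checkout session",
)
print(checkout_hypothesis)
```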

10. Bias: Avoiding Cognitive Pitfalls

Cognitive biases can significantly distort the outcomes of experiments if they aren’t consciously recognised and managed.

These biases can subtly influence how data is interpreted, leading to flawed conclusions and misguided strategies. For experimenters, staying aware of these biases and actively working to counteract them is essential for maintaining the integrity of their work. Here, we explore the three big ones:

Confirmation Bias

The tendency to focus on information that confirms pre-existing beliefs while ignoring evidence that contradicts them.

This bias is particularly dangerous in experimentation because it can lead to cherry-picking data that supports a preferred outcome while disregarding data that tells a different story. Melanie Kyrklund warns against “resorting to confirmation bias when analysing the results of a hypothesis you are particularly attached to”.

To avoid this bias, experimenters should rigorously test their hypotheses with the understanding that being proven wrong is as valuable as being proven right. By remaining open to all outcomes, experimenters can ensure their conclusions are grounded in reality, not just in what they hoped to find.

Availability Heuristic

This occurs when experimenters over-rely on information that is most readily available, often leading to a narrow perspective that overlooks more comprehensive data.

Linda Bustos spoke about the danger of relying on ‘quick win’ tests that might skew the broader understanding of user behaviour.

To combat this, allocate resources to user research and data analysis, ensuring that decisions are based on a complete picture rather than the most convenient snapshot.

First-Conclusion Bias

First-Conclusion Bias is the inclination to settle on the first plausible solution without fully exploring other possibilities.

This can lead to premature conclusions and missed opportunities for more effective strategies. Kevin Szpak highlights the importance of challenging initial assumptions, saying, “Aim to do right, not be right“. It’s a call to continuously learn and unlearn, instead of settling on what seems correct at first glance.

In questioning initial assumptions and exploring alternative hypotheses, experimenters can open doors to more nuanced, effective conclusions.

The Experimenter’s Toolkit: Mental Models in Action

Mental models shape how we understand problems, come up with ideas, navigate complexity, manage uncertainty, and ultimately, drive more reliable and impactful experiments.

As David Sanchez del Real aptly puts it, “Experimentation is about the mindset of trying to make things better and daring to put ideas to the test“​. This mindset, supported by the strategic application of mental models, is what differentiates successful experimenters from the rest.

Whether it’s embracing feedback loops for continuous improvement, applying first principles thinking to break down complex problems, or leveraging probabilistic thinking to navigate uncertainty, these models are indispensable tools in the experimenter’s toolkit.

Through the insights shared by leading professionals, it’s clear that these mental models are not just theoretical concepts but practical tools that, when applied thoughtfully, can significantly enhance the effectiveness and impact of experimental work.

What Did I Learn from Reading 64 Interviews with CROs?

  1. There is far more agreement than disagreement among these experts about what truly drives success in experimentation.
  2. One of the most resounding takeaways from these interviews is the critical importance of the scientific method and hypotheses in experimentation.
  3. A well-crafted hypothesis isn’t just a starting point: the experts repeatedly stress that without a strong, testable hypothesis, the entire experimentation process is likely to deliver poor-quality outcomes.
  4. Many of these experienced CROs emphasised the invaluable role of mentorship in their career development. Whether you’re just starting out or looking to refine your skills, finding a mentor can be a transformative step. The guidance, wisdom, and support that come from a mentor who has walked the path before you can accelerate your growth and help you navigate the complexities of the field with greater confidence. As David Stepien put it, “An expert is someone who has made all the mistakes there are to make. Soliciting their feedback regularly makes all the difference in your learning curve”.
  5. Across the board, these industry leaders show a deep passion for their work in their interviews. My favourite question was the one that asked ‘What inspired you to get into testing & optimisation?’ This took each person back in time – to a classroom, a first job or the webshop they built at 11 years old (shoutout Lucia van den Brink).
  6. I was struck by the number of times the word ‘love’ was used. Their curiosity and relentless drive to improve are evident in the interviews, revealing a community that is not only committed to their craft but also constantly seeking ways to do things better. Steph Le Prevost talked about how she believes you have to love optimisation to do it well. She said “Think about it, you work so hard to derive insights, come up with well-rounded hypotheses and all signs point to a winner, but then boom you get a loser or worse an inconclusive result” – but you get up and try again.

I’ve heard this called both the experimentation bug and the experimentation curse; which one it is probably depends on the day.

Originally published October 2, 2024 · Updated October 3, 2024
Written by Marcella Sullivan, Business Psychologist and CRO
Edited by Carmen Apostu, Head of Content at Convert
