I’ve been working with a product team on how to get better at hypothesis testing. It’s a lot of fun.
They were introduced to dual-track Agile by Marty Cagan and are doing a great job of putting it into practice.
As they explore how to support backlog items with research in the discovery track, they are finding that hypothesis testing isn’t as easy as it sounds.
Very few of us have had to formulate hypotheses and design experiments since perhaps our elementary school science fair days.
And while the scientific method is conceptually easy to grasp, putting it into practice can be much more challenging.
Why You Should Get Better at Hypothesis Testing
Across the Internet industry, we are seeing a shift from the “executive knows all” mindset to an experimentation mindset where we support ideas with research before investing in them.
More companies are running A/B tests, conducting usability studies, and engaging customers in discovery interviews than ever before.
We are investing in tools like Optimizely, Visual Website Optimizer, UserTesting.com, KissMetrics, MixPanel, and Qualaroo.
But teams who invest in research are quickly finding that their experiments are only as good as their hypotheses and experiment design.
It’s a classic case of garbage in, garbage out.
You can waste thousands of dollars, hundreds of hours, and countless sprints running experiments that don’t matter and don’t net any meaningful results.
Experimentation is like Agile. It’s a tool in our toolbox, but we still need to do the strategic work to get value out of it.
Agile will help us move through a backlog quicker, but it won’t help us put the right stories in the backlog. – Tweet This
Experimentation will help us support or refute a hypothesis, but we have to do the work to design a good hypothesis and a good experiment.
Avoid These 14 Common Pitfalls
You don’t have to be a rocket scientist to know how to formulate a good hypothesis or design a good experiment, but you do want to avoid these common mistakes.
1. Not knowing what you want to learn.
Too many teams test anything and everything. See: Why Testing Everything Doesn’t Work.
If you want to get meaningful results, you need to be clear about what you are trying to learn and design your experiment to learn just that.
In the product world, we can experiment at different levels of analysis.
We can test our value propositions.
We can test whether or not a feature delivers on a specific value proposition.
We can test a variety of designs for each feature.
And we can test the feasibility of each solution.
Too often people test all of these layers at once. This makes it hard to know which layer is working and which is not and often leads to faulty conclusions.
Instead, be clear about what you are testing and when. This will simplify your experiment design and fuel your rate of learning.
2. Using quantitative methods to answer qualitative questions (and vice versa).
Qualitative methods such as interviews, usability tests, and diary projects help us understand context.
They are great for helping us to understand why something may or may not be happening. They expose confusing interface elements and gaps in our mental models and metaphors.
However, with qualitative research, it can be hard to generalize our findings beyond the specific contexts we observe.
Quantitative methods, on the other hand, allow us to go broad, collecting large amounts of data from broad samples.
Think A/B tests, multivariate tests, and user surveys.
Quantitative research is great for uncovering how a large population behaves, but it can be challenging to uncover the why behind their actions.
An A/B test will tell you which design converts better but it won’t tell you why.
The best product teams mix and match the right methods to meet their learning goals. – Tweet This
3. Starting with untestable hypotheses.
It’s easy to be sloppy with your hypotheses. This might be the most common mistake of all.
Have you found yourself writing either of the following:
- Design A will improve the overall user experience.
- Feature X will drive user engagement.
How will you measure improvements in user experience or user engagement?
You need to be more specific.
A testable hypothesis includes a specific impact that is measurable. – Tweet This
At the end of your test it should be crystal clear whether your hypothesis passed or failed.
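To make this concrete, here is a minimal sketch (the feature, metric, and numbers are all hypothetical) of what the testable version spells out that the vague version leaves open:

```python
# The vague version gives you nothing to measure.
vague_hypothesis = "Feature X will drive user engagement."

# The testable version names the change, the audience, the metric,
# and the specific, measurable impact we expect.
testable_hypothesis = {
    "change":   "Add a weekly progress email (Feature X)",
    "audience": "Users active for at least 30 days",
    "metric":   "Average sessions per user per week",
    "impact":   "Increases from 2.0 to at least 2.3 within 4 weeks",
}
```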
4. Not having a reason for why your change will have the desired impact.
You might know what you want to learn, but not know why you think the change will have the desired impact.
This is common with design changes. You test whether a blue or green button converts better, but you don’t have a theory as to why one might convert better than the other.
Or you think a feature will increase return visits, but you aren’t quite sure why. You just like it.
The problem with these types of experiments is that they increase the risk of false positives.
A false positive is an experiment result where one design looks like it converts better than another, but the results are just due to chance.
Internet companies are running enough experiments now that we need to start taking false positives seriously.
Always start with an insight as to why your change might drive the desired impact.
Need further convincing? Spend some time over at Spurious Correlations.
5. Testing too many variations.
Google tested 41 shades of blue. Yahoo tested 30 logos in 30 days.
Don’t mimic these tests. I suspect these companies had reasons for running these experiments other than finding the best shade of blue or the best logo.
At a standard 95% confidence level, each variation of blue has a 5% chance of being a false positive. Same with each logo. If you want to increase the odds that your experiment results reflect reality, test fewer variations.
Suppose you are testing different headlines for an article. You don’t want to test 25 different headlines. You want to identify the 2 or 3 best headlines where you have a strong argument for why each might win and test just those variations.
More variations lead to more false positives. You don’t have to understand the math, but you do need to understand the implications of the math.
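If you want a feel for those implications, here is a quick back-of-the-envelope sketch, assuming a standard 5% significance level and no correction for multiple comparisons:

```python
# Chance of at least one false positive when every variation is tested
# at a 5% significance level (hypothetical counts, no correction applied).
alpha = 0.05  # false positive rate per comparison

for variations in (1, 3, 10, 25, 41):
    p_at_least_one = 1 - (1 - alpha) ** variations
    print(f"{variations:>2} variations -> {p_at_least_one:.0%} chance of at least one false positive")

# With 1 variation you are at 5%; with 41 shades of blue you are near 88%.
```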
Here’s the key takeaway:
Run fewer variations and have a good reason for testing each one. – Tweet This
6. Running your experiment with the wrong participants.
Who you test with is just as important as what you test. – Tweet This
This is often overlooked.
If Apple is trying to understand the buying habits of iPhone customers, they shouldn’t interview price-sensitive buyers. The iPhone is a high-end product. Apple needs to interview buyers who value quality over price.
If you are marketing to new moms, don’t run your tests with experienced moms. Don’t run your tests with people who don’t have kids. Their opinions and behaviors don’t matter.
This can be trickier than it seems. Oftentimes the who is implied in the hypothesis. Do the work to make it explicit so you don’t make these errors.
7. Forgetting to draw a line in the sand.
It’s easy after the fact to call mediocre good. But nobody wants to be mediocre.
The best way to avoid this is to determine up front what you consider good.
With every hypothesis, you are assuming a desired impact. Quantify it. – Tweet This
For quantitative experiments, draw a hard line in the sand. How much improvement do you expect to see?
Draw the line as if it’s a must-have threshold. In other words, if you don’t meet the threshold, the hypothesis doesn’t pass.
This takes discipline, but you’ll get much better outcomes if you stick with it.
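Here is a minimal sketch of what that discipline looks like; the baseline and threshold numbers are hypothetical, but the point is that the must-have bar is written down before the results come in:

```python
# Draw the line in the sand up front (hypothetical numbers).
BASELINE_CONVERSION = 0.040   # current signup conversion rate
MUST_HAVE_LIFT = 0.10         # we need at least a 10% relative improvement
threshold = BASELINE_CONVERSION * (1 + MUST_HAVE_LIFT)  # 0.044

def hypothesis_passes(observed_conversion: float) -> bool:
    """If the observed rate doesn't clear the pre-set threshold, the hypothesis fails."""
    return observed_conversion >= threshold

print(hypothesis_passes(0.042))  # False: better than baseline, but still mediocre
print(hypothesis_passes(0.045))  # True: clears the bar we set up front
```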
8. Stopping your test at the wrong time.
Many people make the mistake of stopping their quantitative tests as soon as the results are statistically significant. This is a problem. It will lead to many false positives.
Determine ahead of time how long to run your test. Don’t look at the results until that time has elapsed. – Tweet This
Use a duration calculator. Again, you don’t need to understand the math, you just need to know how to apply it.
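If you are curious what those calculators are doing under the hood, here is a rough sketch using statsmodels; the traffic and conversion numbers are hypothetical:

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.040             # current conversion rate
expected_rate = 0.044             # the lift we drew in the sand (10% relative)
daily_visitors_per_variant = 1500

# Sample size per variant for a two-sided test at 95% confidence, 80% power.
effect_size = proportion_effectsize(expected_rate, baseline_rate)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)

days = n_per_variant / daily_visitors_per_variant
# Round up to whole weeks so every weekday is represented equally.
weeks = math.ceil(days / 7)
print(f"{n_per_variant:,.0f} visitors per variant, about {days:.0f} days; run for {weeks} full weeks")
```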
And be sure to take into account seasonality for your business. For most internet businesses, Monday traffic is better than Thursday traffic. This will impact your test results.
If you want a statistical explanation for why fixing the duration of your test ahead of time matters, read this article.
9. Underestimating the risk or harm of the experiment.
Experimenting is good.
Ignoring the impact your experiment might have is bad.
Yes, we can and should make data-informed decisions. But this doesn’t mean that we should take unnecessary risks.
For each experiment, we need to understand the risk to the user and the risk to the business. – Tweet This
And then we need to do what we can to mitigate the risk to both.
10. Collecting the wrong data.
You need to collect the right data in the right form.
This one sounds obvious in the abstract, but can be hard to do in practice.
Before you start collecting data, think through what data you need.
- Do you need to collect the number of actions taken or the number of people who take the action? (See the sketch at the end of this section.)
- Are you tracking visits, sessions, or page views?
Thinking through how you will make decisions with this data will help you make sure you get it right. Ask yourself:
- What would the data need to look like for me to refute this hypothesis?
- What would the data need to look like to support this hypothesis?
The more you think through how you might use the data to drive decisions, the more likely you are to collect usable data.
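Here is a small sketch, with made-up event data, of why that first question matters:

```python
# Made-up event log for a hypothetical "share" feature.
events = [
    {"user_id": "u1", "action": "share"},
    {"user_id": "u1", "action": "share"},
    {"user_id": "u1", "action": "share"},
    {"user_id": "u2", "action": "share"},
]

total_actions = len(events)                          # 4 shares
unique_actors = len({e["user_id"] for e in events})  # 2 people shared

print(total_actions, unique_actors)
```

If your hypothesis is “Feature X gets more people sharing,” the second number is the one that can support or refute it; the first can be inflated by a handful of power users.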
11. Drawing the wrong conclusions.
It’s easy to draw the wrong conclusions from our experiments.
There are two things to keep in mind.
First, experiments can refute or support hypotheses but they never prove them.
We live in a world where nothing is certain. If you want to be a good experimenter, you have to accept this.
Don’t be dogmatic about your results. What didn’t work last year might work this year. And vice versa.
Second, know what you are testing and make sure your conclusions remain within the scope of that test.
For example, if I am testing the impact of a new feature and it doesn’t have the impact I had hoped for, I might conclude that the feature isn’t good.
But I might be wrong. It also could be that the design of the feature wasn’t adequate. Or that the feature was buggy. Or the content that supports the feature was a mismatch.
Before you draw a conclusion, you need to ask, “What else could explain this result?” – Tweet This
12. Blindly following the data.
It’s comforting to think that you can run an experiment and have the results tell you what to do. But it rarely happens this way.
More often than not there are other factors to consider.
We live in a complex world. Your experiment results are only one input of many. You need to use your best judgment to come to a conclusion.
Over-relying on your data to tell you the right answer leads to implementing false positives and over-optimization.
Don’t forget you are a human. Keep using your brain.
13. Spreading yourself too thin.
Just as we look forward to the next iPhone and crave the next episode of Breaking Bad, we also chase after more tools in our toolbox.
I often see teams new to A/B testing rush to try multivariate testing.
Others jump from interviews to diary projects to Qualaroo surveys.
Each method requires developing its own skill set. Dive deep. Learn a method inside and out before moving on to the next one.
You can get more value from going deep with A/B testing than you will from only understanding the basics of both A/B testing and multivariate testing.
You’ve got a long career ahead of you. Go for depth before breadth. – Tweet This
14. Not understanding how the tools work.
Know your tools.
Know how to set up conversions correctly.
Know whether they are tracking actions vs. people and visits vs. sessions.
Know whether your funnels require users to move directly from page to page or count steps completed at any point in a session.
Understand where they draw the line for statistical significance.
Some consider 80% confidence significant – this means that as many as 1 out of 5 tests will show a false positive. Understand how this impacts your product decisions.
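Here is a rough sketch of what that threshold implies once you are running a steady stream of tests (the number of tests is hypothetical):

```python
# Expected false positives among tests of changes that have no real effect.
tests_run = 100  # hypothetical number of A/B tests on no-effect changes

for confidence in (0.80, 0.95):
    false_positive_rate = 1 - confidence
    expected_false_positives = tests_run * false_positive_rate
    print(f"{confidence:.0%} confidence -> ~{expected_false_positives:.0f} "
          f"of {tests_run} no-effect tests will look like winners")
```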
You can try every product on the market, but if you don’t understand how they work, you won’t make good product decisions.
Up Next
Over the next couple of months, we’ll dive deep into each of these mistakes and look at how you can avoid them.
We’ll explore real examples and get specific so that you can take what you learn and put it into practice right away.
Don’t miss out: subscribe to the Product Talk mailing list to get new articles delivered straight to your inbox.