A handful of testing errors can ruin everything. Profitable growth, a higher profit margin, less dependence on the expensive “Google drip” – a conversion rate just half a percentage point higher would probably give most people responsible for e-commerce in Germany enough relief to smooth out a worry line or two and sleep well again at night. In parallel, a horde of tool salespeople promise golden times, if only you would start A/B testing. But what happens in reality?

Three critical questions for those in charge
  1. How is the conversion rate developing?
  2. How satisfied are you with how the conversion rate is developing?
  3. Do you feel you can control the conversion rate?
  The preliminary trend says:
  1. Redesigns and similar projects often block the path to meaningful, incremental optimization
  2. Know-how from outside is an important factor for the success of the internal optimization team
  3. Despite numerous A/B tests, there is little or no measurable effect on growth
  4. A large portion of those in charge in the market are not satisfied with the development of their conversion rate
Why is that? Experience from many hundreds of A/B tests, and an honest look at a multitude of enterprises, reveals numerous reasons why optimization efforts unfortunately, in many cases, remain mere “efforts.” Here are the five most frequent testing errors:

#1 The “I am deceiving myself with bad results” error

The tools are not entirely blameless here. Nowadays it takes just a few hours until the results of a test are “significant,” according to the tool. “High five! Winning variation found!” the tool announces with a small trophy icon. Can that be? We have already explained the problem many times: independent of the purely mathematically calculated significance value (CTBO/CTBB), there are requirements for the test runtime before results actually become valid.

A great influence on test results is the traffic mix: visitors from newsletters are often existing customers who react differently to any variation than new customers do. There are TV adverts, changing weather, and more money being spent at the start of the month than at the end. The influences are very complex and cannot be fully captured, even with the most sophisticated segmentation. Whoever runs a test for too short a time obtains only a short, random snapshot. Therefore (very simplified): fourteen days are a good minimum (!) test runtime, during which an approximately representative sample should be reached. Tests during extreme sale events should be avoided – or the test should be repeated outside the event. Those who want more agility should instead run more tests in parallel.

Why does this testing error foul up growth? Very simple: the “winning variation” in reality was often no winner at all. Had the test run only a few days longer, the result would have converged. Whoever stops a test too early may be ensnared by a statistical artifact. Ergo: resources for the test were wasted, incorrect results were communicated (the disappointment can be very great!) and, in the worst case, a change that provided no benefit at all was rolled out.
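How long is long enough? A back-of-the-envelope power calculation, as a minimal sketch of my own (baseline rate, uplift, significance level and power are assumed example values), shows why a few hours of traffic rarely suffice:

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_variation(p_base, rel_uplift, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect `rel_uplift` on a baseline
    conversion rate `p_base` with a two-sided two-proportion z-test."""
    p_var = p_base * (1 + rel_uplift)
    p_bar = (p_base + p_var) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_b = NormalDist().inv_cdf(power)           # desired statistical power
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p_base * (1 - p_base) + p_var * (1 - p_var))) ** 2
    return numerator / (p_base - p_var) ** 2

# Example: 3% baseline conversion, detecting a +10% relative uplift
n = sample_size_per_variation(0.03, 0.10)
print(round(n))            # tens of thousands of visitors per variation
print(round(n / 2000, 1))  # days of runtime at 2,000 visitors/variation/day
```

With numbers like these, the fourteen-day minimum is usually the floor, not the ceiling.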

#2 The irrelevance error

Much too often I see enterprises with test ideas like “We want to know what happens if we move this box from the left to the right.” What good will that do? Who among us has ever put him- or herself in the role of a user and thought, “So, if the box were now on the right side, next to the accessories, then I would buy in this shop…”? The same is true for tests with different button colors and other details. Such tests will not yield results that significantly and permanently influence growth.

I see it this way: a test variation must be strong enough to actually influence user behavior. Changes that do not meet this bar cannot generate measurable results. From the perspective of the optimizer, an A/B test measures a change in the website. This, however, is a fatal fallacy. In actuality, the testing tool measures the consequence of a change in user behavior. A variation that does not change user behavior cannot produce a result in an A/B test.

Why does this testing error foul up growth? Very simple: let us assume a shop generates 20,000 orders per month. For a good test with valid results we need at least two weeks of runtime and at least 1,000 conversions per variation. An A/B test with five variations therefore needs 5,000 orders; if we play it safe, perhaps 10,000. That leaves no more than two to four test slots per month, i.e. a maximum of roughly 20 to 50 test slots per year. What percentage of tests delivers valid results? How high is the average uplift? If you calculate this, it quickly becomes clear that these valuable test slots should not be sacrificed for banalities. The stronger the test hypothesis, the more uplift can be generated. In actuality there are, in fact, entirely different limiting factors for the number of good test slots…

#3 Absent agility

Agility appears to have become a buzzword. The commercial effect of absent agility can easily be calculated in the form of opportunity costs. In fact, it is very simple: whoever can carry out twice as many successful optimization sprints has twice as much success. The underlying assumptions are very conservative: an online business with 20 million euros in turnover and a 15% profit margin.

Why does this testing error foul up growth? The most frequently asked question is: “Is a 25% global, cumulative uplift per year through optimization even possible?” Posing this question is basically justified – but it shifts the perspective. For most of those in charge, the conversion rate is a fixed constant; a jump from 3% to 4% seems achievable only through massive price reductions or by pruning traffic. What is overlooked is that it is the website that does the selling. Whoever holds only traffic, assortment and competition responsible for their conversion rate has, in my opinion, overlooked the greatest optimization lever. The right question must be:
“What must we do in order to achieve a 25% uplift through optimization?”  
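To put a number on the opportunity cost, here is a minimal sketch using the 20-million-euro, 15%-margin scenario from above; the 5% figure is the shortfall case mentioned later in the article:

```python
def uplift_profit(turnover, margin, uplift):
    """Extra annual profit if optimization lifts turnover by `uplift`
    (simplified: turnover scales linearly with the conversion rate)."""
    return turnover * uplift * margin

turnover, margin = 20_000_000, 0.15
achieved = uplift_profit(turnover, margin, 0.05)  # only 5% reached
possible = uplift_profit(turnover, margin, 0.25)  # the 25% target
print(f"opportunity cost: {possible - achieved:,.0f} € per year")
```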
In 2013, according to its annual report, Amazon carried out almost 2,000 A/B tests – an immensely high figure that is only technically possible with extremely high traffic. How much knowledge did Amazon gain from this? How many tests resulted in uplift? How much of that knowledge is a strategic competitive advantage? The answers should sharpen the focus on your own organization and lead to the following questions: What in the organization limits the agile implementation of optimization? Who has an interest in that? What can we do? These answers often lead to initial ideas that can break the spell…

#4 The technical error

Whoever does a lot of testing knows the feeling: poor test results ruin the mood. For conversion optimizers, A/B tests suddenly have a greater influence on wellbeing than the weather… 🙂 A frequent phenomenon: if the test runs well, it was a brilliant test idea; if the test runs poorly, there must be a problem with the testing tool. Quite seriously: even well-run tests actually suffer from technical problems more frequently than is assumed. It is generally known that load times are critical for a positive UX and thereby for the conversion rate. What happens during a test with code injection?
  1. The code is loaded.
  2. At the end of the loading procedure the DOM tree is manipulated.
  3. The front end is changed while the user already sees the page.
  4. The result: a palpable delay in page assembly and flickering effects – a poor user experience.
Those who test via split-URL setups are not immune to load-time effects either: the redirect to the variation is only triggered while the control variation is already loading. This technical testing error always burdens the variation – and in an area that remains invisible to many. In one case we optimized a test with negative results (a significant -5%) with regard to technology and load times and turned the result into a +7% uplift – with the same variation. The rule is therefore: work only with really good frontend developers and testing experts who have mastered these effects. Flicker effects can be avoided, and the load-time penalty in a split-URL test can be discounted with simple A/A'/B/B' tests.

Why does this error foul up growth? Because, with certainty, very many tests that would actually have produced good results were stamped “doesn’t work” and put on the scrap heap. Much time was lost, resources were used unnecessarily and – worst of all – incorrect conclusions were drawn. I often speak with entrepreneurs, recommend a test, and then hear: “We already tried that – it doesn’t work.” For one, this argument in itself is already a growth killer. For another, what really failed were often technical details…
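A minimal sketch of the A/A'/B/B' idea (my own illustration with invented numbers, not the exact procedure from the text): an unchanged copy of the control (A') is served through the same redirect as the variation, so the pure redirect/load-time penalty can be measured and factored out:

```python
def conversion_rate(conversions, visitors):
    return conversions / visitors

def redirect_adjusted_uplift(a, a_prime, b_prime):
    """Each argument is a (conversions, visitors) tuple.
    a       : control, served directly
    a_prime : identical control, served via the test redirect
    b_prime : variation, served via the test redirect
    The gap between a and a_prime estimates the redirect penalty."""
    cr_a = conversion_rate(*a)
    penalty = conversion_rate(*a_prime) / cr_a    # e.g. 0.95 = -5% from redirect
    cr_b_corrected = conversion_rate(*b_prime) / penalty
    return cr_b_corrected / cr_a - 1              # uplift of the variation itself

# Invented numbers echoing the "-5% became +7%" story:
u = redirect_adjusted_uplift(a=(1000, 25_000),        # CR 4.0%
                             a_prime=(950, 25_000),   # CR 3.8%: redirect costs 5%
                             b_prime=(1_017, 25_000))
print(f"{u:+.1%}")  # roughly +7%
```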

#5 The error of incorrect KPIs

An often-cited example: I stand in front of a display window and consider whether the shop has suitable products at a suitable price. Dozens of implicit signals let my brain decide in seconds whether I enter the store or not. The first impression influences my decision. Let us assume the display window does not look particularly promising – very much like a poor landing page. If the store owner polished up his display window, say, to make a high-quality impression, this might well move me to enter the store. However, because he only optimized the display window (that is, his landing page) while the rest of the store is just as ramshackle as before, I will merely be more disappointed. The higher micro-conversion subsequently has a contrary effect. We could also call it a matter of expectation management, or consistency. Unfortunately, this error is often repeated online.

Why does this foul up growth? Because in the most frequent case, supposed improvements are rolled out that actually damage the conversion rate. Resources for tests are wasted unnecessarily, time is lost, and the overall result deteriorates. Above all, medium and small enterprises that do not have enough conversions for A/B testing optimize for clicks and a lower bounce rate. They are well advised to nevertheless measure total conversion in their tests and to keep an eye on it. The only consolation: knowledge from landing-page tests can (hopefully) be applied successfully further down the funnel.

Supplement: the same is true in e-commerce for the relationship between orders (gross turnover) and returns (net turnover). Much too often, unfortunately, the latter is not measured or evaluated; here, too, unfulfilled expectations can sink the overall profit margin.

Bonus tip:

#6 The “We have no business plan for optimization” error

Finally, a very simple matter. Above, I spoke of an annual +25% conversion-rate uplift and showed what happens if, instead, only 5% is reached. A thought experiment:
  • Optimization costs resources, time, and money. Optimization slots are limited and thereby valuable.
  • The goal of optimization is a result measurable in business – an ROI.
  • If costs and benefits can be determined and also controlled – why is there no business or investment plan for optimization measures?
He who has no goal will never reach it
Sorry if this gets a bit brief now. I quite simply believe that many enterprises do not achieve growth through optimization because they optimize haphazardly, without a plan or a goal.
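What such a business plan could calculate, as a minimal sketch – every number here is an assumed placeholder, not a benchmark:

```python
def optimization_roi(slots_per_year, cost_per_test, win_rate,
                     avg_uplift, turnover, margin):
    """Expected return on a year's testing budget (simplified: winning
    uplifts are treated as additive rather than compounding)."""
    budget = slots_per_year * cost_per_test
    expected_uplift = slots_per_year * win_rate * avg_uplift
    extra_profit = turnover * expected_uplift * margin
    return extra_profit / budget

# Assumed plan: 24 slots, 8,000 € per test, one in four tests wins +2%
roi = optimization_roi(24, 8_000, 0.25, 0.02, 20_000_000, 0.15)
print(f"{roi:.2f}x return on the testing budget")
```

Even a rough model like this turns “should we test?” into a budget question with a defensible answer.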


Optimization takes place in many areas, and the results are validated with A/B testing. But unfortunately there are plenty of influencing factors, in the form of these testing errors, that sink the ROI of the measures – entirely unnecessarily. Anyone who masters these five topics and, on top of that, composes a business plan can count on sustainable and effective growth.

Start structuring your Experiments today!

Try iridion for free as long as you want by signing up now!

Create your free Optimization Project
