Recently, during a video conference call, I was asked to tap a button that wasn't available in the app I was using. I was then asked to update the app from the App Store, and I confirmed that I already had the latest version. I told them it was probably an AB test. It turned out to be an alien term for most of the attendees, who came from non-tech backgrounds. Appreciating their curiosity, I took the next few minutes to explain what it meant. Constrained by time and audience, I didn't cover it completely and got away with saying that the details need a separate discussion. A blog post seemed like a good way to reach a bigger audience, so here it is; I'll try to keep it as non-technical as possible.

What is an AB test anyway?

Imagine you and I started an ice cream parlor. Unfortunately, our business is not doing well, so we decide to add a new chocolate flavor to our offerings. You like Belgian chocolate ice cream and believe our customers will love it too. I, on the other hand, think that American chocolate will sell more. Because we are low on funds, we don't want to place a bulk order for either chocolate and risk making a loss. Instead, we buy small amounts of both American and Belgian chocolate and run an experiment: we sell both ice creams for 7 days. The results of this experiment tell us which chocolate flavor our customers prefer, and we then place a bulk order for that one.

This experiment, which helps us determine our customers' preferred chocolate flavor, is an example of an AB test.

AB testing is the process of simultaneously experimenting with two variants of a product, measured against a metric that defines success, to determine which variant to use in the real world.

Similar conflicts arise in software development. Let me walk you through a typical feature development process in a product company and how a feature reaches you from the App Store.

Feature Development

A product manager thinks of a new feature, or an improvement to an existing one, and goes to a designer to get a prototype ready. The prototype is shown to the client, who proposes a different solution or rejects it outright, saying it won't work. This is the classic HiPPO (Highest Paid Person's Opinion) problem, and the best way around it is an AB test, because data speaks for itself. That data can only be collected once a variation of the existing feature is built, so the manager gets approval for the proposed feature and gathers a team of developers to work on it. The developers build the new feature keeping AB testability in mind and make it ready to deploy to production.

Feature Deployment

Releasing a feature can be as simple as clicking a button, but if your product is an app distributed through an app store, you can also use the phased release utility most stores provide. With it, you set what percentage of users should get the update: if you set a value of 5, the store releases it to only 5 in every 100 users. Treat it as a safety net for your app; if anything blows up in production, it affects only 5% of users, and the other 95% won't even notice. Note that this only works for users who have auto-update enabled; if a user downloads the app from the store directly, nothing stops them from getting the latest version.

Phased releases are not limited to apps; the technical term is a canary release, and it can be applied to anything that can be released.
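The stores don't publish how they pick the lucky 5%, but the idea behind a percentage rollout can be sketched. Everything here (the function names, the toy hashing scheme) is illustrative, not the App Store's actual mechanism; a real system would use a stable cryptographic hash of the user ID.

```swift
import Foundation

// Map a user ID to a stable bucket between 0 and 99.
// Toy hash for illustration; production systems use something
// like SHA-256 so buckets stay consistent across devices.
func stableBucket(for userID: String) -> Int {
    let hash = userID.unicodeScalars.reduce(0) { ($0 &* 31) &+ Int($1.value) }
    return abs(hash) % 100
}

// A user receives the update when their bucket falls below
// the configured rollout percentage.
func isInRollout(userID: String, percent: Int) -> Bool {
    return stableBucket(for: userID) < percent
}

// With percent = 5, roughly 5 in every 100 users get the new version.
let gotUpdate = isInRollout(userID: "user-42", percent: 5)
```

Because the bucket is derived from the user ID rather than chosen at random each time, the same user consistently stays in (or out of) the rollout, which is exactly what you want when ramping from 5% to 100%.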

Now that the app with the new “Feature” has reached our users, how does it decide which variant to show?

We finally AB test!

Software is dumb and does whatever it is instructed to do. Because the developers built the feature in an AB-testable way, the code includes a toggle that enables the feature only for users matching certain criteria. A match is usually validated with the help of a service like Firebase Remote Config or an in-house feature configuration service. AB-testable code can look something like this:

// Ask the configuration service whether this user is in the experiment group.
let isNewFeatureEnabled = ConfigurationService.shared.getConfig(for: "NEW_FEATURE_ENABLED")

if isNewFeatureEnabled {
    // Variant B: the new behavior under test.
    self.performNewFeatureActions()
} else {
    // Variant A: the existing behavior, acting as the control.
    self.performOldFeatureActions()
}

The ConfigurationService relies on the analytics data (more on this below) that the app shares. Behind the scenes, the service pre-computes a list of users who satisfy the match criteria, and the code above simply checks whether the current user is on that list.
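A minimal sketch of that lookup could look like the following. This is an assumption about how such a service works internally, not the API of Firebase Remote Config or any real SDK; in particular, I've added an explicit `userID` parameter for clarity, whereas real SDKs usually infer the user from the session. The bucket here is a hard-coded in-memory set, where a real service would fetch it from a backend.

```swift
import Foundation

// Hypothetical feature-configuration service, sketched for illustration.
final class ConfigurationService {
    static let shared = ConfigurationService()

    // Pre-computed buckets of user IDs matching the experiment criteria,
    // normally delivered by the backend. Hard-coded here for the sketch.
    private let enabledUsers: [String: Set<String>] = [
        "NEW_FEATURE_ENABLED": ["user-1", "user-7", "user-42"]
    ]

    // The toggle is on only if this user appears in the bucket for the key.
    func getConfig(for key: String, userID: String) -> Bool {
        return enabledUsers[key]?.contains(userID) ?? false
    }
}

let enabled = ConfigurationService.shared.getConfig(for: "NEW_FEATURE_ENABLED",
                                                    userID: "user-42")
```

Note that an unknown key or an unknown user simply falls back to `false`, i.e. the old behavior; defaulting to the control variant is the safe choice when configuration data is missing.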

Analytics Data

All the apps we use track our activity in some way. In addition to device-specific details like the operating system, device type, and geo-location, information is shared in the form of events: opening a screen, tapping a button, adding money to a digital wallet, liking a picture, watching a video, sending a message, and so on. Simply put, whatever you do inside an app, the developer knows, because you quickly agreed to the app's Terms and Conditions. Again, there are services on the market that help you track user activity; Mixpanel, CleverTap, Firebase, and Google Analytics are the ones I have used.

With the help of this data, the product team can create user buckets to be consumed by the ConfigurationService. A bucket can include users in a particular geography, users above a total amount spent, users who liked a picture or watched a particular video, etc. This way, your own behavior in the app determines what you see in it, and of course, it can differ between two users.
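To make the event side of this concrete, here is a toy event tracker. The type names, event names, and property keys are all made up for illustration; real SDKs like Mixpanel or Firebase Analytics expose a similar "track an event with properties" shape, but their actual APIs differ.

```swift
import Foundation

// A named event plus some context, e.g. which video was watched.
struct AnalyticsEvent {
    let name: String
    let properties: [String: String]
}

// Toy in-memory tracker; a real SDK batches these and ships them
// to an analytics backend.
final class Analytics {
    private(set) var events: [AnalyticsEvent] = []

    func track(_ name: String, properties: [String: String] = [:]) {
        events.append(AnalyticsEvent(name: name, properties: properties))
    }
}

let analytics = Analytics()
analytics.track("video_watched", properties: ["videoID": "abc123"])
analytics.track("wallet_topup", properties: ["amount": "500"])
```

From a stream of events like these, the backend can build buckets such as "users who watched video abc123" or "users who topped up more than 500", which is exactly what the ConfigurationService consumes.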

Measuring the Feature Performance

The analytics data shared with the app also contains information about the newly added feature. Looking at this data, product managers and data analysts/scientists study user behavior with respect to the feature and check the results against the metric that defines success. If the results are good and everything works as expected, the feature can be released to more users.
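Stripped of the magic, the core of that comparison is simple arithmetic. The sketch below compares two variants on one common success metric, conversion rate; the numbers are invented, and real teams would also run a statistical significance test before declaring a winner, which this deliberately skips.

```swift
// Fraction of users who performed the desired action (the success metric).
func conversionRate(conversions: Int, users: Int) -> Double {
    return users == 0 ? 0 : Double(conversions) / Double(users)
}

// Invented experiment numbers: 1,000 users saw each variant.
let rateA = conversionRate(conversions: 120, users: 1_000) // control: 12%
let rateB = conversionRate(conversions: 150, users: 1_000) // new feature: 15%

// The variant with the higher rate looks like the winner, but a
// real analysis checks whether the difference is statistically
// significant before rolling it out to everyone.
let winner = rateB > rateA ? "B" : "A"
```

If variant B holds up under significance testing, its rollout percentage is ramped up; otherwise the team digs into where users dropped off.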

If the new change does not perform up to expectations, managers look at the points where users drop off and devise a strategy to remove those drop-off points or make the feature easier to use.

Once the AB test is complete and the change is made permanent, it is the developers' responsibility to clean up the toggles in subsequent releases. Each toggle should be removed from both the code and the configuration system to keep the codebase maintainable. Some services, like LaunchDarkly, periodically remind you to review your toggles and clean up the ones that are no longer used.

It is worth mentioning that AB testing is not just for measuring the performance of brand-new features (the ice cream example); it can also measure the performance of a change (like a UX improvement) against the existing implementation. In fact, in the real world, AB testing is used more for the latter.

Alternatives for B2B products?

AB testing makes a lot of sense for B2C products, where the business can use the data users provide to make decisions. With B2B products, where enough data points are rarely available, an AB test doesn't help much; instead, we rely on the feedback of super users and additional alpha/beta users.

Though I explained everything here in the context of feature development, a ConfigurationService and phased releases are not limited to that. Product teams use them for reducing risk, load testing, easy rollbacks, and more. These are powerful tools, and using them properly can improve the user experience and grow the business. Based on your own use cases, you can identify how AB testing can help your company grow.

I would highly appreciate your feedback and questions on this. Please let me know in the comments!
