
SLOs for Mobile Apps: Hard Truths to Consider

Mobile is an incredibly unpredictable environment, with many different variables affecting performance that make each individual session unique.
Sep 26th, 2024 11:00am
Image from TippaPatt on Shutterstock

Building and monitoring service-level objectives (SLOs) are an essential part of the modern DevOps practice. SLOs not only give engineers a window into their system’s health and performance, but they help teams effectively prioritize between reliability and feature work depending on pre-set tolerances for failure.

This practice works well for backend infrastructure, but for mobile applications, things get a bit more complicated.

Collecting data about a mobile app is very different from collecting data about a server, cluster or networking component. Mobile is an incredibly unpredictable environment, with many different variables affecting performance that make each individual session completely unique.

This means that, as you think about designing effective SLOs for mobile, you’ll have to take into consideration a number of hard truths.

Delayed Data

The first of these is what to do about delayed data.

The reality of collecting client-side data means that sometimes there will be a delay in receiving it server-side. This can be due to the device losing connectivity, the app crashing, or user patterns such as “offline mode.” Sometimes, data is lost entirely, and sometimes, you have to wait a few minutes, a few hours or even multiple days to receive data from devices out in the wild. Additionally, data may be out of order when it’s received on the server. Because of this, it’s important to use an observability tool that lets you visualize app metrics according to the time at which events occurred for the user, not when they were received by the server. You must also consider this event-time mapping when designing effective mobile SLOs.

What This Means for Your SLOs

You’ll have to consider your aggregation and analysis windows to control for this. Either use longer aggregation windows to collect as much of your delayed data as possible or opt for smaller windows with some “delay” built in to offset the impact of when your data is flowing in. That way, you get a more realistic picture of user activity that occurred within the time period.
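The delay-offset approach above can be sketched as follows. This is a minimal illustration, not a real pipeline: the event records, field names and the delay allowance are all hypothetical. The key idea is that a window is bucketed by event time (when the user experienced it) but only evaluated once the delay allowance has passed, so late-arriving data is counted.

```python
from datetime import datetime, timedelta

# Hypothetical events: each carries the time it occurred on-device
# (event_time) and the time it reached the server (received_time).
events = [
    {"event_time": datetime(2024, 9, 1, 10, 0), "received_time": datetime(2024, 9, 1, 10, 1), "ok": True},
    {"event_time": datetime(2024, 9, 1, 10, 5), "received_time": datetime(2024, 9, 2, 9, 0), "ok": False},  # delayed a day
    {"event_time": datetime(2024, 9, 1, 10, 7), "received_time": datetime(2024, 9, 1, 10, 8), "ok": True},
]

def window_success_rate(events, window_start, window_end, now, delay_allowance):
    """SLI over an event-time window, evaluated only after the window has
    been closed for delay_allowance, so late-arriving data is included."""
    if now < window_end + delay_allowance:
        return None  # too early: delayed data may still be in flight
    in_window = [
        e for e in events
        if window_start <= e["event_time"] < window_end
        and e["received_time"] <= now
    ]
    if not in_window:
        return None
    return sum(e["ok"] for e in in_window) / len(in_window)
```

Evaluating the 10:00–11:00 window with a two-day allowance picks up the event that arrived a day late; evaluating it an hour after it closes returns nothing, signaling the window isn’t safe to score yet.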

As an example, this graph shows the data delay profiles for four mobile apps from Embrace customers. While some send the majority of data within a minute, other apps have significant amounts of data delayed by over a day. One customer we recently analyzed had about 25% of data delayed by two or more days.

This data is pulled from four apps with very different data delay profiles. A few apps send 85%+ of their mobile telemetry within a minute, while some take a day or longer to hit that threshold. 


Dynamic, Highly Changing Runtime Environment

Secondly, you’ll have to think about how you control for a runtime environment that’s dynamic and often very unpredictable.

Mobile app users have potentially unstable, changing network connections, and they’re engaging in behaviors like background/foreground switching of their apps that lead to operational disruptions. This means that monitoring and interpreting SLOs is different in mobile vs. the more predictable and consistent server environment.

What This Means for Your SLOs

In your service-level indicator (SLI) implementation phase, consider how you want to control for different user paths. This might look like instrumenting different versions of a user flow via spans and span events so you can distinguish activities that ended due to behavioral variables, like the user abandoning the flow or switching between foreground and background. Then, you may choose to exclude those particular activities from the total population you use to calculate your SLO.
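To make the exclusion step concrete, here is a small sketch. The span records, the `end_reason` attribute and its values are hypothetical stand-ins for whatever your instrumentation actually emits; the point is that behaviorally terminated flows are filtered out of the SLI population before the rate is computed.

```python
# Hypothetical span records for a "checkout" user flow. "end_reason" is
# an attribute your instrumentation would set when the flow terminates.
checkout_spans = [
    {"duration_ms": 1200, "end_reason": "completed"},
    {"duration_ms": 400,  "end_reason": "user_abandoned"},    # user quit on purpose
    {"duration_ms": 300,  "end_reason": "app_backgrounded"},  # OS interrupted the flow
    {"duration_ms": 9000, "end_reason": "error"},
]

# Flow endings caused by user behavior, not app failure.
BEHAVIORAL_EXITS = {"user_abandoned", "app_backgrounded"}

def flow_success_rate(spans):
    """SLI: completions over flows that ran to a real outcome,
    excluding flows ended by user behavior."""
    evaluated = [s for s in spans if s["end_reason"] not in BEHAVIORAL_EXITS]
    if not evaluated:
        return None
    return sum(s["end_reason"] == "completed" for s in evaluated) / len(evaluated)
```

Here the two behavioral exits are dropped, so the SLI is computed over the completed flow and the errored flow only.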

High Cardinality of Data

One of the biggest challenges in monitoring and analyzing mobile data is the sheer volume and variety of it, and this makes designing good SLOs hard.

The number of variables influencing the mobile system is far higher than those influencing backend infrastructure, and many of them are out of the engineer’s control. At any given time, your app might be running on thousands of different devices with numerous operating systems and versions. You’ll probably have a long tail of app versions in production, with many users running several updates behind. The high cardinality of mobile data you collect and the many ways its variables combine and influence each other make it harder to aggregate metrics for SLOs.

This data is pulled from five apps with very different adoption patterns for newer app versions. Notice how some apps have more than 85% of users on the latest app version, while some require more than seven versions to reach that threshold.


What This Means for Your SLOs

You may choose to set separate SLOs for separate populations or only apply an SLO to a certain conditional group. For example, you may only be concerned with hitting an SLO for your most recent app version as you cannot realistically commit maintenance work to older versions that will be phased out anyway. Or, you may have stricter SLOs for iOS vs. Android, or for users in certain geographic regions with better infrastructure.
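One way to express per-population SLOs is to attach a population filter to each target and evaluate only the matching sessions. The session records, version strings and targets below are hypothetical; this is just a sketch of the conditional-group idea.

```python
# Hypothetical session outcomes, tagged with OS and app version.
sessions = [
    {"os": "ios",     "version": "3.2.0", "ok": True},
    {"os": "ios",     "version": "3.2.0", "ok": True},
    {"os": "ios",     "version": "3.1.0", "ok": False},  # older version: not evaluated
    {"os": "android", "version": "3.2.0", "ok": False},
    {"os": "android", "version": "3.2.0", "ok": True},
]

# Hypothetical targets: latest version only, stricter on iOS than Android.
slos = [
    {"name": "ios-latest",     "os": "ios",     "version": "3.2.0", "target": 0.99},
    {"name": "android-latest", "os": "android", "version": "3.2.0", "target": 0.95},
]

def evaluate(sessions, slo):
    """Evaluate one SLO against only its defined population."""
    pop = [s for s in sessions if s["os"] == slo["os"] and s["version"] == slo["version"]]
    if not pop:
        return None
    rate = sum(s["ok"] for s in pop) / len(pop)
    return {"name": slo["name"], "rate": rate, "met": rate >= slo["target"]}
```

The failing session on the older iOS version never counts against the iOS SLO, because maintenance work won’t be committed to a version that’s being phased out anyway.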

Resource Limits

While mobile apps can generate high volumes of telemetry, the space and processing power at your disposal are very much finite.

Building for mobile means coming up against a lot of limitations that are inevitable when running code on a small, portable device. Mobile system resources are scarce and are limited by both hardware and the OS. High resource usage, like excessive memory consumption, can have unexpected effects on the end user experience because your app is in constant competition with an unknown number of other programs and activities, something you’ll need to consider when deciding which SLI measures map to your SLOs.

What This Means for Your SLOs

Consider the population of users that you actually evaluate for your SLOs, as not every single device might matter. Unlike backend SLOs, which evaluate largely uniform machines with uniform specs, mobile performance is influenced by a lot of variables that are outside of engineers’ control. Therefore, you may explicitly define the population that’s truly relevant for your SLO in your SLI implementation.

For example, an SLO for startup time might be impossible to achieve if it’s evaluated on every device. This is especially true on Android, which has tens of thousands of device types with vast differences across compute, memory and connectivity. Additionally, consider how important it is to your business to spend time and resources on improving SLOs for groups that may not even be target customers. In the prior example, Android users on lower-end devices may need a total hardware upgrade to experience your product in the way it’s intended, putting them outside of your realistic target user market.
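As a sketch of scoping the startup-time SLO to a realistic population: the samples and the `device_tier` field below are hypothetical, and in practice the tier would come from a device-model-to-capability mapping you maintain yourself. Devices outside the included tiers simply never enter the SLI.

```python
# Hypothetical startup samples; "device_tier" would come from a mapping
# of device model to a capability class you define.
startups = [
    {"device_tier": "high", "startup_ms": 800},
    {"device_tier": "high", "startup_ms": 1500},
    {"device_tier": "low",  "startup_ms": 6000},  # excluded from the SLO population
    {"device_tier": "mid",  "startup_ms": 2100},
]

def startup_sli(samples, included_tiers, threshold_ms):
    """Share of in-population startups at or under threshold_ms."""
    pop = [s for s in samples if s["device_tier"] in included_tiers]
    if not pop:
        return None
    return sum(s["startup_ms"] <= threshold_ms for s in pop) / len(pop)
```

With low-end devices excluded, the SLI reflects the experience of users who can realistically hit the target, rather than being dragged down by hardware the app can’t control.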

This data is pulled from 10 apps and shows the number of device models running the app within a single day. The Android apps have a larger range (from 2,000 – 7,000 models), while the iOS apps hover around 100 models. 


User Choices and Behaviors

The end user is at the heart of mobile apps, and their patterns of behaviors and choices directly influence what your observability tooling reports to you.

For example, users may choose to abandon key processes in your app before they can successfully complete. Depending on the sophistication of your tooling, it may not distinguish a failed process that’s affecting your SLO from a healthy process that the user chooses to quit of their own accord. Or it may register a process as taking less time to complete than it realistically would, all because the user cut it short.

What This Means for Your SLOs

Use tooling that lets you separate the population of user abandons from your overall sessions and specify this method in your SLI implementation. Otherwise, you may be artificially inflating or deflating your measures based on user choices.
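The deflation effect is easy to see in a small sketch. The records below are hypothetical; the abandoned run ends early with a misleadingly fast duration, and including it pulls the average latency down.

```python
# Hypothetical runs of a flow; "completed" is False when the user cut it short.
runs = [
    {"duration_ms": 900,  "completed": True},
    {"duration_ms": 200,  "completed": False},  # abandoned early: misleadingly fast
    {"duration_ms": 1100, "completed": True},
    {"duration_ms": 1000, "completed": True},
]

def mean_latency(runs, exclude_abandons):
    """Mean duration, optionally restricted to runs the user let finish."""
    samples = [r["duration_ms"] for r in runs if r["completed"] or not exclude_abandons]
    return sum(samples) / len(samples)
```

Counting the abandoned run makes the flow look 200ms faster than users actually experience it, which is exactly the kind of artificial deflation worth specifying away in your SLI implementation.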

Activity Count vs. Session Count vs. User Count

In the world of mobile, the unique instances of one activity often do not translate to the number of sessions or number of users experiencing that activity. For example, an SLO around successful search queries for your app might be reporting that 2% of queries — let’s say 10,000 — are failing. With that single metric, you can’t be sure whether those 10,000 failed queries are happening across 10,000, 5,000, or 1,000 unique sessions. Similarly, you don’t know if it’s just 100 frustrated but undeterred users repeatedly getting failed queries because they haven’t done a software update. And so your monitoring system sounds the alarm and your engineering team dedicates many working hours toward fixing an issue that, unbeknown to them, is affecting a tiny subset of users whose problem wasn’t directly related to your product.

The inverse can happen without proper user insight, as well. You may have a large number of users, for example, who are prevented from entering a flow due to another malfunctioning condition or dependency. Their attempts never register as an “event failure” because they could never get far enough through the flow. The SLO ends up artificially high because it only takes into account the other group of users who are able to complete successfully, obscuring a legitimate issue with your app’s performance.

In cases like these, SLOs don’t seem to be doing their job of helping teams prioritize their work very well.

What This Means for Your SLOs

In some cases, you may want to set two SLOs around a single critical user activity. The first, more traditional SLO captures the instances of failure, latency, or errors (for instance, 99.5% of search queries are successful). The second SLO captures the user impact of the activity (95% of users are able to complete a search query successfully). This approach means you’ll check two metrics to make sure you’re appropriately prioritizing your response to a failed SLO based on its relative impact to your users and your business.
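The two companion metrics can be sketched like this. The query log and field names are hypothetical; the example is built so the two SLIs disagree, which is precisely when the pairing earns its keep: the event-level rate looks alarming, but the user-level rate reveals that one user is generating all the failures.

```python
# Hypothetical search query log; each query belongs to a user.
queries = [
    {"user": "a", "ok": True},
    {"user": "b", "ok": False},
    {"user": "b", "ok": False},
    {"user": "b", "ok": False},
    {"user": "b", "ok": False},
    {"user": "c", "ok": True},
]

def query_success_rate(queries):
    """Event-level SLI: share of queries that succeed."""
    return sum(q["ok"] for q in queries) / len(queries)

def user_success_rate(queries):
    """User-level SLI: share of users with at least one successful query."""
    users = {q["user"] for q in queries}
    succeeded = {q["user"] for q in queries if q["ok"]}
    return len(succeeded) / len(users)
```

A 33% query success rate would normally page someone, but two of three users are completing searches fine; the failures trace back to a single user, so the response can be prioritized accordingly.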

Boiling It Down: End-User Experiences Are Paramount

Ultimately, the end-user experience is even more critical to think about when building SLOs for mobile vs. for the backend.

Every time you look at a performance dashboard, you are observing millions of independent app instances operated by individual users trying to achieve something with your service.

A bad P90, when you’re monitoring backend services, may have little to no measurable impact on users. But in mobile, a bad P90 directly translates to 10% of your users having a bad experience.

Remember that behind every poor measurement is a real customer, staring at their phone, getting frustrated.

Fortunately, by adopting some of the tips we discussed above, you can ensure that your mobile SLOs are effectively warning you about impactful issues before they start frustrating your users at scale. If you’d like a more in-depth exploration of mobile SLOs, including examples and templates, check out Embrace’s mobile SLO guidebook.
