Uber has successfully completed a large Kubernetes migration, transitioning its entire compute platform from Apache Mesos to Kubernetes across multiple data centers and cloud environments. The ride-sharing giant's engineering teams have detailed their comprehensive journey in a series of technical blog posts, revealing the challenges, solutions, and lessons learned from migrating thousands of microservices and large-scale compute workloads.
The migration represents a fundamental shift in Uber's infrastructure architecture, affecting thousands of services that power everything from ride-hailing to food delivery across global markets. The company's previous compute platform, built on Apache Mesos, had served Uber well during its rapid growth phase but presented limitations as the organization evolved toward a more cloud-native approach.
"This migration was not just a technology change, but a complete reimagining of how we operate our compute infrastructure," explained Uber's engineering team. The project spanned multiple years and required careful coordination across numerous engineering teams to ensure zero-downtime transitions for critical services.
Uber's approach to the Kubernetes migration was methodical and risk-averse, prioritizing service reliability above migration speed. The engineering teams developed a sophisticated migration framework that allowed for gradual service transitions while maintaining full backward compatibility with existing Mesos-based services.
The migration strategy centered on several key principles:
- maintaining service reliability throughout the transition
- ensuring seamless integration with existing tools and workflows
- establishing robust monitoring and observability capabilities in the new Kubernetes environment
The teams implemented a dual-stack approach, running services simultaneously on both Mesos and Kubernetes during transition periods to minimize risk.
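The dual-stack idea can be illustrated with a minimal sketch, assuming a routing layer that sends a configurable fraction of requests to the Kubernetes backend while the remainder stays on Mesos. The names here are hypothetical; Uber's internal routing tooling is not public.

```python
import random

# Hypothetical sketch of dual-stack routing during a migration window.
# The Kubernetes weight is ramped up gradually (e.g. 1% -> 5% -> 25% -> 100%)
# as confidence in the new deployment grows; Mesos keeps serving the rest.
class DualStackRouter:
    def __init__(self, kubernetes_weight: float = 0.0):
        # Fraction of requests sent to the Kubernetes backend (0.0..1.0).
        self.kubernetes_weight = kubernetes_weight

    def pick_backend(self) -> str:
        return "kubernetes" if random.random() < self.kubernetes_weight else "mesos"

    def ramp_to(self, weight: float) -> None:
        # Clamp to the valid range so a bad config cannot route out of bounds.
        self.kubernetes_weight = min(max(weight, 0.0), 1.0)

router = DualStackRouter(kubernetes_weight=0.0)
assert router.pick_backend() == "mesos"       # weight 0: all traffic on Mesos

router.ramp_to(1.0)
assert router.pick_backend() == "kubernetes"  # weight 1: fully migrated
```

Because both platforms serve live traffic at once, a regression on the Kubernetes side can be undone by ramping the weight back to zero rather than by redeploying.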
One of the most significant technical challenges involved adapting Uber's extensive suite of internal tools and platforms to work with Kubernetes. This included reimplementing deployment pipelines, monitoring systems, and service discovery mechanisms that had been tightly integrated with the Mesos ecosystem.

Beyond migrating standard microservices, Uber faced the complex challenge of transitioning large-scale compute workloads that power critical business functions including machine learning model training, data processing pipelines, and analytics workloads. These compute-intensive applications presented unique challenges due to their resource requirements and performance sensitivity.
The engineering teams developed specialized solutions for handling these workloads in Kubernetes, such as modeling DSW sessions as a Custom Resource Definition (CRD), optimized networking configurations, and enhanced scheduling capabilities. The engineers also implemented sophisticated resource allocation mechanisms using Federator, a cluster federation layer that provides an abstraction over Kubernetes batch clusters. As a result, large-scale batch jobs could coexist efficiently with real-time services without impacting user-facing applications.
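A CRD of this kind is registered with the Kubernetes apiextensions API, after which the cluster accepts objects of the new kind alongside built-in resources. The following is an illustrative sketch, expressed as the manifest dict a client would submit; the group, names, and schema are hypothetical, not Uber's actual DSW definition.

```python
# Illustrative CRD manifest for batch "sessions" (hypothetical names; Uber's
# actual DSW CRD is not public). Registering this with the apiextensions API
# teaches the cluster a new resource type, so `kind: Session` objects can then
# be created and scheduled much like pods.
session_crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "sessions.batch.example.com"},
    "spec": {
        "group": "batch.example.com",
        "scope": "Namespaced",
        "names": {"plural": "sessions", "singular": "session", "kind": "Session"},
        "versions": [{
            "name": "v1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "properties": {
                        "image": {"type": "string"},
                        "cpu": {"type": "integer"},
                        "memoryGiB": {"type": "integer"},
                    },
                }},
            }},
        }],
    },
}

# The API server requires the CRD name to be "<plural>.<group>".
assert session_crd["metadata"]["name"] == (
    session_crd["spec"]["names"]["plural"] + "." + session_crd["spec"]["group"]
)
```

Modeling a session as a first-class resource lets a custom controller reconcile its lifecycle (creation, scheduling, teardown) using the same machinery Kubernetes uses for built-in objects.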

The migration journey was not without significant technical hurdles. Uber's engineering teams encountered challenges related to networking complexity, resource management at scale, and maintaining performance benchmarks across different infrastructure paradigms. The company's global presence added additional complexity, requiring solutions that worked consistently across multiple regions and cloud providers.
One particular challenge involved maintaining Uber's strict latency requirements while transitioning services to the new platform. The teams implemented comprehensive performance testing and gradual rollout strategies to ensure that service quality remained consistent throughout the migration process.
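A latency-gated gradual rollout can be sketched as follows. This is a simplified illustration under assumed step sizes and a p99 budget, not Uber's actual tooling: traffic to the new platform increases step by step only while the canary's observed tail latency stays within budget.

```python
import math

# Simplified sketch of a latency-gated gradual rollout (not Uber's tooling).
# Traffic ramps up through fixed steps; a regression in p99 latency signals
# a rollback to the old platform instead of the next step.
def p99(latencies_ms):
    # Nearest-rank 99th percentile over a window of samples.
    ordered = sorted(latencies_ms)
    rank = math.ceil(len(ordered) * 0.99)
    return ordered[min(rank, len(ordered)) - 1]

def next_traffic_step(current_pct, canary_latencies_ms, budget_ms,
                      steps=(1, 5, 25, 50, 100)):
    """Return the next traffic percentage for the canary, or 0 for rollback."""
    if p99(canary_latencies_ms) > budget_ms:
        return 0  # latency regression: shift traffic back to the old platform
    for step in steps:
        if step > current_pct:
            return step
    return 100  # already fully rolled out

healthy = [12, 14, 15, 13, 16] * 20           # 100 samples, all under budget
assert next_traffic_step(5, healthy, budget_ms=50) == 25

degraded = healthy[:98] + [400, 450]          # tail samples push p99 over budget
assert next_traffic_step(25, degraded, budget_ms=50) == 0
```

Gating on a tail percentile rather than the mean matters for a latency-sensitive service: a small fraction of slow requests can breach an SLO while leaving the average unchanged.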
The engineering teams also had to address cultural and operational challenges, including training hundreds of engineers on Kubernetes concepts and updating development workflows to align with cloud-native practices.
The completed migration has delivered substantial benefits across multiple dimensions. Uber reports improved operational efficiency, enhanced developer productivity, and better resource utilization across its infrastructure. The move to Kubernetes has also positioned the company to better leverage cloud-native technologies and practices, enabling faster innovation and more flexible deployment strategies.
The new platform provides enhanced scalability capabilities, allowing Uber to more efficiently handle traffic spikes and seasonal variations in demand. Additionally, the migration has simplified Uber's infrastructure management, reducing operational overhead and enabling teams to focus more on product development rather than platform maintenance.
Other large companies have also migrated their core infrastructure to Kubernetes: Figma moved its core services to Kubernetes in 12 months, and CERN migrated the CMSWEB Cluster to Kubernetes. These examples, together with Uber's successful migration, serve as valuable case studies for other large-scale organizations considering similar transitions. Uber's detailed documentation of its journey provides insights into best practices for enterprise Kubernetes adoption, particularly for organizations operating at significant scale.