10 Years of Kubernetes: Past, Present, and Future

This year, Kubernetes celebrates its tenth birthday. As a developer who has been around the community since its early days, I found occasion to reflect on how things started, how Kubernetes marched to maturity, and how it now shows the potential to expand into the WebAssembly movement.
And since I’m a Swiftie, I broke these into eras in honor of my favorite musician.
Era 1: When Kubernetes Was Simple
First, there was Borg. I spent a brief time at Google in the early 2010s before leaving to join a PaaS startup called Deis. I was in Google’s Nest unit (the one that makes thermostats and fire alarms). Unlike the rest of Google, Nest was, at the time, orchestrating its VMs using Apache Mesos and Apache ZooKeeper. But I also built an internal app on App Engine, and through that I gained exposure to Borg, Google’s internal LXC container scheduler. Borg was powerful. It scaled out globally. And it took hundreds of engineers to maintain. At one point I contemplated a move to one of the Borg teams. Instead, I left for Deis to build an open source alternative to then-titan Heroku.
Deis was an early entrant into the Docker container space. Members of the team worked on Docker networking and made a host of contributions to CoreOS’ tools. But we were on the hunt for a powerful orchestration system that could simplify our PaaS’s scheduling code. When Google open sourced Kubernetes, I was thrilled. Here were all of the perks of Borg, but in a much simpler package.
Simple. That was the word I used back then to describe Kubernetes.
It was simple because rather than describing a lifecycle or a set of procedural instructions for running something, I simply declared what should be running. Kubernetes took that declaration and made it so.
It was simple because rather than requiring coding, I just wrote a few dozen lines of YAML (back then, it really was only a few dozen). Then I just used a simple CLI to upload that YAML.
And it was simple because there were only a few kinds of Kubernetes objects. Pods, ReplicationControllers, Services, and Secrets were the only ones I really needed on a daily basis.
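To give a flavor of what that simplicity looked like, here is a minimal sketch in roughly the style of those early manifests: a ReplicationController keeping three copies of a hypothetical web container running, plus a Service to route traffic to them. The names and the image are placeholders, not from any real deployment.

```yaml
# A hypothetical web app, declared rather than scripted.
apiVersion: v1
kind: ReplicationController
metadata:
  name: hello-web
spec:
  replicas: 3                 # "keep three copies running" -- Kubernetes makes it so
  selector:
    app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: example.com/hello-web:1.0   # placeholder image
        ports:
        - containerPort: 8080
---
# A Service routes cluster traffic to whichever pods carry the label.
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  selector:
    app: hello-web
  ports:
  - port: 80
    targetPort: 8080
```

Applying it was a single command: kubectl create -f hello-web.yaml.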
At Deis, we replatformed our entire PaaS on Kubernetes, along the way building Helm and creating the Children’s Illustrated Guide to Kubernetes. In the container ecosystem, we were all moving fast (and breaking things). I still remember the day the Kubernetes team announced it was possible to run a 1,000-node cluster. The first KubeCon fit in a single hotel ballroom. CNCF was formed as a neutral steward for Kubernetes IP. When Brendan Burns left Google to join Microsoft in order to create a dedicated Kubernetes team, I knew for sure that the ecosystem had crossed an important threshold.
Kubernetes was about to catapult to success.
Era 2: Growing Pains and Growing Gains
While in some sense Kubernetes’ early sparseness and simplicity was attractive, it was also limiting. To run microservices in production, additional features were needed. ReplicationControllers gave way to ReplicaSets, which Deployments build on, and DaemonSets, StatefulSets, and other higher-level controllers followed. ConfigMaps joined Secrets, and the Volume system grew and matured. Autoscalers, Jobs, and RBAC made their way into the Kubernetes core. Third Party Resources gave way to the more flexible Custom Resource Definitions (CRDs). And CRDs have become the standard way of extending Kubernetes’ resource (kind) system.
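To make that shift concrete, here is the same hypothetical app from above restated as a Deployment, which manages a ReplicaSet on your behalf and layers rolling updates and rollbacks on top (again, names and image are placeholders):

```yaml
# The same hypothetical web app, now expressed as a Deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: example.com/hello-web:1.1   # placeholder image
        ports:
        - containerPort: 8080
```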
The once-simple system got complicated fast. But not without reason. If Kubernetes was going to democratize orchestration at the level once reserved only for Borg, it would need to facilitate a wide variety of configurations, use cases, and security policies.
Not all of the changes were technical. The ecosystem around Kubernetes shifted as well. CNCF grew into a full-fledged and broadly scoped foundation. From the TOC to SIGs and then TAGs, and on to the careful structuring of incubation, sandbox, and graduation for projects, CNCF added the rigor the ecosystem needed. As for Helm, we “left” the Kubernetes project (we used to be a sub-project) and became one of the earliest CNCF sandbox projects, and that enabled us to have our own release cadence, build infrastructure, and governance policies. This was all facilitated by CNCF’s burgeoning process. Some turmoil in the container runtime space led to the creation of the Open Container Initiative (OCI), which has blossomed into a standards body that protects the core of the container specifications.
In the moment, each of these changes felt tumultuous. And we were all aware that Kubernetes had a long way to go if it were to win what, at the time, we called “the orchestrator wars.” Apache Mesos, CoreOS Fleet, Docker Swarm, and HashiCorp Nomad were all giving Kubernetes a run for its money. Even though Mesos was the incumbent, Kubernetes outpaced it (and the others) both on the technical side and in community growth. The idea of declarative provisioning struck a positive chord, and soon Kubernetes was showing up in one company after another.
Era 3: Stability vs. Innovation
Every successful open source project hits a moment when the pressure to develop features runs headlong into the need for stability. Early in a project, developers can make breaking changes in the name of improvement, and users are accepting, and even supportive, of this. At these early stages, we crave improvement even if it means a painful migration.
But as software becomes critical to the functioning of an organization, the tolerance for breaking changes dwindles. Downtime is one reason: breaking changes can cause costly outages. Another is that as more companies, teams, and projects buy into the technology, it becomes collectively expensive (in money and time) to rebuild all of their workloads for each new release of the software.
Kubernetes, Helm, and other projects began prioritizing stability over velocity. And while this elicited plenty of grumbling from engineers who wanted to get their features merged and released, it cemented Kubernetes’ position in the enterprise.
One downside to this approach, though, is that when a new feature is added, it must not break existing behavior. This is more complex. Kubernetes’ codebase grew in size, and the YAML manifest format became more verbose. Early in Helm’s development, we hit an interesting limitation inside of Kubernetes: the maximum size for any one Kubernetes manifest was 1MB. In Bill Gatesian fashion, I remarked, “Yeah, but nobody will ever need a Helm chart that is one megabyte! They’re just YAML files!” But as Kubernetes YAML grew in complexity, and as Helm’s capabilities also became more complex, we hit the 1MB limit. We added chart compression, then hit the limit again a few years later. One day on LinkedIn, I saw an engineer list their title as “Helm Chart Developer.” In shock, I realized that even while I had built Helm believing it was a simple system, it, too, had grown into a tool whose sophistication required engineers to cultivate a specific skill set.
In 2019 I started to despair that Kubernetes was growing into a monster of a project that would soon become incomprehensible even to those who worked on it. Even if there was at one point a danger of this happening, we seem to have safely navigated the problem. Thanks to the technical leadership, Kubernetes took a new tack. Rather than focus on adding feature after feature, developers focused on building extensibility mechanisms. Solid CRD patterns emerged. The control plane became more flexible. And as enterprise features like policy controls solidified, they did so in a way that allowed various extension points. In short, Kubernetes kept internal complexity in check by letting others choose where they wanted to tolerate complexity in the form of add-ons.
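As a sketch of what that extensibility looks like in practice, here is a minimal, hypothetical CustomResourceDefinition. It teaches the API server a new Backup kind under an example.com group without modifying Kubernetes itself; a separate controller watching Backup objects would supply the actual behavior. The kind, group, and fields are invented for illustration.

```yaml
# A minimal, hypothetical CRD: Kubernetes learns a new kind without a code change.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: backups.example.com       # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              schedule:
                type: string      # e.g., a cron expression
              target:
                type: string      # e.g., a bucket or volume name
```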
This move saved Kubernetes. And it also set the stage for a future beyond containers.
Era 4: The Future Is Wasm
One of the key flex points of Kubernetes is the container runtime. Where once Kubernetes required the Docker daemon, it now requires only a runtime that conforms to the Container Runtime Interface (CRI) and honors Kubernetes’ lifecycle expectations. Containerd, CRI-O, and other low-level container tools broke the once-monolithic container runtime down into functional units. This is the doorway to the future.
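RuntimeClass is the Kubernetes-level handle for that flexibility: it names a runtime handler configured on the nodes (a gVisor sandbox, a Kata VM, a Wasm shim, and so on), and a Pod opts in by referencing it. The handler and image names below are placeholders; this sketch only works if a matching shim is actually installed on the node.

```yaml
# A RuntimeClass maps a friendly name to a handler configured on the node.
# "example-shim" is a placeholder; it must match a real CRI handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: alternative
handler: example-shim
---
# A Pod selects that runtime simply by naming the RuntimeClass.
apiVersion: v1
kind: Pod
metadata:
  name: runtime-demo
spec:
  runtimeClassName: alternative
  containers:
  - name: app
    image: example.com/app:0.1   # placeholder image
```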
WebAssembly — abbreviated to Wasm (rhymes with “awesome”) — was originally a bytecode format designed to run inside web browsers alongside JavaScript. The promise was more powerful client-side coding. With Wasm, we’d be able to compile old C libraries and then access them from JavaScript. We’d be able to delegate high-performance features, like vector math, to languages better suited than JavaScript. And we’d get all of this inside of a secure sandbox that protected even the JavaScript runtime from malicious bytecode.
In 2018, my team at Microsoft was spending part of our time maintaining Helm, Brigade, and other CNCF projects, and another chunk of time trying to solve novel problems in the container ecosystem. One that we really wanted to address had to do with serverless functions (like AWS Lambda). Two things were clear to us:
- The existing technologies underlying serverless functions were not ideal. Virtual machines and containers were both predicated on the design assumption that the guest workload would run for hours, days, or months. Startup time was not a huge consideration because startup was relatively infrequent.
- The most popular serverless function implementations (AWS Lambda, Azure Functions, Cloudflare Workers, and so on) were operated by large companies and ran outside of Kubernetes.
That second point, we heard repeatedly at Microsoft, was an irritation to enterprises who had standardized on Kubernetes. As one platform engineer once remarked to me, “We have this nice tidy secured Kubernetes cluster hosting our server workloads, and this Frankenstein’s monster of one-off policy controls, services, and processes to support our AWS Lambdas.”
There is little risk (or reason) that Wasm will in some way displace containers. WebAssembly’s virtues — fast startup time, small binary sizes, and fast execution — lend themselves strongly to serverless workloads where there is no long-running server process. But none of these things makes WebAssembly an obviously better technology for the long-running server processes that are typically encapsulated in containers. In fact, the opposite is true: Right now, few servers can be compiled to WebAssembly without substantial changes to the code (or to the WebAssembly runtime itself).
When it comes to serverless functions, though, WebAssembly’s sub-millisecond cold start, near-native execution speed, and beefy security sandbox make it an ideal compute layer.
If WebAssembly will not displace containers, then our design goal should be to complement them. And running WebAssembly inside of Kubernetes should involve the deepest possible integration with existing Kubernetes features. That’s where SpinKube comes in. Packaging a group of open source tools created by Microsoft, Fermyon, Liquid Reply, SUSE, and others, SpinKube plumbs WebAssembly support directly into Kubernetes. A WebAssembly application can use secrets, config maps, volume mounts, services, sidecars, meshes, and so on. And this is possible out of the box. Not only that, but containers and WebAssembly can run side by side in the same pod. One app can have, say, a long-running server process executing in a container with a handful of WebAssembly sidecars executing as serverless functions. Never before has there been a way to seamlessly blend serverless and server workloads in a single operational unit.
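For a sense of what this looks like in practice, here is a rough sketch of a SpinKube application resource. The API group, version, and field names are my best recollection of the project’s published examples and should be treated as assumptions; check the SpinKube documentation for the exact schema of the release you install.

```yaml
# A sketch of a SpinKube SpinApp. The group/version and field names are
# assumptions from memory and may differ in the release you run.
apiVersion: core.spinkube.dev/v1alpha1
kind: SpinApp
metadata:
  name: hello-spin
spec:
  image: example.com/hello-spin:0.1   # a Wasm app packaged as an OCI artifact (placeholder)
  replicas: 2
  executor: containerd-shim-spin      # the Wasm executor installed on the nodes
```

From there, the SpinKube operator turns that declaration into ordinary Kubernetes objects, which is what lets the app plug into secrets, services, and the rest of the machinery described above.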
And because WebAssembly requires far fewer system resources, a single node in a Kubernetes cluster can run hundreds or even thousands of WebAssembly apps. While your own cluster may never need that many, you’re certain to get a lot more bang for your buck out of your Kubernetes cluster with these svelte apps, as the amount of computing power you need for your cluster (and consequently its cost) plummets.
What is impressive about this story is that Kubernetes itself is the reason that WebAssembly is poised for massive success. The efforts over the last few years to make Kubernetes extensible, pluggable, and highly configurable pay off when even a brand new runtime can be slotted in alongside the default container runtime!
Leaning Beyond Purpose
As I’ve remarked before, a cursory look at software teaches us something profound about the evolution of projects. Good software projects live up to their original design intentions. Great projects exceed them. TCP/IP was intended only for US government and research networks to communicate. HTML was designed to transmit physics papers. JavaScript was for popping up browser alerts and communicating with Java applets.
From its inception, the Kubernetes community had a clear understanding of what we were collectively building:
Kubernetes is an open source orchestration system for Docker containers. It handles scheduling onto nodes in a compute cluster and actively manages workloads to ensure that their state matches the users declared intentions. Using the concepts of “labels” and “pods”, it groups the containers which make up an application into logical units for easy management and discovery.
The project certainly lived up to that early goal, and has now moved beyond it.
Kubernetes was a Docker orchestrator. And WebAssembly was a browser runtime. Both of these are outgrowing their original design right before our eyes. Each is showing itself to be a great technology. The confluence of the two will lead to new horizons in distributed computing.