Object Store Apps: Cloud Native’s Freshest Architecture

Perhaps for the first time since 2006, when Amazon Web Services launched its Simple Storage Service (S3), object storage is hot again.
An increasing number of start-ups and end-users find that using cloud object storage as the persistence layer saves money and engineering time that would otherwise be needed to ensure consistency.
Cheap object storage fuels the recent open data format war between Delta Lake and Apache Iceberg, as both rely on object stores to allow customers to build out the much-requested open data formats for analysis.
And object storage was also the secret behind the success of WarpStream, which used the technology to pioneer a new, much more cost-effective way of running Kafka.
Chief Kafka distribution provider Confluent took notice and bought the company. And Confluent competitor Redpanda also uses the technology in a completely different way, to speed Kafka transactions through a tiered architecture.
“People have been asking me, like, why are you doing a talk on object storage, and why now? And isn’t that a bit boring?” admitted Docker CTO Justin Cormack in a talk, entitled “Object Storage Is All You Need,” at KubeCon+CloudNativeCon NA 2024.
“But what’s happened in the last few years is that people have started building real applications using object storage as the only back end. There’s been a lot of these, and they’ve been really interesting,” Cormack subsequently told TNS.
Start-ups are taking advantage of object storage as a persistence layer upon which they can build their applications. This approach sorts out tricky engineering issues such as managing concurrency and state, not to mention backups and redundancy.
Object storage means “infinite cheap storage,” he told TNS. And that is perfect for cloud native development. “Anything that is bounded is annoying for developers.”
Object storage has also been used to build databases, observability platforms, virtual disks and who knows what else.
“There’s a solution for almost every type of storage that you may need that uses object storage,” added Keith Pijanowski, an AI solutions engineer from MinIO, which offers an S3-like open source object store file system for enterprise use.
High Latency, High Throughput
When coming up with the idea of S3, Amazon head Jeff Bezos specified that he wanted “a malloc for the Web.” This C programming language call provides a way for a developer to allocate memory with a single line. The idea with S3 is that it should be just as easy for cloud developers to allocate storage.
That said, S3, and object storage in general, has some performance characteristics that developers should know about. In other words, you have to know how to use it.
Object storage can infinitely scale, or if you are using an in-house version such as MinIO it is limited only by the physical storage it runs on.

Architecturally, an object store is basically a key-value store accessible through an HTTP API. But it is non-POSIX. It is not a full-featured file system. It has some file listing capability but few other interface commands. The biggest downside is that objects cannot be updated incrementally. One change and the whole object must be rewritten.
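To make that model concrete, here is a minimal sketch in Python. It is an in-memory toy, not a real client: the class `ToyObjectStore` and the helper `append_line` are hypothetical names invented for illustration. It shows the three basic operations (put, get, list by prefix) and why "appending" to an object means rewriting it whole.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for an object store: keys map to byte blobs."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # A PUT always replaces the entire object; there is no partial write.
        self._objects[key] = bytes(body)

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # Listing by key prefix is the closest thing to browsing a directory.
        return sorted(k for k in self._objects if k.startswith(prefix))


def append_line(store, key, line):
    """'Appending' means: read the whole object, modify it, write it all back."""
    try:
        current = store.get(key)
    except KeyError:
        current = b""
    store.put(key, current + line)


store = ToyObjectStore()
append_line(store, "logs/app.txt", b"first\n")
append_line(store, "logs/app.txt", b"second\n")
print(store.list("logs/"), store.get("logs/app.txt"))
```

Each call to `append_line` transfers the full object in both directions, which is why applications built this way tend to write many small immutable objects rather than grow one large one.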
Though highly parallel, object stores nonetheless have some latency. Throughput, however, can be scaled up through greater parallelism.
A single connection on AWS will achieve 5 Gib throughput, but you can have as many connections as you want.
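One common way to exploit that parallelism is to split a large object into byte ranges and fetch them over several connections at once. The sketch below assumes a caller-supplied `get_range(start, end)` function that returns the bytes in `[start, end)`. Both that function and `fetch_parallel` are hypothetical names, and a slice of an in-memory blob stands in for a ranged HTTP GET.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_parallel(get_range, size, chunk_size, max_workers=8):
    """Fetch an object of `size` bytes as concurrent byte-range reads."""
    ranges = [
        (start, min(start + chunk_size, size))
        for start in range(0, size, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `ranges`,
        # so the parts can simply be concatenated.
        parts = pool.map(lambda r: get_range(r[0], r[1]), ranges)
        return b"".join(parts)


# Stand-in for a remote object; a real client would issue ranged GET requests.
blob = bytes(range(256)) * 40
data = fetch_parallel(lambda start, end: blob[start:end], len(blob), chunk_size=1000)
assert data == blob
```

Per-request latency stays the same, but the requests overlap, so total throughput grows with the number of workers until some other limit is reached.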
With a service like AWS S3, objects are replicated across availability zones, so you get replication, consistency and backup for free. As for reliability, it offers 99.999999999% (“eleven nines”) of durability, for up to 280 trillion objects.
And object storage is, by and large, cheap. Cheap to store data, cheap to network. Because it is simple to use, you can write a really simple program that uses it.
“Having simple primitives allows you to invest into making them reliable. That gives you the assurance that you can safely build applications on them,” Cormack said.
Good News for Databases
Over time, the S3 API became the de facto standard interface for object storage, paving the way for MinIO and Ceph and other implementations.
The original use case for object storage seemed to be chiefly for Web sites, which by and large were updated page by page. But further use cases pushed the need for parallelism, which meant who writes what must be managed.
One favored approach was to use a database, such as DynamoDB, to manage the concurrency. This is the approach that Docker takes with Docker Hub.
“It gives you a mutual exclusion concurrency primitive you can use to have multiple applications write something.”
One big jump in ease for app developers was the 2024 addition of a new operation to the S3 interface that writes an object only if it doesn’t already exist (PUT with If-None-Match), which is now supported by AWS, MinIO and Cloudflare R2.
An application could use this operation to create a series of ordered files 001, 002, 003, etc. If more than one server is writing files, the files are still numbered incrementally on a first-come, first-served basis.
In effect, these sequential files can be a log, a fundamental primitive for a database. This allows you to use the object store itself, rather than a separate database, to manage concurrency.
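The numbered-file scheme can be sketched in a few lines of Python. This is an in-memory model, not an S3 client: `ConditionalStore`, `put_if_absent` and `append_entry` are hypothetical names, and `put_if_absent` stands in for a PUT sent with the header `If-None-Match: *`, which a real store would be expected to reject with a precondition failure when the key already exists.

```python
import threading


class ConditionalStore:
    """Hypothetical in-memory model of a store with create-if-absent writes."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()  # models the store's atomic check-and-write

    def put_if_absent(self, key, body):
        """Write `body` under `key` only if the key is free; report success."""
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True

    def keys(self):
        with self._lock:
            return sorted(self._objects)


def append_entry(store, body, start=1):
    """Claim the next free sequence number by trying 001, 002, ... in order."""
    seq = start
    while not store.put_if_absent("log/%03d" % seq, body):
        seq += 1  # another writer took this slot first; try the next one
    return seq


store = ConditionalStore()


def writer(name):
    for i in range(3):
        append_entry(store, ("%s-%d" % (name, i)).encode())


threads = [threading.Thread(target=writer, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.keys())  # six entries, log/001 through log/006, with no gaps
```

A writer that remembers the last number it claimed could pass it as `start` to skip the slots it already knows are taken. Scanning from 001 every time keeps the sketch short but costs one failed request per existing entry.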
Open Data Wars
“You can build any storage system from the logs,” Cormack said. These could be write-ahead logs, or commit logs, or transaction logs.
Delta Lake uses this primitive to create ACID-compliant data storage, and Apache Iceberg relies on object storage as well. Last year saw a big rush of customers moving to one or the other of these open formats as a way to keep their data open. Both use Parquet files for tables.
Phil Eaton offers an example of how to build a Delta Lake-based database using the Go programming language.
“There’s a whole lot of nice properties that these things have that are fun for the developer,” Cormack said.
‘Bring Your Own Bucket’
WarpStream is a Kafka-compatible data store built on S3, one that promises “zero disk” management and infinite scalability. The company was recently acquired by Confluent, which now offers it as its “Bring Your Own Cloud” service.
Redpanda touts faster throughput than Confluent’s premium Kafka cloud offering, in part by using SSDs as a caching layer and then offloading the final results to S3.
WarpStream went the other way, offering a slower service but one that was way less expensive to run.
Turns out, many users didn’t mind a bit of latency in exchange for lower operating costs. Many apps just didn’t need the super-low latency.
Secret sauce: WarpStream was built on S3.
With AWS S3, you don’t have to pay for traffic across availability zones. WarpStream’s genius was to use this to send Kafka messages across availability zones, completely eliminating the cross-zone fees that using something like EBS would incur.
“It was the cheapest possible way of running Kafka,” Cormack noted.
Also, WarpStream offered a simple service model: Customers would manage their own S3 buckets, and WarpStream provided a control plane.
“It’s operationally very, very simple to manage because it’s just object storage and stateless compute,” Cormack told TNS.
Other Innovative Uses of Object Storage
A look around the ecosystem finds many other innovative uses of object storage.
An open source project for virtual disks uses the technology. To build scale-out disks, the researchers use an SSD for cache, and changes are then sent to S3 as logs.
Theoretically, it could be a cheaper alternative to Amazon Elastic Block Store (EBS).
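The general shape of such a design, a local cache in front of a log of changes kept in object storage, can be sketched as follows. This is an illustration of the pattern only, not that project's actual design: the class `LogBackedDisk`, the `recover` function, the JSON batch format and the key naming are all assumptions, and plain dictionaries stand in for both the SSD cache and the bucket.

```python
import json


class LogBackedDisk:
    """Hypothetical block device: a local cache plus batched change logs in a bucket."""

    def __init__(self, bucket, batch_size=4):
        self.bucket = bucket        # stands in for an object store: key -> bytes
        self.cache = {}             # stands in for the SSD cache: block number -> data
        self.pending = []           # writes not yet shipped to the bucket
        self.next_seq = 0
        self.batch_size = batch_size

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.pending.append((block_no, data))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def read(self, block_no):
        return self.cache.get(block_no)

    def flush(self):
        """Ship all pending writes as one immutable, sequentially numbered log object."""
        if not self.pending:
            return
        batch = [[n, d.hex()] for n, d in self.pending]
        self.bucket["log/%08d" % self.next_seq] = json.dumps(batch).encode()
        self.next_seq += 1
        self.pending = []


def recover(bucket):
    """Rebuild the disk contents by replaying the log objects in key order."""
    blocks = {}
    for key in sorted(k for k in bucket if k.startswith("log/")):
        for block_no, hex_data in json.loads(bucket[key]):
            blocks[block_no] = bytes.fromhex(hex_data)
    return blocks


bucket = {}
disk = LogBackedDisk(bucket, batch_size=2)
disk.write(0, b"aa")
disk.write(1, b"bb")
disk.write(0, b"cc")  # overwrites block 0; later log entries win on replay
disk.flush()
assert recover(bucket) == {0: b"cc", 1: b"bb"}
```

Batching trades durability lag for fewer, larger writes: anything still in `pending` is lost if the machine fails before `flush` runs, which is the kind of trade-off a real implementation would have to address.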
SlateDB, a key-value store, takes advantage of this architecture. It has high write latency but offers full persistence and is infinitely scalable.
TurboPuffer is a vector database built on object storage.
Because the object store does all the work of ensuring reliability, developers can just attach ephemeral components to build cloud native applications.
In the cloud native world, reliability is usually defined as having lots of pods. But if you build a database, ensuring persistence becomes a major part of the job, “but all those things are done for you if you use an object store,” Cormack said.
“You end up with a very different reliability story.”
Bring Your Own Bucket
Matt Klein, of Envoy proxy fame, created BitDrift to cut the costs of observability through object storage with a “bring-your-own-bucket” approach.
This has also been called, more generally, “bring-your-own-cloud,” a term that WarpStream uses, as does Buildkite for its continuous delivery platform, which speeds testing by running tests in parallel.
“You can’t run one test after another. Otherwise, it would take weeks, months, years, even, in some cases, to run tests sequentially. So you have to parallelize, you have to run them concurrently,” Buildkite CEO Keith Pitt told TNS.
You Don’t Need a Raft
Further possibilities abound.
Object storage can be used to do leader elections for cloud native apps.
“What you notice is that you don’t have to build a lot of distributed system primitives because you already got this consistent backend you can use,” Cormack said. “You got a concurrency primitive you can really build things on top of it.”
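As a rough illustration of how a create-if-absent write can serve as an election, consider the Python sketch below. It is a toy under stated assumptions: `ConditionalStore` and `try_become_leader` are hypothetical names, the store is an in-memory dictionary modeling a conditional PUT, and the hard parts of a real election (leases, failure detection, deciding when a new term begins) are deliberately left out.

```python
import threading


class ConditionalStore:
    """Hypothetical in-memory model of a store with create-if-absent writes."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, body):
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True

    def get(self, key):
        with self._lock:
            return self._objects[key]


def try_become_leader(store, term, node_id):
    """Try to claim leadership for `term`; return whoever actually holds it."""
    key = "election/term-%06d" % term
    if store.put_if_absent(key, node_id.encode()):
        return node_id  # this node created the object, so it is the leader
    return store.get(key).decode()  # someone else got there first


store = ConditionalStore()
assert try_become_leader(store, 1, "node-a") == "node-a"
assert try_become_leader(store, 1, "node-b") == "node-a"  # term 1 is already taken
assert try_become_leader(store, 2, "node-b") == "node-b"  # a new term, a new race
```

Because only one writer can ever create a given term's object, every node that reads it sees the same winner without running a consensus protocol among themselves.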
Any of the Landscape technologies of the Cloud Native Computing Foundation that require a Raft implementation (Vitess, etcd) for load balancing and concurrency would be good candidates for rewriting to run on object storage, Cormack noted.
“You don’t have to do those things if you can avoid them,” Cormack said. “Building a persistence layer on object storage means you don’t have to do that.”
“It’s a really attractive option if you’re building something that involves data, which is pretty much everything,” Cormack said.
Object Storage at Home
And while many of the applications were built on Amazon S3, or on other cloud object stores such as DigitalOcean Spaces, Azure Blob Storage and Google Cloud Storage, the architecture can also be used in-house, MinIO’s Pijanowski noted.
MinIO has partnerships with Dremio and Starburst, allowing customers to easily set up data warehouses on premises with unlimited scaling capability. For alternatives to cloud Kubernetes deployments, the company has also partnered with VMware for those wishing to build private deployments with VMware Cloud Foundation.
The object store, once a somewhat niche technology, is now becoming a fundamental building block of the modern cloud, leading to a new wave of cloud native applications that are faster to develop, easier to maintain, and more cost-effective to run.
Cormack’s full talk can be enjoyed here: