AutoMQ is a cloud-native Kafka distribution that replaces Apache Kafka's local disk storage with S3-compatible object storage. It is a fork of Apache Kafka that maintains 100% API compatibility while introducing a stateless broker architecture for elastic scaling and significant cost reduction.
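Because the wire protocol and client APIs are unchanged, a stock Kafka client works against an AutoMQ cluster without modification. A minimal sketch (the localhost:9092 bootstrap address is a placeholder for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AutoMQCompatDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Point at an AutoMQ broker exactly as you would a Kafka broker;
        // the address is a placeholder for this sketch.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Unchanged producer API; the broker persists the record to
            // S3-backed storage instead of a local log segment.
            producer.send(new ProducerRecord<>("test", "key", "hello automq"));
        }
    }
}
```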
AutoMQ transforms Apache Kafka from a shared-nothing architecture (where each broker stores partition data locally on disk) into a shared-storage architecture (where all brokers access partition data from S3). This fundamental change enables:

- Stateless brokers that can scale in or out in seconds, since no partition data needs to be moved
- Durability delegated to S3 instead of multi-broker replication
- Zero cross-AZ replication traffic, since S3 handles cross-AZ redundancy
The codebase structure reflects this hybrid nature:

- Most Apache Kafka modules (clients, core, streams, connect) remain largely unchanged
- AutoMQ-specific code is concentrated in the S3 storage engine (the s3stream module) and broker lifecycle management

Sources: README.md1-161 build.gradle841-1039
AutoMQ's architecture consists of four main layers: client APIs, broker processing, storage, and control plane. The key innovation is in the storage layer where S3 replaces local disk storage.
Sources: build.gradle841-1039 core/src/main/scala/kafka/server/KafkaServer.scala1-900 s3stream/src/main/java/com/automq/stream/s3/S3Storage.java1-1400
The core difference between AutoMQ and traditional Kafka is the storage layer architecture. Traditional Kafka uses a shared-nothing design with local disks, while AutoMQ uses a shared-storage design with S3.
| Aspect | Traditional Kafka | AutoMQ |
|---|---|---|
| Storage Architecture | Shared-nothing (local disk) | Shared-storage (S3) |
| Broker State | Stateful (stores partition data) | Stateless (no local partition data) |
| Data Durability | Multi-broker replication (RF=3) | S3 durability (99.999999999%) |
| Scaling Time | Hours (requires data movement) | Seconds (no data movement) |
| Cross-AZ Traffic | High (producer, consumer, replication) | Zero (S3 handles cross-AZ) |
| Storage Capacity | Limited by local disk size | Unlimited (S3 scales automatically) |
| Primary Storage Classes | LocalLog, LogSegment in kafka.log | S3Storage, S3Stream in com.automq.stream.s3 |
| Replication Mechanism | ReplicaFetcherThread copies logs between brokers | S3 built-in redundancy |
| Storage Location | Each broker: /var/kafka-logs/topic-partition/ | All brokers: s3://bucket/objects/ |
Traditional Kafka Write Path:

1. SocketServer → KafkaApis → ReplicaManager
2. ReplicaManager → LocalLog → local disk
3. ReplicaFetcherThread copies the log to follower brokers

AutoMQ Write Path:

1. SocketServer → KafkaApis → ReplicaManager
2. ReplicaManager → S3Storage → WriteAheadLog (local)
3. S3Storage asynchronously uploads to S3 via S3Operator

Sources: README.md104-133 core/src/main/scala/kafka/server/KafkaServer.scala320-333 s3stream/src/main/java/com/automq/stream/s3/S3Storage.java94-190
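To make the AutoMQ path concrete, the sketch below mirrors the sequence described above in simplified form. It is an illustration of the flow, not the real S3Storage code: all types here (LocalWal, MemoryLogCache, ObjectUploader) are invented stand-ins for WriteAheadLog, LogCache, and the S3 operator classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified illustration of the AutoMQ write path; every type below is
// an invented stand-in, not the actual s3stream implementation.
final class WritePathSketch {

    static final class LocalWal {
        CompletableFuture<Void> append(byte[] record) {
            // The real WAL persists to a local device before completing.
            return CompletableFuture.completedFuture(null);
        }
    }

    static final class MemoryLogCache {
        private final List<byte[]> entries = new ArrayList<>();
        synchronized void put(byte[] record) { entries.add(record); }
    }

    static final class ObjectUploader {
        void scheduleUpload(MemoryLogCache cache) {
            // The real code batches cached data into S3 objects asynchronously.
        }
    }

    private final LocalWal wal = new LocalWal();
    private final MemoryLogCache cache = new MemoryLogCache();
    private final ObjectUploader uploader = new ObjectUploader();

    // The append completes once the local WAL write is durable; the S3
    // upload happens asynchronously, off the acknowledgement path.
    CompletableFuture<Void> append(byte[] record) {
        return wal.append(record)
                .thenRun(() -> cache.put(record))
                .thenRun(() -> uploader.scheduleUpload(cache));
    }
}
```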
The s3stream module (s3stream/ directory) contains AutoMQ's S3-based storage implementation. This is where the core differentiation from Apache Kafka occurs.
| Class | Module | Purpose |
|---|---|---|
| S3Storage | s3stream | Main storage engine coordinating writes/reads |
| S3Stream | s3stream | Per-stream abstraction with append/fetch APIs |
| S3StreamClient | s3stream | Client for creating and managing S3 streams |
| WriteAheadLog | s3stream | Local WAL for durability before S3 upload |
| LogCache | s3stream | In-memory cache for recent appends |
| S3BlockCache | s3stream | Read-through cache for S3 data blocks |
| DefaultS3Operator | s3stream | AWS SDK wrapper for S3 operations |
| ObjectManager | s3stream | Manages S3 object lifecycle and metadata |
Sources: s3stream/src/main/java/com/automq/stream/s3/S3Storage.java94-190 s3stream/src/main/java/com/automq/stream/s3/S3Stream.java74-125 s3stream/src/main/java/com/automq/stream/s3/S3StreamClient.java1-100
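The table above suggests how these pieces compose: a client opens a stream, appends records, and fetches offset ranges back. The sketch below shows a hypothetical shape only; the method names and signatures are approximations inferred from the class descriptions, not the actual s3stream API (see S3Stream.java for the real interface).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical shape of the per-stream API; names and signatures are
// approximations, not the real com.automq.stream.s3 interfaces.
interface StreamSketch {
    // Append completes once the record is durable in the local WAL;
    // the upload to S3 happens asynchronously afterwards.
    CompletableFuture<Long> append(ByteBuffer payload);

    // Fetch is served from LogCache or S3BlockCache when possible,
    // falling back to reading S3 objects otherwise.
    CompletableFuture<ByteBuffer> fetch(long startOffset, long endOffset);
}
```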
AutoMQ's codebase is organized into Gradle modules. Here's how major components map to code locations:
| Component | Module | Primary Package | Key Classes | Purpose |
|---|---|---|---|---|
| Client APIs | clients | org.apache.kafka.clients.* | KafkaProducer, KafkaConsumer, KafkaAdminClient | Standard Kafka client APIs |
| Broker Server | core | kafka.server | KafkaServer, BrokerServer | Broker lifecycle and coordination |
| Request Handling | core | kafka.server, kafka.network | KafkaApis, SocketServer, RequestChannel | Network I/O and request processing |
| Replication | core | kafka.server | ReplicaManager, Partition, ReplicaFetcherManager | Partition management and replication |
| Log Management | storage, core | org.apache.kafka.storage.internals.log, kafka.log | LogManager, UnifiedLog, LocalLog | Log lifecycle and segment management |
| S3 Storage | s3stream | com.automq.stream.s3 | S3Storage, S3Stream, S3StreamClient | AutoMQ's S3-based storage engine |
| S3 Cache | s3stream | com.automq.stream.s3.cache | LogCache, S3BlockCache | Write buffer and read cache |
| S3 WAL | s3stream | com.automq.stream.s3.wal | WriteAheadLog, RecoverResult | Local write-ahead log |
| S3 Operations | s3stream | com.automq.stream.s3.operator | DefaultS3Operator, ObjectStorage | AWS SDK integration |
| KRaft Controller | metadata | org.apache.kafka.controller | QuorumController, ReplicationControlManager | Metadata management via Raft |
| Raft Protocol | raft | org.apache.kafka.raft | KafkaRaftClient, RaftClient | Raft consensus implementation |
| Coordinators | group-coordinator, transaction-coordinator | org.apache.kafka.coordinator.* | GroupCoordinator, TransactionCoordinator | Consumer group and transaction coordination |
| Streams | streams | org.apache.kafka.streams | KafkaStreams, StreamThread, TaskManager | Stream processing framework |
| Connect | connect-runtime | org.apache.kafka.connect.runtime | Worker, DistributedHerder | Connector framework |
The s3stream module is a Maven-based subproject (uses pom.xml) while most other modules use Gradle. This module has minimal dependencies and can be used independently:
- software.amazon.awssdk:s3:2.29.26 - S3 client library
- io.netty:netty-handler:4.1.111 - Network I/O for S3 operations
- org.slf4j:slf4j-api:1.7.36 - Logging abstraction

The core module depends on s3stream for S3 storage functionality: build.gradle974
Sources: build.gradle841-1039 s3stream/pom.xml1-250 checkstyle/import-control.xml21-567
AutoMQ inherits Apache Kafka's configuration system. Brokers are configured primarily through server.properties:
| Configuration Class | Location | Purpose |
|---|---|---|
| KafkaConfig | core/src/main/scala/kafka/server/KafkaConfig.scala | Main broker configuration with 200+ properties |
| ServerConfigs | org.apache.kafka.server.config.ServerConfigs | Java-based server configuration constants |
| SocketServerConfigs | org.apache.kafka.network.SocketServerConfigs | Network layer configuration |
| ServerLogConfigs | org.apache.kafka.server.config.ServerLogConfigs | Log storage configuration |
AutoMQ adds configuration properties for S3 storage, defined in:
- AutoMQConfig class in kafka.automq.AutoMQConfig - core/src/main/scala/kafka/server/KafkaConfig.scala98-102

Key properties include:

- s3.endpoint - S3 endpoint URL
- s3.region - AWS region
- s3.bucket - S3 bucket name
- s3.wal.path - Local path for write-ahead log
- s3.wal.cache.size - WAL cache size in bytes
- elastic.stream.enabled - Enable/disable S3 storage mode

Sources: core/src/main/scala/kafka/server/KafkaConfig.scala1-1100 core/src/main/scala/kafka/server/DynamicBrokerConfig.scala1-700
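A hedged server.properties fragment showing how these keys fit together. The endpoint, region, bucket, path, and size below are placeholder values, and the exact key names should be verified against AutoMQConfig:

```properties
# Enable AutoMQ's S3-backed storage mode (placeholder values throughout).
elastic.stream.enabled=true

# Object storage target.
s3.endpoint=https://s3.us-east-1.amazonaws.com
s3.region=us-east-1
s3.bucket=my-automq-bucket

# Local write-ahead log used for durability before async S3 upload.
s3.wal.path=/tmp/automq/wal
s3.wal.cache.size=1073741824
```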
To quickly evaluate AutoMQ on your local machine, the simplest option is the provided Docker Compose configuration:
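A minimal sketch of the flow, assuming the repository ships a compose file (the file path below is an assumption; check the repository or README for the actual location):

```bash
# Clone the repository and start a local AutoMQ cluster.
# The compose file path is an assumption for this sketch.
git clone https://github.com/AutoMQ/automq.git
cd automq
docker compose -f docker/docker-compose.yaml up -d
```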
To build AutoMQ from source:
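Because AutoMQ keeps Apache Kafka's Gradle build, the standard Kafka build tasks apply; a sketch using the Gradle wrapper:

```bash
# Compile all modules.
./gradlew jar

# Produce a distributable tarball (lands under core/build/distributions,
# as in upstream Apache Kafka).
./gradlew releaseTarGz
```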
| Script | Purpose | Example |
|---|---|---|
| kafka-server-start.sh | Start broker | bin/kafka-server-start.sh config/server.properties |
| kafka-topics.sh | Manage topics | bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 |
| kafka-console-producer.sh | Produce messages | bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092 |
| kafka-console-consumer.sh | Consume messages | bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092 |
All scripts use the kafka-run-class.sh wrapper, which sets up the classpath: bin/kafka-run-class.sh1-382
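Putting the scripts together, a minimal smoke test against a locally running broker (the localhost:9092 listener address is an assumption):

```bash
# Create a topic, write one message, and read it back.
bin/kafka-topics.sh --create --topic quickstart --bootstrap-server localhost:9092
echo "hello automq" | bin/kafka-console-producer.sh --topic quickstart --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic quickstart --from-beginning --bootstrap-server localhost:9092
```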
Sources: README.md71-97 bin/kafka-run-class.sh1-382 build.gradle205-217
This overview provides the foundation for understanding the AutoMQ codebase. For deeper exploration, start from the module mapping above and follow the source files cited in each section.
Sources: README.md1-161 build.gradle1-1698