AutoMQ is a cloud-native Kafka distribution that replaces Apache Kafka's local disk storage with S3-compatible object storage. It is a fork of Apache Kafka that maintains 100% API compatibility while introducing a stateless broker architecture for elastic scaling and significant cost reduction.
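Because the wire protocol and client APIs are unchanged, a stock Kafka client works against an AutoMQ cluster without modification. A minimal sketch (the localhost:9092 bootstrap address is a placeholder for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AutoMQCompatDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Point at an AutoMQ broker exactly as you would a Kafka broker;
        // the address is a placeholder for this sketch.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Unchanged producer API; the broker persists the record to
            // S3-backed storage instead of a local log segment.
            producer.send(new ProducerRecord<>("test", "key", "hello automq"));
        }
    }
}
```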
AutoMQ transforms Apache Kafka from a shared-nothing architecture (where each broker stores partition data locally on disk) into a shared-storage architecture (where all brokers access partition data from S3). This fundamental change enables:

- Stateless brokers that can scale in or out in seconds, since no partition data needs to be moved
- Durability delegated to S3 instead of multi-broker replication
- Zero cross-AZ replication traffic, since S3 handles cross-AZ redundancy
The codebase structure reflects this hybrid nature:

- Most Apache Kafka modules (clients, core, streams, connect) remain largely unchanged
- AutoMQ-specific code is concentrated in the S3 storage engine (the s3stream module) and broker lifecycle management

Sources: README.md1-161 build.gradle841-1039
AutoMQ's architecture consists of four main layers: client APIs, broker processing, storage, and control plane. The key innovation is in the storage layer where S3 replaces local disk storage.
Sources: build.gradle841-1039 core/src/main/scala/kafka/server/KafkaServer.scala1-900 s3stream/src/main/java/com/automq/stream/s3/S3Storage.java1-1400
The core difference between AutoMQ and traditional Kafka is the storage layer architecture. Traditional Kafka uses a shared-nothing design with local disks, while AutoMQ uses a shared-storage design with S3.
| Aspect | Traditional Kafka | AutoMQ |
|---|---|---|
| Storage Architecture | Shared-nothing (local disk) | Shared-storage (S3) |
| Broker State | Stateful (stores partition data) | Stateless (no local partition data) |
| Data Durability | Multi-broker replication (RF=3) | S3 durability (99.999999999%) |
| Scaling Time | Hours (requires data movement) | Seconds (no data movement) |
| Cross-AZ Traffic | High (producer, consumer, replication) | Zero (S3 handles cross-AZ) |
| Storage Capacity | Limited by local disk size | Unlimited (S3 scales automatically) |
| Primary Storage Classes | LocalLog, LogSegment in kafka.log | S3Storage, S3Stream in com.automq.stream.s3 |
| Replication Mechanism | ReplicaFetcherThread copies logs between brokers | S3 built-in redundancy |
| Storage Location | Each broker: /var/kafka-logs/topic-partition/ | All brokers: s3://bucket/objects/ |
Traditional Kafka Write Path:

1. SocketServer → KafkaApis → ReplicaManager
2. ReplicaManager → LocalLog → local disk
3. ReplicaFetcherThread copies the log to follower brokers

AutoMQ Write Path:

1. SocketServer → KafkaApis → ReplicaManager
2. ReplicaManager → S3Storage → WriteAheadLog (local)
3. S3Storage asynchronously uploads to S3 via S3Operator

Sources: README.md104-133 core/src/main/scala/kafka/server/KafkaServer.scala320-333 s3stream/src/main/java/com/automq/stream/s3/S3Storage.java94-190
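To make the AutoMQ path concrete, the sketch below mirrors the sequence described above in simplified form. It is an illustration of the flow, not the real S3Storage code: all types here (LocalWal, MemoryLogCache, ObjectUploader) are invented stand-ins for WriteAheadLog, LogCache, and the S3 operator classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Simplified illustration of the AutoMQ write path; every type below is
// an invented stand-in, not the actual s3stream implementation.
final class WritePathSketch {

    static final class LocalWal {
        CompletableFuture<Void> append(byte[] record) {
            // The real WAL persists to a local device before completing.
            return CompletableFuture.completedFuture(null);
        }
    }

    static final class MemoryLogCache {
        private final List<byte[]> entries = new ArrayList<>();
        synchronized void put(byte[] record) { entries.add(record); }
    }

    static final class ObjectUploader {
        void scheduleUpload(MemoryLogCache cache) {
            // The real code batches cached data into S3 objects asynchronously.
        }
    }

    private final LocalWal wal = new LocalWal();
    private final MemoryLogCache cache = new MemoryLogCache();
    private final ObjectUploader uploader = new ObjectUploader();

    // The append completes once the local WAL write is durable; the S3
    // upload happens asynchronously, off the acknowledgement path.
    CompletableFuture<Void> append(byte[] record) {
        return wal.append(record)
                .thenRun(() -> cache.put(record))
                .thenRun(() -> uploader.scheduleUpload(cache));
    }
}
```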
The s3stream module (s3stream/ directory) contains AutoMQ's S3-based storage implementation. This is where the core differentiation from Apache Kafka occurs.
| Class | Module | Purpose |
|---|---|---|
| S3Storage | s3stream | Main storage engine coordinating writes/reads |
| S3Stream | s3stream | Per-stream abstraction with append/fetch APIs |
| S3StreamClient | s3stream | Client for creating and managing S3 streams |
| WriteAheadLog | s3stream | Local WAL for durability before S3 upload |
| LogCache | s3stream | In-memory cache for recent appends |
| S3BlockCache | s3stream | Read-through cache for S3 data blocks |
| DefaultS3Operator | s3stream | AWS SDK wrapper for S3 operations |
| ObjectManager | s3stream | Manages S3 object lifecycle and metadata |
Sources: s3stream/src/main/java/com/automq/stream/s3/S3Storage.java94-190 s3stream/src/main/java/com/automq/stream/s3/S3Stream.java74-125 s3stream/src/main/java/com/automq/stream/s3/S3StreamClient.java1-100
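The table above suggests how these pieces compose: a client opens a stream, appends records, and fetches offset ranges back. The sketch below shows a hypothetical shape only; the method names and signatures are approximations inferred from the class descriptions, not the actual s3stream API (see S3Stream.java for the real interface).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;

// Hypothetical shape of the per-stream API; names and signatures are
// approximations, not the real com.automq.stream.s3 interfaces.
interface StreamSketch {
    // Append completes once the record is durable in the local WAL;
    // the upload to S3 happens asynchronously afterwards.
    CompletableFuture<Long> append(ByteBuffer payload);

    // Fetch is served from LogCache or S3BlockCache when possible,
    // falling back to reading S3 objects otherwise.
    CompletableFuture<ByteBuffer> fetch(long startOffset, long endOffset);
}
```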
AutoMQ's codebase is organized into Gradle modules. Here's how major components map to code locations:
| Component | Module | Primary Package | Key Classes | Purpose |
|---|---|---|---|---|
| Client APIs | clients | org.apache.kafka.clients.* | KafkaProducer, KafkaConsumer, KafkaAdminClient | Standard Kafka client APIs |
| Broker Server | core | kafka.server | KafkaServer, BrokerServer | Broker lifecycle and coordination |
| Request Handling | core | kafka.server, kafka.network | KafkaApis, SocketServer, RequestChannel | Network I/O and request processing |
| Replication | core | kafka.server | ReplicaManager, Partition, ReplicaFetcherManager | Partition management and replication |
| Log Management | storage, core | org.apache.kafka.storage.internals.log, kafka.log | LogManager, UnifiedLog, LocalLog | Log lifecycle and segment management |
| S3 Storage | s3stream | com.automq.stream.s3 | S3Storage, S3Stream, S3StreamClient | AutoMQ's S3-based storage engine |
| S3 Cache | s3stream | com.automq.stream.s3.cache | LogCache, S3BlockCache | Write buffer and read cache |
| S3 WAL | s3stream | com.automq.stream.s3.wal | WriteAheadLog, RecoverResult | Local write-ahead log |
| S3 Operations | s3stream | com.automq.stream.s3.operator | DefaultS3Operator, ObjectStorage | AWS SDK integration |
| KRaft Controller | metadata | org.apache.kafka.controller | QuorumController, ReplicationControlManager | Metadata management via Raft |
| Raft Protocol | raft | org.apache.kafka.raft | KafkaRaftClient, RaftClient | Raft consensus implementation |
| Coordinators | group-coordinator, transaction-coordinator | org.apache.kafka.coordinator.* | GroupCoordinator, TransactionCoordinator | Consumer group and transaction coordination |
| Streams | streams | org.apache.kafka.streams | KafkaStreams, StreamThread, TaskManager | Stream processing framework |
| Connect | connect-runtime | org.apache.kafka.connect.runtime | Worker, DistributedHerder | Connector framework |
The s3stream module is a Maven-based subproject (uses pom.xml) while most other modules use Gradle. This module has minimal dependencies and can be used independently:
- software.amazon.awssdk:s3:2.29.26 - S3 client library
- io.netty:netty-handler:4.1.111 - Network I/O for S3 operations
- org.slf4j:slf4j-api:1.7.36 - Logging abstraction

The core module depends on s3stream for S3 storage functionality: build.gradle974
Sources: build.gradle841-1039 s3stream/pom.xml1-250 checkstyle/import-control.xml21-567
AutoMQ inherits Apache Kafka's configuration system. Brokers are configured primarily through server.properties:
| Configuration Class | Location | Purpose |
|---|---|---|
| KafkaConfig | core/src/main/scala/kafka/server/KafkaConfig.scala | Main broker configuration with 200+ properties |
| ServerConfigs | org.apache.kafka.server.config.ServerConfigs | Java-based server configuration constants |
| SocketServerConfigs | org.apache.kafka.network.SocketServerConfigs | Network layer configuration |
| ServerLogConfigs | org.apache.kafka.server.config.ServerLogConfigs | Log storage configuration |
AutoMQ adds configuration properties for S3 storage, defined in:
- AutoMQConfig class in kafka.automq.AutoMQConfig - core/src/main/scala/kafka/server/KafkaConfig.scala98-102

Key properties include:

- s3.endpoint - S3 endpoint URL
- s3.region - AWS region
- s3.bucket - S3 bucket name
- s3.wal.path - Local path for write-ahead log
- s3.wal.cache.size - WAL cache size in bytes
- elastic.stream.enabled - Enable/disable S3 storage mode

Sources: core/src/main/scala/kafka/server/KafkaConfig.scala1-1100 core/src/main/scala/kafka/server/DynamicBrokerConfig.scala1-700
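A hedged server.properties fragment showing how these keys fit together. The endpoint, region, bucket, path, and size below are placeholder values, and the exact key names should be verified against AutoMQConfig:

```properties
# Enable AutoMQ's S3-backed storage mode (placeholder values throughout).
elastic.stream.enabled=true

# Object storage target.
s3.endpoint=https://s3.us-east-1.amazonaws.com
s3.region=us-east-1
s3.bucket=my-automq-bucket

# Local write-ahead log used for durability before async S3 upload.
s3.wal.path=/tmp/automq/wal
s3.wal.cache.size=1073741824
```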
To quickly evaluate AutoMQ on your local machine, the simplest option is the provided Docker Compose configuration:
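A minimal sketch of the flow, assuming the repository ships a compose file (the file path below is an assumption; check the repository or README for the actual location):

```bash
# Clone the repository and start a local AutoMQ cluster.
# The compose file path is an assumption for this sketch.
git clone https://github.com/AutoMQ/automq.git
cd automq
docker compose -f docker/docker-compose.yaml up -d
```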
To build AutoMQ from source:
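Because AutoMQ keeps Apache Kafka's Gradle build, the standard Kafka build tasks apply; a sketch using the Gradle wrapper:

```bash
# Compile all modules.
./gradlew jar

# Produce a distributable tarball (lands under core/build/distributions,
# as in upstream Apache Kafka).
./gradlew releaseTarGz
```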
| Script | Purpose | Example |
|---|---|---|
| kafka-server-start.sh | Start broker | bin/kafka-server-start.sh config/server.properties |
| kafka-topics.sh | Manage topics | bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 |
| kafka-console-producer.sh | Produce messages | bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092 |
| kafka-console-consumer.sh | Consume messages | bin/kafka-console-consumer.sh --topic test --bootstrap-server localhost:9092 |
All scripts use the kafka-run-class.sh wrapper, which sets up the classpath: bin/kafka-run-class.sh1-382
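Putting the scripts together, a minimal smoke test against a locally running broker (the localhost:9092 listener address is an assumption):

```bash
# Create a topic, write one message, and read it back.
bin/kafka-topics.sh --create --topic quickstart --bootstrap-server localhost:9092
echo "hello automq" | bin/kafka-console-producer.sh --topic quickstart --bootstrap-server localhost:9092
bin/kafka-console-consumer.sh --topic quickstart --from-beginning --bootstrap-server localhost:9092
```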
Sources: README.md71-97 bin/kafka-run-class.sh1-382 build.gradle205-217
This overview provides the foundation for understanding the AutoMQ codebase. For deeper exploration, start from the module mapping above and follow the source files cited in each section.
Sources: README.md1-161 build.gradle1-1698