A Sandbox for Experimenting with Kafka Connect Connectors
This article introduces a sandbox for Kafka Connect: an environment designed to simplify the testing of different connectors, enabling developers to streamline their experimentation and reduce the time it takes.
Introduction
Kafka Connect is a tool within the Apache Kafka ecosystem that simplifies the process of integrating Kafka with other systems. It provides a scalable and reliable way to stream data between Kafka and a variety of databases, cloud services, and other endpoints with little or no custom code.
The versatility of Kafka Connect lies in its wide range of connectors — pre-built integrations that allow Kafka to interact with external systems. Whether you need to ingest data from a relational database, synchronise with a NoSQL store, or just export data to a data warehouse, Kafka Connect likely has a connector for the task.
However, setting up and testing these connectors can be complex and time-consuming, often requiring a dedicated environment for validation and troubleshooting.
What is this Sandbox?
This sandbox provides a lab environment for developers to simulate real-world scenarios and assess the performance, functionality, and reliability of Kafka Connect connectors before deploying them in a staging or production environment. Developers can detect and address problems early, validate new features, and ensure that their solutions work as intended in various scenarios. This leads to higher-quality and, therefore, more reliable systems.
Out of the box, it does the following:
- Spin up Apache Kafka.
- Build and spin up Kafka Connect.
- Spin up Kafdrop — a UI for Apache Kafka.
- Spin up Confluent Kafka REST Proxy.
- Spin up a Prometheus server.
- Run a health check on Apache Kafka.
- Run a health check on Kafka Connect.
- Read data from a file and ingest it into Kafka (File source test).
- Read data from Kafka and output it to a local file (File sink test).
- Read data from Kafka and output it into a database (PostgreSQL sink test).
- Check the status of the source and sink connectors using Kafka Connect’s API.
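As an illustration of that last check, Kafka Connect exposes a per-connector status endpoint on its REST API. A minimal sketch, assuming the sandbox publishes the API on Kafka Connect's default port 8083 (check docker-compose.yml for the actual mapping):

# Query a connector's status via Kafka Connect's REST API
# (replace <connector-name> with, e.g., the file source connector's name)
curl -s http://localhost:8083/connectors/<connector-name>/status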
Architecture
Now, let’s explore the architecture of this sandbox in a C4 container diagram (Figure 1). This will not only help us better understand what the sandbox does but also guide us when we need to add a new Kafka Connect connector to it.
We can see that the sandbox provides capabilities to ingest, process, and store data, along with monitoring and visualisation. Data flows through the architecture as follows: (i) the File Source Connector reads data from file-source.txt and ingests it into the Kafka topic local.connect-test-file; (ii) the File Sink Connector reads from the Kafka topic and writes the data to test.sink.txt; (iii) the JDBC Sink Connector reads from the Kafka topic and ingests the data into the demo_sink PostgreSQL database.
Breaking down the main components and their interactions, we have:
- File Source Connector: reads a text file (file-source.txt) from disk and ingests the data into a Kafka topic (see the configuration sketch after this breakdown).
- Kafka Broker: the message broker that holds the topic local.connect-test-file, which stores the data ingested by the File Source Connector.
- File Sink Connector: consumes data from the Kafka topic and writes it to a file (test.sink.txt) on disk.
- JDBC Sink Connector: consumes data from the Kafka topic and ingests it into a PostgreSQL database (demo_sink).
- demo_sink database: the PostgreSQL database where the data is ingested.
The accessory items are:
- Prometheus JMX Exporter: exports JMX metrics from Kafka Connect (including all connectors) to an endpoint, providing monitoring data.
- Prometheus Server: scrapes metrics from the JMX Exporter and provides insights into the Kafka Connect environment.
- Confluent REST Proxy: provides an API to access Kafka directly, allowing the user to make API calls to obtain information about the Kafka cluster/broker.
- Kafdrop UI: provides a user interface to visualise information about Kafka topics.
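To make the data flow concrete, a file source connector is configured with just a handful of properties mapping a file to a topic. The sketch below is illustrative rather than a copy of the repository's file-source-connector.json; the connector name and file path are assumptions, while the connector class is the stock FileStreamSource connector that ships with Kafka:

{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/path/to/file-source.txt",
    "topic": "local.connect-test-file"
  }
}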
Running the environment
Now that we grasp the sandbox’s architecture, let’s make the environment work. First, you need to clone the following repository:
git clone git@github.com:estevaosaleme/kafka-connect-sandbox.git

If you just want to download it as a zip file, please go to https://github.com/estevaosaleme/kafka-connect-sandbox.
Within the sandbox folder, there will be a docker-compose.yml file with the following services:
services:
  kafka-kraft: ...       # Kafka broker
  kafka-init: ...        # External health check for the broker
  postgres: ...          # PostgreSQL database to test the JDBC sink connector
  kafka-rest-proxy: ...  # An API to access the broker directly
  kafdrop: ...           # UI for the broker
  prometheus: ...        # Metrics server
  kafka-connect: ...     # Kafka Connect itself
  tests: ...             # Tests for the connectors

All you need to build and run the sandbox is Docker and Docker Compose (depending on the version and OS, Compose might be included with Docker).
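To confirm the prerequisites are in place, a quick version check helps (the compose invocation differs between older and newer Docker installs):

docker --version
docker compose version    # on older installs: docker-compose --version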
To build the Kafka Connect image, type:

docker-compose build

Then, you should be able to run the environment:

docker-compose up

If everything goes well, you should see something like this (Figure 2):
To make sure the connectors ran successfully, tests will be executed automatically. Look for the following messages on the console to confirm that the connectors are up and running (Figures 3, 4, and 5):
To verify the output further, connect to the PostgreSQL database demo_sink and check the table demo_table (user kafka_user, password 123456). The output should be the following:

id|description       |
--+------------------+
 1|description test  |
 2|description test 2|

For the file sink connector output, we need to exec into the kafka-connect container (docker exec -it <container_name> bash) and look for the file /tmp/file-sink.txt.
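For reference, both checks can be run from the host in a single line each. A sketch assuming the PostgreSQL container maps the default port 5432 to localhost (verify the actual mapping in docker-compose.yml):

# Check the JDBC sink output (password: 123456)
psql -h localhost -U kafka_user -d demo_sink -c "SELECT * FROM demo_table;"

# Check the file sink output without opening a shell in the container
docker exec -it <container_name> cat /tmp/file-sink.txt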
Kafka Connectors & Configuration
So far, we have been able to get the sandbox up and running, which is great! What if we want to change it? Well, we need to understand its intricacies. That’s what this section is all about.
First, let’s have a look at the sandbox folder structure:
├── build-artifacts
│ ├── connect-jmx-exporter.yml
│ ├── connect-log4j.properties
│ ├── connect-standalone.properties
│ └── docker-entrypoint.sh
├── docker-compose.yml
├── Dockerfile
├── README.md
├── vol-kafka-connect
│ ├── connectors
│ │ ├── connect-file-3.7.0.jar
│ │ └── file-source.txt
│ └── custom-config
│ ├── custom-connect-jmx-exporter.yaml
│ └── custom-connect-standalone.properties
├── vol-prometheus
│ └── custom-prometheus.yml
└── vol-tests
├── file-sink-connector.json
├── file-source-connector.json
└── run-tests.sh

Folders starting with vol- followed by the container name refer to the volumes that will be mounted; any change made to these files is reflected in the environment. The build-artifacts folder contains the files needed to build the Kafka Connect image and is used within the Dockerfile.
To add a new connector, copy the downloaded connector JAR to the folder vol-kafka-connect/connectors. It’s as simple as that!
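For example, assuming a freshly downloaded connector JAR (the file name below is hypothetical):

cp ~/Downloads/my-new-connector-1.0.0.jar vol-kafka-connect/connectors/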
To run the new connector, add the connector’s configuration file in JSON format to the folder vol-tests and append the following line to run-tests.sh:
update_connector_config "<name of the connector>" "/tests/<configuration json file>"

update_connector_config is a function that makes a POST call to the Kafka Connect API, passing the name of the connector and the path to its configuration.
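The repository's actual implementation lives in run-tests.sh; purely as a sketch, such a function could wrap a POST to the Connect REST API like this (the kafka-connect:8083 address is an assumption based on the service name in docker-compose.yml):

# Hypothetical sketch of update_connector_config (not the repository's code):
# POST the connector's JSON configuration to Kafka Connect's REST API.
update_connector_config() {
  local name="$1"
  local config_file="$2"
  echo "Registering connector: $name"
  curl -s -X POST -H "Content-Type: application/json" \
       --data @"$config_file" \
       http://kafka-connect:8083/connectors
}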
Visualising Records and Troubleshooting Issues
Kafdrop UI (Figure 6) is an open-source, web-based user interface that provides an easy way to visualise and manage Apache Kafka topics and messages. It simplifies interaction with Kafka clusters by offering a user-friendly interface that helps users monitor and troubleshoot their Kafka setup.
After running the sandbox, visit http://localhost:9020 to access it.
Prometheus and its UI (Figure 7) can provide valuable insights into the performance and health of the connectors and the overall system. This helps in identifying bottlenecks, optimising configurations, and ensuring that everything is operating as expected.
On the sandbox, the Prometheus UI is available at http://localhost:9090.
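Beyond the UI, Prometheus also exposes an HTTP query API, which is handy for scripting checks. For instance, the built-in up metric shows whether Prometheus is successfully scraping the JMX Exporter target:

# 1 means the target is up; 0 means the scrape is failing
curl -s 'http://localhost:9090/api/v1/query?query=up'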
The Confluent REST Proxy (Figure 8) is a component of the Confluent Platform that provides a RESTful interface to an Apache Kafka cluster. It allows us to produce and consume messages, and interact with Kafka using an API. This can be useful to handle data topics (for instance, insert or delete messages).
The API is available at http://localhost:8082 just after launching the sandbox.
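For example, listing the broker's topics is a single GET request; the topic created by the sandbox, local.connect-test-file, should appear in the response:

# List all topics through the Confluent REST Proxy
curl -s http://localhost:8082/topics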
Final Considerations
Using a sandbox for Kafka Connect connectors offers advantages that can enhance our development without impacting other environments. It simplifies testing and debugging by providing a user-friendly environment for deploying, configuring, and validating connectors’ behaviours. The integration of monitoring and visualisation tools, such as Prometheus and Kafdrop, facilitates better troubleshooting.
If you want to improve it, please feel free to fork the project at https://github.com/estevaosaleme/kafka-connect-sandbox/fork and send pull requests.

