Miguel Jimeno, PhD’s Post

This is part of a series of data engineering projects I will be uploading to GitHub. This one is a simple application that generates a stream of events and publishes them to a Kafka topic. Spark consumes from that topic, writes to Redis for fast initial aggregation if needed, and also writes to a file for future consumption. There are several next steps for the project, including 1) taking it to the cloud, 2) fixing Parquet writing, and 3) scaling it. The repository is here: https://lnkd.in/gKyHi34R #spark #kafka #dataengineering #python #redis
