From the course: Kafka Essentials: Quick Start for Building Effective Data Pipelines by Pearson
Course welcome - Kafka Tutorial
Welcome to Kafka Essentials, a quick start for building effective data pipelines. In these lessons, I cover many essential and practical aspects of using and running the Apache Kafka event streaming platform. Apache Kafka is a popular message broker that manages data flow between producer application sources and consumer destinations.

My name is Doug Eadline, and I will be your instructor for these lessons. I am a practitioner and writer in the Linux cluster community and have documented many aspects of HPC (high-performance computing) and Hadoop/Spark computing. Currently, I am the editor of the HPCwire.com website, and I was previously the editor of ClusterWorld Magazine and a senior HPC editor for Linux Magazine. Some of my popular video tutorials and books include Data Engineering Foundations LiveLessons Parts 1 and 2, Hadoop 2 Quick-Start Guide, High Performance Computing for Dummies, and Practical Data Science with Hadoop and Spark.

To aid in the hands-on learning process, a fully configured single-node Linux Hadoop/Spark virtual machine, along with all the lesson notes and examples, is provided. The virtual machine includes many operational tools, including Apache Kafka, Hadoop, Spark, and others. Using these resources, you can repeat and study all the examples presented in these lessons. I should note that a background in the basic Linux command line and Python is useful.

To get started, I introduce essential background concepts, Kafka components, basic examples, and the virtual machine in lesson one. A freely available Kafka user interface tool is introduced in lesson two, which provides a way to view Kafka processing. Lesson three demonstrates Kafka functionality with examples of streaming weather data using Python producers and consumers. Examples of moving Kafka log data to external storage are presented in lesson four, using Python and PySpark.
Lesson five presents an advanced example of Kafka image streaming for edge computing. Continuing with lesson six, examples of data pipelines using freely available Kafka Connect services are demonstrated. Installation of a basic three-server Kafka broker cluster is demonstrated in lesson seven, followed by many essential administrative tasks in lesson eight.

I also want to point out that there is a lesson resources page located at www.clustermonkey.net/download/livelessons/kafka_essentials. This page provides links to the lesson notes, the virtual machine, and other useful resources for your own exploration of the lesson examples. Let's get started with lesson one.