From the course: Cloud-Based AI Solution Design Patterns

Serverless data pipeline

- Before we describe this pattern, let's first briefly establish what a serverless environment is. A serverless environment is a managed cloud service that provides a platform for running code and executing functions without having to set up or manage any servers. It's an alternative to the infrastructure as a service offerings from the same cloud providers, which do require you to manage your own servers in the cloud.

With the serverless data pipeline pattern, we take advantage of a serverless environment to assemble simple, highly efficient data pipeline workflows by breaking the pipeline steps down into a set of small, independent functions. Each function is triggered by an event, such as the arrival of new data or the completion of a data processing step. This is referred to as function chaining, where the output of one function triggers the next function, or where the nature of the output determines which of several available functions is triggered. For example, a data ingestion function could execute and then trigger a data transformation function, which in turn leads to a data validation function, as sketched in the first code example below. The functions in a serverless data pipeline can extend to AI model training and deployment.

Because of its lightweight nature, a serverless data pipeline is highly scalable. As it encounters fluctuations in data volume, resources are automatically added or removed so that its overall performance remains consistent.

You might recall the data pipeline orchestration pattern we covered in the previous course. The serverless data pipeline pattern has similarities, but it's not exactly an alternative. We'd still want to use a data pipeline orchestration platform when we want to predefine and carry out more complex workflow logic. In fact, the simpler workflows associated with a serverless data pipeline could be encapsulated as tasks within a greater workflow carried out by a data pipeline orchestration platform, an idea sketched in the second code example below.

Comparing a serverless data pipeline to data pipeline orchestration also requires us to make a distinction between implicit and explicit workflows. When we define workflow logic with a data pipeline orchestration platform, we end up creating explicit workflows, because they are structured and predefined. The event-driven nature of a serverless data pipeline leads us to implicit workflows, whereby we create simpler, less structured workflow logic that relies on runtime events in order to execute.

Serverless data pipelines are very popular with AI solutions because they allow us to quickly create simple and effective workflows that are adaptable and highly scalable. And if we need more complex workflow logic, we have the option of incorporating implicit serverless data pipeline workflows within a greater explicit workflow orchestration.
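To make the function chaining idea concrete, here is a minimal Python sketch in the style of AWS Lambda handlers, where each function is triggered by an object landing in a storage bucket and each function's output is the event that triggers the next step. The bucket names, the transformation rule, and the validation rule are hypothetical illustrations, not something the pattern prescribes, and an equivalent chain could be built on any provider's serverless platform.

```python
import json

import boto3  # AWS SDK for Python; other clouds offer equivalent clients

s3 = boto3.client("s3")

# Hypothetical bucket names; each bucket would be configured so that the
# arrival of a new object triggers the next function in the chain.
RAW_BUCKET = "raw-data"
INGESTED_BUCKET = "ingested-data"
TRANSFORMED_BUCKET = "transformed-data"


def ingest(event, context):
    """Step 1: triggered by the arrival of new data in RAW_BUCKET."""
    key = event["Records"][0]["s3"]["object"]["key"]
    raw = s3.get_object(Bucket=RAW_BUCKET, Key=key)["Body"].read()
    records = json.loads(raw)
    # Writing the output is itself the event that triggers transform().
    s3.put_object(Bucket=INGESTED_BUCKET, Key=key,
                  Body=json.dumps(records).encode())


def transform(event, context):
    """Step 2: triggered by the object that ingest() wrote."""
    key = event["Records"][0]["s3"]["object"]["key"]
    data = json.loads(
        s3.get_object(Bucket=INGESTED_BUCKET, Key=key)["Body"].read())
    # Illustrative transformation: drop records with missing values.
    cleaned = [r for r in data if r.get("value") is not None]
    s3.put_object(Bucket=TRANSFORMED_BUCKET, Key=key,
                  Body=json.dumps(cleaned).encode())


def validate(event, context):
    """Step 3: triggered by the object that transform() wrote. The nature
    of the outcome can determine which function fires next, for example a
    model training trigger on success or an alerting function on failure."""
    key = event["Records"][0]["s3"]["object"]["key"]
    data = json.loads(
        s3.get_object(Bucket=TRANSFORMED_BUCKET, Key=key)["Body"].read())
    if not data:
        # Raising routes the event to an error-handling path.
        raise ValueError(f"No valid records found in {key}")
```

Note that no function calls another directly; the workflow emerges implicitly from the events each function emits, which is exactly the implicit-workflow behavior the pattern describes.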
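As a sketch of the second idea, encapsulating the implicit serverless workflow as a task within an explicit workflow, here is what wrapping the chain above inside an Apache Airflow DAG might look like. Airflow is only one possible orchestration platform, and the task bodies here are hypothetical stubs standing in for calls to the cloud provider's SDK.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def start_serverless_chain():
    """Hypothetical: drop a new object into RAW_BUCKET so the
    event-driven chain sketched above begins executing."""


def await_validated_output():
    """Hypothetical: poll for the chain's final output before the
    explicit workflow moves on to its remaining tasks."""


with DAG(
    dag_id="explicit_workflow_with_implicit_steps",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # run on demand rather than on a timetable
) as dag:
    # The entire implicit, event-driven serverless pipeline is treated
    # as a single task inside this explicit, predefined workflow.
    run_pipeline = PythonOperator(
        task_id="run_serverless_pipeline",
        python_callable=start_serverless_chain,
    )
    collect_results = PythonOperator(
        task_id="collect_validated_output",
        python_callable=await_validated_output,
    )

    run_pipeline >> collect_results  # explicit, predefined ordering
```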