Decoding AI Magazine

Battle-tested content on designing, coding, and deploying production-grade ML & MLOps systems. The hub for continuous learning on ML system design, ML engineering, MLOps, large language models (LLMs), and computer vision (CV).

The Hands-On LLMs Series

Why you must choose streaming over batch pipelines when doing RAG in LLM applications

Lesson 2: RAG, streaming pipelines, vector DBs, text processing

12 min read · Jan 9, 2024

Image by DALL-E

→ the 2nd of 8 lessons in the Hands-On LLMs free course

By finishing the Hands-On LLMs free course, you will learn how to use the 3-pipeline architecture & LLMOps best practices to design, build, and deploy a real-time financial advisor powered by LLMs & vector DBs.

We will primarily focus on the engineering & MLOps aspects. Thus, by the end of this series, you will know how to build & deploy a real ML system, not isolated code in notebooks (we haven’t used any notebooks at all).

More precisely, these are the 3 components you will learn to build:

  1. a real-time streaming pipeline (deployed on AWS) that listens to financial news, cleans & embeds the documents, and loads them into a vector DB (see the sketch after this list)
  2. a fine-tuning pipeline (deployed as a serverless continuous-training job) that fine-tunes an LLM on financial data using QLoRA, tracks the experiments with an experiment tracker, and saves the best model to a model registry
  3. an inference pipeline built in LangChain (deployed as a serverless…
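To make component 1 concrete, here is a minimal sketch of its clean → embed → load steps, assuming sentence-transformers for the embeddings and an in-memory Qdrant instance as the vector DB. Both libraries, the `financial_news` collection name, and the `clean`/`ingest` helpers are illustrative stand-ins, not necessarily the exact tools the course uses:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

COLLECTION = "financial_news"  # hypothetical collection name

# Embedding model and vector DB client (stand-ins for the course's stack).
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
client = QdrantClient(":memory:")  # in-memory instance for local experiments

client.create_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)


def clean(text: str) -> str:
    # Minimal cleaning step: collapse whitespace. A real pipeline would also
    # strip HTML, normalize unicode, and chunk long documents.
    return " ".join(text.split())


def ingest(news_items: list[dict]) -> None:
    # The three steps of the streaming pipeline: clean -> embed -> load.
    docs = [clean(item["content"]) for item in news_items]
    vectors = model.encode(docs)
    client.upsert(
        collection_name=COLLECTION,
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={"text": doc})
            for i, (vec, doc) in enumerate(zip(vectors, docs))
        ],
    )


# In production, a streaming consumer would call ingest() for every incoming
# news event; here we pass a static list just to exercise the function.
ingest([{"content": "Example headline: ACME Corp beats Q4 earnings estimates."}])
```

Because each document is embedded and upserted as it arrives, the vector DB always reflects the latest news, which is exactly why streaming beats periodic batch jobs for this use case.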

Published in Decoding AI Magazine

Written by Paul Iusztin

Senior AI Engineer • Founder @ the Decoding AI Magazine