From the course: Hands-On AI: Build a RAG Model from Scratch with Open Source
Generating the corpus
- [Instructor] Welcome back. We're now going to generate a corpus of data, since we need to source some content for our RAG model to use. A corpus is simply a collection of data: all the data that our RAG model will leverage to generate a response to a user's query. Collecting that data could be done in a variety of ways, but we want to make sure everybody can follow along, so we're going to do it in one of the easiest programmatic ways possible: using the wikipedia Python package to pull Wikipedia articles. The first step is to install the wikipedia package. Let's make sure we're in our environment, so we're going to source the environment, which is called rag_env, and then go ahead and install wikipedia. Great. Now let's start writing some code so that we can use this package. We're going to create a new file that we'll call generate_corpus, and it's going to be a…
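The transcript cuts off before the file's contents are shown, so what follows is only a minimal sketch of what a generate_corpus script might look like, not the instructor's actual code. It assumes the package was installed with `pip install wikipedia` inside rag_env. The function names, the `corpus` output folder, and the injectable `fetch` parameter are all hypothetical choices made for illustration.

```python
from pathlib import Path


def fetch_wikipedia_text(title: str) -> str:
    """Return the plain-text content of one Wikipedia article."""
    # Imported lazily so the rest of the module loads without the package.
    import wikipedia  # assumption: installed via `pip install wikipedia`

    # auto_suggest=False asks for the exact title instead of a guessed one.
    return wikipedia.page(title, auto_suggest=False).content


def generate_corpus(titles, out_dir="corpus", fetch=fetch_wikipedia_text):
    """Write one .txt file per title into out_dir and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for title in titles:
        try:
            text = fetch(title)
        except Exception as err:
            # Covers missing pages, ambiguous titles, and network errors.
            print(f"Skipping {title!r}: {err}")
            continue

        # Hypothetical naming scheme: spaces become underscores.
        file_path = out_path / f"{title.replace(' ', '_')}.txt"
        file_path.write_text(text, encoding="utf-8")
        written.append(file_path)

    return written
```

Calling `generate_corpus(["Retrieval-augmented generation", "Large language model"])` with network access would be expected to leave one text file per article in the `corpus` folder. The topic list here is just an example. Passing the fetcher as a parameter keeps the file-writing logic testable offline and makes it easy to swap in another data source later.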
Contents
- Setting up a dev container (7m 56s)
- Setting up environment and installing Ollama (5m 40s)
- Creating a model file (8m 33s)
- Running Ollama programmatically through Python (7m 43s)
- Generating the corpus (10m 17s)
- Extract text from different local file formats with Docling (4m 43s)