From the course: Hands-On AI: Build a RAG Model from Scratch with Open Source

Generating the corpus

- [Instructor] Welcome back. We're now gonna work on generating a corpus of data, since we need to source some content for our RAG model to use. A corpus is simply a collection of data: all the data that our RAG model will leverage in order to generate a response to a user's query. Now, collecting our data could be done in a variety of ways, but we wanna make sure that everybody can follow, so we're gonna do it in one of the easiest programmatic ways possible, and that is by using the Wikipedia Python package to pull Wikipedia articles. So the first step is for us to install the Wikipedia package. Let's make sure that we are in our environment. We're going to source the environment, which is called rag_env, and then we're gonna go ahead and install the package. Great. Now let's start writing some code so that we can use this package. We're gonna create a new file that we're gonna call generate_corpus, and it's gonna be a…
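The transcript cuts off before the file's contents are shown, so what follows is only a rough sketch of what a generate_corpus script could look like, not the instructor's actual code. It assumes the `wikipedia` package has been installed with pip inside the activated rag_env environment. The function names, the example article titles, and the `corpus` output folder are all made up for illustration. The only library calls used are `wikipedia.page(...)` and its `.content` attribute.

```python
# generate_corpus.py: hypothetical sketch, not the course's actual file.
# Assumed setup: activate the rag_env environment, then run `pip install wikipedia`.
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def fetch_wikipedia_text(title: str) -> str:
    """Return the plain-text content of one Wikipedia article."""
    # Imported lazily so the helpers below can be used without the package.
    import wikipedia

    # auto_suggest=False asks for the exact title instead of a guessed one.
    return wikipedia.page(title, auto_suggest=False).content


def build_corpus(
    titles: Iterable[str],
    fetch: Callable[[str], str] = fetch_wikipedia_text,
) -> Dict[str, str]:
    """Map each title to its article text, skipping titles that fail to load."""
    corpus: Dict[str, str] = {}
    for title in titles:
        try:
            corpus[title] = fetch(title)
        except Exception as exc:  # e.g. missing or ambiguous page, network error
            print(f"Skipping {title!r}: {exc}")
    return corpus


def save_corpus(corpus: Dict[str, str], out_dir: str = "corpus") -> List[Path]:
    """Write one UTF-8 text file per article and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for title, text in corpus.items():
        # Make the title safe to use as a file name.
        safe_name = title.replace(" ", "_").replace("/", "_")
        path = out / f"{safe_name}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Example titles only (an assumption); swap in whatever topics you need.
    topics = ["Retrieval-augmented generation", "Information retrieval"]
    written = save_corpus(build_corpus(topics))
    print(f"Wrote {len(written)} article(s)")
```

Passing the fetch function in as a parameter is a design choice of this sketch: it lets you try the corpus-building and saving steps with a stand-in function before making any real requests to Wikipedia.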
