From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)

Using OpenAI’s embedding models

because I like to put all the imports for the entire notebook at the top of the notebook. And it's a pretty long notebook, so I'll explain them as we go. The first thing we're going to do is clean the text, just by removing any non-printable characters, and then we're also going to concatenate different fields of each anime into a single text blob. Because recall, LLMs simply need to take in a large text blob as their input. We saw earlier that all LLMs will have some special tokens to know when a human is speaking, the bot is speaking, whatever. But each of these anime needs to be represented as a single piece of text. So here I am giving it a piece of text with some, not all, of the fields in the dataset. Now, I'll say this once in a long way, and then I'll say it shorter later on. This is the first time that you're going to see a decision of my own being made on how to use this dataset. If you're an anime expert or a recommendation…
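The two steps described here, removing non-printable characters and joining a few fields into one text blob, could look roughly like the sketch below. This is not the notebook's actual code: the helper names `clean_text` and `build_text_blob`, the field names (`name`, `genres`, `synopsis`, `rating`), and the sample record are all made up for illustration.

```python
def clean_text(text):
    """Drop non-printable characters (control codes and the like).

    Note: str.isprintable() also treats newlines and tabs as non-printable,
    so those are removed too.
    """
    return "".join(ch for ch in str(text) if ch.isprintable())


def build_text_blob(record, fields):
    """Join the chosen fields of one record into a single cleaned string."""
    parts = []
    for field in fields:
        value = record.get(field)
        if value is None or value == "":
            continue  # skip fields that are missing or empty
        parts.append(f"{field.title()}: {clean_text(value)}")
    return "\n".join(parts)


# Hypothetical record; the real dataset's columns may differ.
record = {
    "name": "Example Show",
    "genres": "Action, Sci-Fi",
    "synopsis": "A crew\x00 travels\x07 the stars.",
    "rating": 8.5,
}

# Only some of the fields go into the blob, which is a modeling choice.
fields = ["name", "genres", "synopsis"]
blob = build_text_blob(record, fields)
print(blob)
```

Which fields go into the blob, and how they are labeled, is exactly the kind of judgment call mentioned above; a different selection would give the model a different view of each anime.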
