From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)
Evaluating generative tasks: Part 1
- While this isn't the first time we've talked about evaluations, this is the first time we're going to take an in-depth look at the many ways we can evaluate the many types of LLM tasks. But before we do, we should understand that evaluation is not simply checking whether or not a model works. That is part of it, but it's basically step one. True holistic evaluation of LLMs is really a step toward understanding how well the model will actually perform and be useful in a real-world scenario. We're also going to be discussing how to measure things like the trustworthiness and impact of our tasks and of our datasets. Now, most people's testing harnesses will look something like this when it comes to LLMs. They'll have, more or less, a two-axis grid, where you have models on one axis, like I have GPT-3.5, Llama 3, and Claude 3.5 as columns, and as rows, I have prompting variants. Now, this is of course, a testing harness…
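The two-axis grid described above can be sketched in code. This is a minimal, hypothetical harness: the model names come from the transcript, but the prompt variants and the `score` function are stand-ins for a real evaluation (which would call each model's API and score the output against a reference or rubric).

```python
from itertools import product

# Columns: the models mentioned in the transcript.
models = ["GPT-3.5", "Llama 3", "Claude 3.5"]

# Rows: hypothetical prompting variants for illustration.
prompt_variants = [
    "Zero-shot: answer the question.",
    "Few-shot: here are two examples, now answer.",
    "Chain-of-thought: think step by step, then answer.",
]

def score(model: str, prompt: str) -> float:
    """Placeholder metric. A real harness would send the prompt to
    the model and evaluate the response (accuracy, rubric score, etc.)."""
    return round((len(model) * len(prompt)) % 100 / 100, 2)

# Fill the grid: one score per (prompt variant, model) cell.
grid = {
    (variant, model): score(model, variant)
    for variant, model in product(prompt_variants, models)
}

# Print a simple table: rows = prompt variants, columns = models.
print("variant".ljust(12) + "".join(m.rjust(12) for m in models))
for i, variant in enumerate(prompt_variants):
    row = f"variant {i}".ljust(12)
    row += "".join(str(grid[(variant, m)]).rjust(12) for m in models)
    print(row)
```

Each cell of the grid then holds one evaluation result, so you can compare across models (columns) or across prompting strategies (rows).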