From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)


Knowledge distillation

I ended the last session by talking about knowledge distillation of large language models. In this session, I'd like to expand on that even further and offer you two methods of distillation: task-specific and task-agnostic. In task-specific distillation, you take a teacher model that has been fine-tuned to do some task, classification, embeddings, whatever, and then you take a tabula rasa, a clean slate: a smaller model, let's say DistilBERT, a smaller version of BERT. The student model is then fine-tuned while listening to the teacher model perform the task and trying to replicate how the teacher solves it. So the student is learning the task from scratch, classification for example, while also mixing in the answers from the teacher. It's like a student trying to learn math while watching a teacher do math in front of them. And then there's task-agnostic distillation, where the student model is…
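To make that "mixing in the answers from the teacher" concrete, here is a minimal sketch of the standard distillation objective: a weighted blend of ordinary cross-entropy against the true label and a KL-divergence term pulling the student's softened outputs toward the teacher's. The function names, the temperature value, and the `alpha` weighting are illustrative assumptions, not code from the course.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; higher T yields a softer distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with a soft-label loss from the teacher.

    alpha weighs cross-entropy against the ground-truth label;
    (1 - alpha) weighs KL divergence to the teacher's softened outputs.
    (Illustrative sketch; real training would use a framework like PyTorch.)
    """
    # Hard loss: cross-entropy against the true class label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_label])

    # Soft loss: KL divergence from softened teacher to softened student.
    t_soft = softmax(teacher_logits, temperature)
    s_soft = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(t_soft, s_soft))

    # T^2 rescaling keeps the soft-loss gradients on a comparable scale.
    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss
```

A student that already agrees with the teacher contributes (nearly) zero to the soft term, so the gradient pressure comes from wherever the two distributions diverge.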
