From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)


Knowledge distillation

I ended the last session by talking about knowledge distillation of large language models. In this session, I'd like to expand on that even further and offer you two methods of distillation: task-specific and task-agnostic. In task-specific distillation, you take a teacher model that has been fine-tuned to do some task, classification, embeddings, whatever, and then you take a tabula rasa, a clean slate: a smaller model, let's say DistilBERT, a smaller version of BERT. The student model is then fine-tuned while listening to the teacher model perform the task and trying to replicate how the teacher solves it. So the student is learning the task from scratch, classification for example, while also mixing in the answers from the teacher. It's like a student trying to learn math while watching a teacher do math in front of them. And then there's task-agnostic distillation, where the student model is…
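To make that "mixing in the answers from the teacher" concrete, here is a minimal sketch of the standard distillation objective: a weighted blend of ordinary cross-entropy against the true label and a KL-divergence term pulling the student's softened outputs toward the teacher's. The function names, the temperature value, and the `alpha` weighting are illustrative assumptions, not code from the course.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; higher T yields a softer distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a hard-label loss with a soft-label loss from the teacher.

    alpha weighs cross-entropy against the ground-truth label;
    (1 - alpha) weighs KL divergence to the teacher's softened outputs.
    (Illustrative sketch; real training would use a framework like PyTorch.)
    """
    # Hard loss: cross-entropy against the true class label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_label])

    # Soft loss: KL divergence from softened teacher to softened student.
    t_soft = softmax(teacher_logits, temperature)
    s_soft = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(t_soft, s_soft))

    # T^2 rescaling keeps the soft-loss gradients on a comparable scale.
    return alpha * hard_loss + (1 - alpha) * temperature ** 2 * soft_loss
```

A student that already agrees with the teacher contributes (nearly) zero to the soft term, so the gradient pressure comes from wherever the two distributions diverge.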
