From the course: AI Workshop: Advanced Chatbot Development
Demo: Quantizing the chatbot model
- [Instructor] Okay. In this quick demo, I will show you how to quantize an LLM to qint8 in TensorFlow, wherever that's possible. On some operations and some layers it won't be possible, and those will default to half precision. How do we do this? First, let's connect to a GPU. There we go. And we do the traditional pip installs. There we go. Now that we have the pip installs, the first thing we're going to do is load flan-t5-base. You can actually do this with any LLM. And how do we quantize? Very simple. First, we create a converter with TFLiteConverter.from_keras_model. Then, we specify the optimization we want. In this case, the default one is qint8, remember. And then comes the magic: in the target spec, we say which operations are supported. That means we use the built-in ones as a fallback, just in case the normal quantization fails, and otherwise use the selected operations, which quantize to qint8. So…
Contents
- Principles of model pruning (5m 1s)
- Demo: Pruning the chatbot model (8m 19s)
- Theory and practice of model distillation (6m 58s)
- Demo: Applying model distillation to the chatbot (8m 38s)
- Understanding and implementing quantization (6m 34s)
- Demo: Quantizing the chatbot model (5m 35s)
- Demo: Overview of the results (10m 47s)
- Solution: Prepare the chatbot for deployment (11m 12s)