From the course: AI Workshop: Advanced Chatbot Development

Demo: Quantizing the chatbot model

- [Instructor] Okay. In this quick demo, I will show you how to quantize an LLM to int8 in TensorFlow, where it's possible. On some operations and some layers it won't be possible, and the converter will fall back to half precision. How do we do this? First, let's connect to a GPU. There we go. And we do the traditional pip installs. There we go. Now that we have the pip installs, the first thing we're going to do is load flan-t5-base. You can actually do this with any LLM. And how do we quantize? Very simple. First, we create a converter with TFLiteConverter.from_keras_model. Then we specify which optimization we want. In this case, the default one is int8, remember. And then comes the magic, because in the target spec we say which operations are supported. That means we use the built-in ones as a fallback, in case the int8 quantization fails, and otherwise use the selected operations, which are quantized to int8. So…
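The converter setup described above can be sketched as follows. This is a minimal, hedged example: it uses a tiny stand-in Keras model instead of flan-t5-base (converting a full T5 model follows the same pattern but needs a concrete serving signature), and the representative dataset with random inputs is an illustrative assumption, since int8 calibration needs sample inputs.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; in the demo this would be flan-t5-base.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Calibration samples for int8 quantization (random data here, purely illustrative).
def representative_data():
    for _ in range(8):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optimize.DEFAULT enables quantization during conversion.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Prefer int8 kernels; fall back to regular built-in kernels
# for any op that has no int8 implementation.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
    tf.lite.OpsSet.TFLITE_BUILTINS,
]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk.
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Listing both ops sets is what gives the fallback behavior mentioned in the demo: the converter quantizes what it can and keeps float built-ins for the rest, rather than failing outright.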