From the course: AI Workshop: Advanced Chatbot Development

Demo: Quantizing the chatbot model

- [Instructor] Okay. In this quick demo, I will show you how to quantize an LLM to int8 in TensorFlow, where it's possible. On some operations and some layers it won't be possible, and the converter will fall back to half precision. How do we do this? First, let's connect to a GPU. There we go. And we do the traditional pip installs. There we go. Now that we have the pip installs, the first thing we're going to do is load flan-t5-base. You can actually do this with any LLM. And how do we quantize? Very simple. First, we create a converter with TFLiteConverter.from_keras_model. Then we specify which optimization we want. In this case, the default one is int8, remember. And then comes the magic, because in the target spec we say which operations are supported. That means we use the built-in ones as a fallback, in case the int8 quantization fails, and otherwise use the selected operations, which are quantized to int8. So…
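The converter setup described above can be sketched as follows. This is a minimal, hedged example: it uses a tiny stand-in Keras model instead of flan-t5-base (converting a full T5 model follows the same pattern but needs a concrete serving signature), and the representative dataset with random inputs is an illustrative assumption, since int8 calibration needs sample inputs.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; in the demo this would be flan-t5-base.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(2),
])

# Calibration samples for int8 quantization (random data here, purely illustrative).
def representative_data():
    for _ in range(8):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Optimize.DEFAULT enables quantization during conversion.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Prefer int8 kernels; fall back to regular built-in kernels
# for any op that has no int8 implementation.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
    tf.lite.OpsSet.TFLITE_BUILTINS,
]
tflite_model = converter.convert()

# Write the quantized flatbuffer to disk.
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Listing both ops sets is what gives the fallback behavior mentioned in the demo: the converter quantizes what it can and keeps float built-ins for the rest, rather than failing outright.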