From the course: LLaMa for Developers


Using a vendor for serving LLaMA

- [Instructor] So far, we've worked with Llama both in a chat setting on Hugging Face and by running it locally in a Colab. In this video, we're going to discuss how we can use a vendor for accessing Llama. So let's go into a GitHub project. I'm here on the ray-project's llmperf-leaderboard repo. What this repo does is benchmark how quickly Llama models are served by different vendors. As you can see here, eight different vendors are benchmarked, and all of them offer a solution for running Llama. More recently, the Groq set of chips and APIs has been shown to be among the fastest in the industry, so I'm going to go ahead and try those out. I'm going to open up console.groq.com and hit Login with Google. Currently, GroqCloud is free, but it has a pretty low rate limit, so it probably shouldn't be used in production. So let's go ahead and create an API key by clicking on API Keys on the left. I have one here already,…
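Once you have an API key, calling a vendor-hosted Llama model is a plain HTTPS request. Here is a minimal sketch against Groq's OpenAI-compatible chat completions endpoint; the endpoint path and the `llama3-8b-8192` model name are assumptions based on Groq's public documentation, so check the console for the current values:

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; verify at console.groq.com.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_payload(prompt, model="llama3-8b-8192"):
    """Build the JSON request body for a single-turn chat completion."""
    return {
        "model": model,  # assumed model name; list current ones in the console
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_llama(prompt, api_key):
    """Send the prompt to the vendor API and return the reply text."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI-style response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]

# Example usage (requires a key created under the console's API Keys tab):
#   print(ask_llama("Why is Llama popular?", os.environ["GROQ_API_KEY"]))
```

Because the endpoint follows the OpenAI wire format, the same sketch works for most of the vendors on the leaderboard by swapping the base URL and model name.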
