From the course: Introduction to Large Language Models

Open LLMs

- [Narrator] We've looked at models that have all come from big tech firms like Google and OpenAI. But what about open community models? Now, although OpenAI made GPT-3 available via an API or the Playground, which is what we've seen so far, no access was given to the actual weights of the model. If you have access to a model's weights, you can tweak and experiment with it to create new variations that might be better suited for a specific task. For example, you can try to create smaller versions of the model. Meta released OPT, or Open Pre-trained Transformers. This is a collection of decoder-only pre-trained transformers ranging from 125 million to 66 billion parameters, which they shared with anyone. Interested researchers could also apply for access to the 175 billion parameter model. This effectively gave researchers access to a large language model that was very similar to GPT-3. The Meta team also detailed the infrastructure challenges they faced, along with providing code for experimenting with the models. This model was primarily trained on English text. A couple of months later, Hugging Face, which is a company that provides a machine learning platform to store models, think of it like GitHub for models, well, their research team received a grant for computing resources from the French government, allowing them to train the BLOOM model. Working together with a volunteer team of over 1,000 researchers from different countries and institutions, they created a 176 billion parameter model called BLOOM. This team has made everything openly available, from the dataset used for training to the model weights, so anyone can download and run the model. This allows organizations outside of big tech to experiment with these models. Now, even if you want to run these models, you still need access to expensive hardware accelerators. Despite all this collaboration, the performance of these models still lagged behind the likes of the OpenAI models. This is probably because, given this is a new field, not many engineers have the skills and experience of working with hundreds of hardware accelerators in parallel and handling the associated problems. The majority of the engineers working for big tech companies like Google and OpenAI and Meta have had a couple of years' head start on others. In February of 2023, Meta released the LLaMA models, which have 7, 13, 30, and 65 billion parameters, with a license only allowing them to be used for research and non-commercial purposes. Given the excellent performance of these models on various open benchmarks, they quickly became the most popular open language models thus far. The weights for these models were available to the research community, but they were quickly leaked. Some researchers extended the LLaMA models' capabilities by instruction tuning. Others tried training them on non-English data. This visual gives you an idea of how impactful LLaMA has been to the open community. Up to this point, we knew from the OpenAI research that training models with instructions significantly improves their performance, but OpenAI didn't release the dataset, so others couldn't try this out. Because of the excellent model performance of LLaMA, it's been used to provide the language component for many multimodal models. These are models that can take in different inputs, for example, text or images, and output either text or images.
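
Because the OPT and BLOOM weights are openly hosted on the Hugging Face Hub, downloading and running one of the smaller checkpoints takes only a few lines of code. Here's a minimal sketch using the Hugging Face transformers library, assuming you have transformers and torch installed and enough memory for the model; the facebook/opt-125m checkpoint used here is just one example of the smaller open models mentioned above.

# Minimal sketch: download and run a small open model from the Hugging Face Hub.
# Assumes `pip install transformers torch`; facebook/opt-125m is an example
# checkpoint, and a small BLOOM checkpoint such as bigscience/bloom-560m
# can be loaded the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("Open large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))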
Over on the left, the LLaMA models were trained with Chinese data, providing a variety of different Chinese models trained on data for different industries like finance and law and education and medicine. I think one of the reasons that LLaMA was more widely adopted than, say, Meta's Open Pre-trained Transformer models is because it's much smaller and so more readily accessible to anyone, because you can easily run the smaller models on a single hardware accelerator like a GPU, as shown in the sketch after this section. This is the open large language model leaderboard available from the Hugging Face website. Let me just scroll down so that you can get a better look at some of the models. So this compares open large language models. In July of 2023, Meta released LLaMA 2. This had a context length of 4,000 tokens and was trained on a whopping two trillion tokens. Now, what was remarkable was that less than a week after Meta released this model, teams in the open community produced better performing models. So, for example, you can see at the top of the leaderboard Stable Beluga 2, which is a model from Stability AI. At the time of this recording, derivatives of LLaMA 2 are the best open models out there. It's also worth noting that LLaMA 2 isn't fully open. Meta didn't release the training data. Meta also requires companies with over 700 million monthly users to ask Meta's permission before using this model. This affects a handful of fellow tech giants like Google. Now, why would Meta be willing to make all their models openly available? It would cost millions of US dollars in compute and data to train such models. And one of the challenges with AI models is safety and how they're used. By making the models openly available, Meta can learn how the models are being used, and this allows them to make their own products, which use some of these models, safer. Open models have gotten smaller and better over the years, closing the gap with closed models such as ChatGPT. So let's go ahead and add LLaMA 2 to our table comparing large language models. What's unique is that all of the LLaMA 2 models were trained on two trillion tokens, and it's the first set of open models that is comparable in performance to ChatGPT.
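
One reason the smaller LLaMA-family models are so accessible is that a 7 billion parameter checkpoint fits on a single GPU, especially when loaded in half precision. Below is a rough sketch of how that might look with the transformers library; the meta-llama/Llama-2-7b-chat-hf checkpoint name and the device_map="auto" placement are illustrative assumptions, and the official LLaMA 2 weights require accepting Meta's license on the Hugging Face Hub before you can download them.

# Rough sketch: load a 7B-parameter LLaMA 2 chat model on a single GPU.
# Assumptions: transformers, accelerate, and torch are installed, a GPU with
# roughly 16 GB of memory is available, and you have accepted Meta's license
# for the meta-llama/Llama-2-7b-chat-hf checkpoint on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # example checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision roughly halves memory use
    device_map="auto",          # place the weights on the available GPU
)

prompt = "Explain in one sentence why open LLMs matter."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))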
