From the course: Generative AI: Introduction to Large Language Models
What is a large language model?
- [Instructor] Large language models, or LLMs, are having a moment. You can barely get through a tech news article these days without reading something about large language models. A lot of the household names in tech, as well as some relative newcomers, have at least one large language model offering. They include OpenAI's GPT, which powers ChatGPT; Google's BERT, LaMDA, and PaLM, which powers the Bard chatbot; Llama by Meta; Claude by Anthropic; Cohere's Generate; NeMo LLM by Nvidia; and BLOOM by Hugging Face. So what exactly is a large language model? Large language models are generative AI models that are really good at understanding and generating human-like text based on the input they receive. This input is called a prompt. You can think of a prompt as a short piece of text given to a large language model in order to control the output of the model in a variety of ways. A prompt can be a question, a sentence, a paragraph, or even a code snippet. The quality and relevance of the generated output often depend on the clarity and specificity of the prompt we provide. For example, if we provide an LLM with the prompt "What is roti?", it could respond by writing, "Roti is a type of unleavened flatbread that is commonly consumed in South Asian countries, particularly in India, Pakistan, Bangladesh, Nepal, and Sri Lanka. It is a staple food in these regions and is enjoyed as a part of everyday meals." Nice. What about a prompt that reads, "It's raining cats and"? An LLM would correctly complete the sentence with the word "dogs." This may not seem too exciting, but consider this scenario, where I provided an LLM with the prompt "Once upon a time in a land far, far away, there lived a princess." It proceeded to write, "Named Seraphina. She resided in the magnificent kingdom of Eldoria, a realm known for its lush green meadows, towering castles, and sparkling rivers. Seraphina possessed a heart full of compassion, a keen intellect, and a curious spirit."
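To make the idea of a prompt steering a completion concrete, here is a deliberately tiny toy sketch. It is not a real language model: the dictionary of canned continuations simply stands in for the next-word probabilities a real LLM learns from massive text corpora. The `complete` function and its entries are invented for illustration only.

```python
# Toy illustration (NOT a real LLM): a lookup table stands in for the
# learned "most likely continuation" that a real model would predict.
completions = {
    "it's raining cats and": "dogs",
    "what is roti?": "Roti is a type of unleavened flatbread...",
}

def complete(prompt: str) -> str:
    """Return a canned continuation for a known prompt, mimicking
    how a prompt controls what an LLM generates next."""
    return completions.get(prompt.strip().lower(), "(no continuation learned)")

print(complete("It's raining cats and"))  # dogs
```

A real LLM does something far richer, of course: instead of an exact-match lookup, it assigns a probability to every possible next token given the prompt and samples from that distribution.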
That is pretty cool. Large language models can also do math and process logic. For example, I posed the following question to an LLM: "Elaine has three bags with the same amount of marbles in them, totaling 12 marbles. Jerry has three bags with the same amount of marbles in them, totaling 18 marbles. How many more marbles does Jerry have in each bag?" Unsurprisingly, the LLM was able to correctly respond, "Jerry has two more marbles in each bag compared to Elaine." Impressive. These examples are just a few of the many things that large language models can do. For the most part, LLMs can answer questions, solve problems, provide coding and writing assistance, summarize text, and translate languages. They can serve as an educational tool, explain concepts in different subjects, and help you understand complex topics. How is it that large language models are so powerful and versatile? One major reason is that most recent large language models are based on a very powerful type of artificial neural network invented at Google known as a transformer. We will get a bit more into neural networks and transformers in subsequent videos of this course. For now, know that what makes the transformer architecture so powerful is its ability to scale effectively, allowing us to train models using massive text datasets. Another reason why large language models are so powerful is that they are typically trained on massive amounts of text data collected from diverse sources such as books, articles, websites, and other textual resources. This extensive training data provides an LLM with exposure to a vast array of words, phrases, sentence structures, and topics, enabling it to capture the nuances of language usage. This is really important: the power of large language models comes from both the size and complexity of the neural network architecture and the size and diversity of the dataset it was trained on. Both are massive.
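For reference, the marble problem the LLM solved works out like this. The snippet below just verifies the arithmetic; the variable names are chosen for this example.

```python
# The marble word problem, step by step.
# Elaine: 3 bags holding 12 marbles total -> marbles per bag
elaine_per_bag = 12 // 3   # 4 marbles in each of Elaine's bags
# Jerry: 3 bags holding 18 marbles total -> marbles per bag
jerry_per_bag = 18 // 3    # 6 marbles in each of Jerry's bags

difference = jerry_per_bag - elaine_per_bag
print(difference)  # 2 -- Jerry has two more marbles in each bag
```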
It's impossible to overstate the importance of the diversity of the text that is used to train large language models. An LLM is able to quote scientific facts because it has pored over billions of research papers. A large language model is able to write poetry largely because it has seen trillions of poems and music lyrics online. An LLM can help correct your code because it has seen billions of code problems and their solutions on popular coding websites. Large language models have seen billions of examples of people summarizing, rewriting text into bullet points, and describing how to make text more concise or persuasive. As a result, they can do the same. In summary, whenever you ask a large language model to do something, know that there is a very good chance you are asking it to do something it has seen billions or even trillions of examples of. And even if you come up with something you think is unique, like "What would happen if Iron Man ate too many shawarmas?", it is very likely that the model has been exposed to a decent number of blogs that discuss fictional characters like Iron Man, and has also been exposed to content that describes what happens when we eat too much of anything. As a result, you could get an answer that reads, "Iron Man, being a fictional character, cannot eat shawarmas or experience the consequences of overindulgence. However, if we were to imagine a humorous scenario, consuming too many shawarmas might lead to discomfort, bloating, and possibly a stomach ache for anyone, including Iron Man."