From the course: Foundations of AI and Machine Learning for Java Developers
Large language models and NLP - Java Tutorial
- [Instructor] So we've covered a definition of artificial intelligence and its relationship to machine learning and deep learning. We've also discussed the two important subsets of deep learning: predictive deep learning (predictive AI) and generative deep learning (gen AI). And we've covered predictive AI using Java and JSR 381. Now, let's focus specifically on generative AI.

Since the concept of a model is particularly important with generative AI, let's review what a model is. In general, a model is a representation of some physical entity that behaves just like that entity, and a machine learning model is a model that's been created using machine learning techniques. That is, it learns to recognize patterns in a dataset without being explicitly programmed. After all, that's what machine learning is really good at: recognizing patterns in a dataset. Once an ML model is trained on a dataset and saved, it can be used as a representation of that data.

For example, you may want a model of the purchasing patterns of customers on an ecommerce website such as Amazon, Netflix, Alibaba, Flipkart, or Jumia. You're looking for patterns in how people have used your site so you can make purchasing recommendations. Or you may have internal business documentation composed of text files, PDFs, websites, transcriptions of videos, and scanned images, and you need to understand the patterns of how employees retrieve information from those documents. There might be patterns in a science fiction story that involves space exploration, such as deadly aliens, isolation, survival, or colonization. These seem to be very popular patterns, maybe too common. How about music lyrics from a certain genre, such as blues, jazz, or country? Song lyrics within a genre often follow certain patterns. In each of these situations, a machine learning model is created that examines the data to understand those patterns. Understanding patterns in this kind of textual information can be quite powerful.

The study of such textual patterns is called natural language processing. Natural language processing, or NLP for short, is a set of algorithms and techniques for processing and understanding human language, and it's been around for quite a long time. A language model, sometimes called an LM, is an NLP technique that analyzes text to determine patterns in that text, then uses the probability distribution of those patterns to generate new text. Traditionally, a language model is a classical machine learning technique. A large language model, or LLM, is just another language model, except it stores a very large number of parameters, typically in the billions, in a large graph data structure called a neural network.

You may be surprised, but these are not new topics. One of the earliest works on linguistics came from Ferdinand de Saussure back in the early 1900s. He was a noted linguist who was fascinated with languages, and his Course in General Linguistics is the seminal work in the field. Alan Turing followed up on that work about 30 years later. Turing, certainly a very famous computer scientist, wrote a paper that covers many of the advances in machine processing of language at the time. Then, about 40 years after Turing's work, we had the first usage of the term language model. That was back in 1991, so it's really not that old.
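To make the language model idea concrete before we move on, here is a minimal sketch of a classical bigram language model in plain Java. This example is not from the course: the class name and the tiny corpus are hypothetical, and a real model would need proper tokenization, smoothing, and far more data. It simply counts which words follow which, then samples new text from that probability distribution.

import java.util.*;

// Minimal bigram language model sketch (hypothetical example, not from
// the course). It records every word that follows each word in the
// training text, then samples the next word from that distribution.
public class BigramLanguageModel {
    private final Map<String, List<String>> followers = new HashMap<>();
    private final Random random = new Random();

    public void train(String corpus) {
        String[] tokens = corpus.toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            followers.computeIfAbsent(tokens[i], k -> new ArrayList<>())
                     .add(tokens[i + 1]);
        }
    }

    // Picks a follower at random; frequent pairs are proportionally
    // more likely because they appear more often in the list.
    public String nextWord(String word) {
        List<String> candidates = followers.get(word.toLowerCase());
        if (candidates == null || candidates.isEmpty()) return null;
        return candidates.get(random.nextInt(candidates.size()));
    }

    public String generate(String seed, int length) {
        StringBuilder text = new StringBuilder(seed);
        String current = seed;
        for (int i = 0; i < length; i++) {
            String next = nextWord(current);
            if (next == null) break;
            text.append(' ').append(next);
            current = next;
        }
        return text.toString();
    }

    public static void main(String[] args) {
        BigramLanguageModel lm = new BigramLanguageModel();
        lm.train("the blues song is sad and the blues song is slow");
        System.out.println(lm.generate("the", 5));
    }
}

Because every observed follower is stored, frequent word pairs are sampled proportionally more often, which is exactly the probability distribution over textual patterns described above. An LLM does the same job at vastly larger scale, with a neural network standing in for this lookup table.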
Then, about 25 years later, following the advances made with cloud computing and the accumulation of very large datasets, we had a huge breakthrough in LLMs and machine learning: the transformer architecture. The transformer architecture offers several algorithmic innovations, most notably its self-attention mechanism, which helps determine textual context and word relationships in LLMs. You can find the details in the transformer paper from Google Research, "Attention Is All You Need."

To me, the biggest advantage of transformer models is that they are highly scalable and parallelizable, unlike earlier sequential techniques such as recurrent neural networks. This makes training significantly faster, and transformers can easily leverage the benefits of modern cloud computing and large datasets. Nowadays, most work related to NLP and gen AI is based on transformer LLMs. The paper came out in 2017, so it really wasn't that long ago, and since then, growth in gen AI, and in ML in general, has exploded. Practically all gen AI systems are based on transformers: OpenAI's ChatGPT, Google's Gemini, Meta's Llama, Anthropic's Claude, et cetera. These modern ML systems all use a transformer architecture as the foundation of their gen AI services. Now that we have transformer LLMs, how do we use this new type of powerful AI? Let's see.
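Before we do, here is a minimal numeric sketch of the scaled dot-product attention at the core of the transformer paper, again in plain Java. This is an illustration under simplifying assumptions, not production code: the query, key, and value matrices are tiny hand-rolled arrays rather than learned parameters, and a real transformer adds multiple attention heads, linear projections, and masking.

// Minimal sketch of scaled dot-product attention, the core transformer
// operation: Attention(Q, K, V) = softmax(Q * K^T / sqrt(d)) * V.
// Illustrative only; tiny fixed matrices stand in for learned weights.
public class ScaledDotProductAttention {

    static double[][] attention(double[][] q, double[][] k, double[][] v) {
        int d = q[0].length;               // dimension of each query/key vector
        double scale = Math.sqrt(d);
        double[][] out = new double[q.length][v[0].length];
        for (int i = 0; i < q.length; i++) {
            // Score each key against query i, scaled to keep values stable.
            double[] scores = new double[k.length];
            for (int j = 0; j < k.length; j++) {
                double dot = 0;
                for (int t = 0; t < d; t++) dot += q[i][t] * k[j][t];
                scores[j] = dot / scale;
            }
            // Softmax turns the scores into attention weights that sum to 1.
            double max = Double.NEGATIVE_INFINITY;
            for (double s : scores) max = Math.max(max, s);
            double sum = 0;
            double[] weights = new double[scores.length];
            for (int j = 0; j < scores.length; j++) {
                weights[j] = Math.exp(scores[j] - max);
                sum += weights[j];
            }
            // Output row i is the weighted average of the value vectors.
            for (int j = 0; j < k.length; j++) {
                double w = weights[j] / sum;
                for (int t = 0; t < v[0].length; t++) out[i][t] += w * v[j][t];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] q = {{1, 0}, {0, 1}};   // one query vector per word
        double[][] k = {{1, 0}, {0, 1}};   // one key vector per word
        double[][] v = {{1, 2}, {3, 4}};   // one value vector per word
        for (double[] row : attention(q, k, v))
            System.out.println(java.util.Arrays.toString(row));
    }
}

Each output row is a weighted average of the value vectors, where the weights reflect how strongly that word's query matches every other word's key. That is how the architecture captures context and word relationships, and because each row can be computed independently, the work parallelizes easily, which is exactly the scalability advantage mentioned above.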