From the course: Programming Generative AI: From Variational Autoencoders to Stable Diffusion with PyTorch and Hugging Face

Programming generative AI: Introduction

- Hi, I'm Jonathan Dinu. Welcome to Programming Generative AI. Currently, I am a content creator and artist working with deep learning and generative AI. Previously, I was a PhD student at Carnegie Mellon University before dropping out to pursue the less academic, more creative side of generative AI. I've always loved creating educational content, going all the way back to my days as a co-founder of Zipfian Academy, an immersive data science bootcamp where I had the opportunity to run workshops at major conferences like Strata and PyData, create video courses, and teach in person. I am excited I can still keep making educational content, both through my YouTube channel and for companies like Pearson. In this course, you'll learn enough about deep generative modeling to ultimately build your own multimodal systems with PyTorch and the Hugging Face ecosystem, systems capable of understanding both text and images. Along the way, you'll learn how to manipulate data and embeddings in learned latent spaces, use pre-trained large language models for zero-shot learning, and see how to personalize large pre-trained multimodal models like Stable Diffusion to build your own text-to-image model capable of generating images with a unique visual style. Broadly, this course is for anyone comfortable with Python and familiar with machine learning. That is, engineers, data scientists, researchers, or students: anyone who's interested in getting started with generative AI. Specifically, this course is tailored to folks who don't just want to use generative AI and generative AI applications, but to build them. Also, a note of clarification. While this course will teach you the concepts and techniques that companies like Midjourney or OpenAI use, it will not teach you how to more effectively use those specific products themselves, nor will it teach you how to build a startup that uses something like ChatGPT.
And in the same vein, while we will cover the concepts behind large language models for code, this is unfortunately not a course on how to more effectively use AI developer tools like GitHub Copilot. This is programming generative AI, not generative AI for programmers. We start with Lesson 1, where we cover generative AI and deep generative modeling conceptually and broadly. In Lesson 2, we dive into the specifics of the PyTorch framework, learning about the features that make it particularly well suited to deep learning research and applications, as well as the basics of neural networks and how to build them. Lesson 3 is all about working with images and building generative models of visual data, covering both specific architectures like convolutional neural networks and variational autoencoders, as well as the broader concepts around latent variable models. Lesson 4 is a deep dive into the specifics of diffusion models, the type of generative model that underlies most of the current state-of-the-art text-to-image systems. And though we won't cover text-to-image generation in this lesson, we will see how unconditional diffusion models can still act as very powerful image generators and even perform image-to-image translation. Lesson 5 is to text as Lesson 4 was to images: we'll look at the specific nuances of modeling text and get an introduction to the transformer architecture, which powers virtually all of the impressive generative text systems like ChatGPT that you might see out there and, importantly, also serves as a critical component in multimodal systems. Lesson 6 is our first foray into multimodal generative models, where we'll see how to connect text and images to build the type of text-to-image system that DALL-E popularized and that you hopefully know and love. And finally, in Lesson 7, we explore methods and techniques to fine-tune large pre-trained models in a resource-efficient manner.
We will teach Stable Diffusion how to generate new concepts, styles, and subjects not present in its training data, building personalized text-to-image models. Thank you for your interest in the course. I know there is a glut of learning resources out there; I appreciate you choosing to spend your time with this course, and I hope you won't be disappointed. Now, without further ado, let's get started.