From the course: Programming Generative AI: From Variational Autoencoders to Stable Diffusion with PyTorch and Hugging Face
Programming generative AI: Introduction
- Hi, I'm Jonathan Dinu. Welcome to Programming Generative AI. Currently, I am a content creator and artist working with deep learning and generative AI. Previously, I was a PhD student at Carnegie Mellon University before dropping out to pursue the less academic, more creative side of generative AI. I've always loved creating educational content, going all the way back to my days as a co-founder of Zipfian Academy, an immersive data science bootcamp where I had the opportunity to run workshops at major conferences like Strata and PyData, create video courses, and teach in person. I am excited that I can keep making educational content, both through my YouTube channel and for companies like Pearson.

In this course, you'll learn enough about deep generative modeling to ultimately build your own multimodal systems with PyTorch and the Hugging Face ecosystem, systems capable of understanding both text and images. Along the way, you'll learn how to manipulate data and embeddings in learned latent spaces, use pre-trained large language models for zero-shot learning, and personalize large pre-trained multimodal models like Stable Diffusion to build your own text-to-image model capable of generating images with a unique visual style. Broadly, this course is for anyone comfortable with Python and familiar with machine learning: engineers, data scientists, researchers, students, really anyone interested in getting started with generative AI. Specifically, it is tailored to folks who don't just want to use generative AI applications, but want to build them.

Also, a note of clarification. While this course will teach you the concepts and techniques that companies like Midjourney or OpenAI use, it will not teach you how to use those specific products more effectively, nor will it teach you how to build a startup around something like ChatGPT. In the same vein, while we will cover the concepts behind large language models for code, this is not a course on how to use AI developer tools like GitHub Copilot more effectively. This is programming generative AI, not generative AI for programmers.

We start with Lesson 1, where we cover generative AI and deep generative modeling conceptually and broadly. In Lesson 2, we dive into the specifics of the PyTorch framework, learning both about the features that make it particularly well suited to deep learning research and applications and about the basics of neural networks and how to build them. Lesson 3 is all about working with images and building generative models of visual data, covering specific architectures like convolutional neural networks and variational autoencoders as well as the broader concepts around latent variable models. Lesson 4 is a deep dive into the specifics of diffusion models, the type of generative model that underlies most current state-of-the-art text-to-image systems. Though we won't cover text-to-image generation in this lesson, we will see how unconditional diffusion models can still act as very powerful image generators and even perform image-to-image translation.
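As a small taste of what's ahead, here is a minimal sketch of the kind of network Lesson 2 builds up to with torch.nn. This is illustrative code rather than code from the course, and the layer sizes are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# A minimal multi-layer feedforward network (MLP) assembled from
# torch.nn building blocks; the layer sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(784, 256),  # input layer, e.g. a flattened 28x28 image
    nn.ReLU(),            # nonlinear activation
    nn.Linear(256, 10),   # output layer, e.g. 10 class scores
)

x = torch.randn(1, 784)  # a single random input vector
logits = model(x)
print(logits.shape)      # torch.Size([1, 10])
```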
Lesson 5 is to text as Lesson 4 was to images: we'll look at the specific nuances of modeling text and get an introduction to the transformer architecture, which powers virtually all of the impressive generative text systems like ChatGPT that you might see out there and, importantly, also serves as a critical component in multimodal systems. Lesson 6 is our first foray into multimodal generative models, where we'll see how to connect text and images to build the type of text-to-image system that DALL-E popularized and that you hopefully know and love. And finally, in Lesson 7, we explore methods and techniques to fine-tune large pre-trained models in a resource-efficient manner. We will teach Stable Diffusion how to generate new concepts, styles, and subjects not present in its training data, building personalized text-to-image models. Thank you for your interest in the course. I know there is a glut of learning resources out there, and I appreciate you choosing to spend your time with this one; hopefully you won't be disappointed. Now, without further ado, let's get started.
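To preview where the later lessons land, here is a minimal sketch of text-to-image generation with Hugging Face's diffusers library, the kind of pipeline Lessons 4 through 6 deconstruct and rebuild. Again, this is illustrative rather than course code, and it assumes the runwayml/stable-diffusion-v1-5 checkpoint and a CUDA-capable GPU.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pre-trained Stable Diffusion checkpoint in half precision
# (assumes the runwayml/stable-diffusion-v1-5 weights and a CUDA GPU).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate an image from a text prompt and save it to disk.
image = pipe("an astronaut riding a horse, oil painting").images[0]
image.save("astronaut.png")
```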
Contents
Lesson 2
- Topics (52s)
- What is PyTorch? (4m 34s)
- The PyTorch layer cake (11m 26s)
- The deep learning software trilemma (7m 4s)
- What are tensors, really? (5m 24s)
- Tensors in PyTorch (10m 3s)
- Introduction to computational graphs (12m 45s)
- Backpropagation is just the chain rule (16m 31s)
- Effortless backpropagation with torch.autograd (13m 39s)
- PyTorch's device abstraction (i.e., GPUs) (4m)
- Working with devices (10m 47s)
- Components of a learning algorithm (7m 9s)
- Introduction to gradient descent (6m 4s)
- Getting to stochastic gradient descent (SGD) (4m 8s)
- Comparing gradient descent and SGD (5m 50s)
- Linear regression with PyTorch (23m 56s)
- Perceptrons and neurons (7m 52s)
- Layers and activations with torch.nn (12m 41s)
- Multi-layer feedforward neural networks (MLP) (8m 58s)

Lesson 3
- Topics (54s)
- Representing images as tensors (7m 45s)
- Desiderata for computer vision (4m 57s)
- Features of convolutional neural networks (7m 56s)
- Working with images in Python (10m 20s)
- The Fashion-MNIST dataset (4m 48s)
- Convolutional neural networks in PyTorch (10m 43s)
- Components of a latent variable model (LVM) (8m 57s)
- The humble autoencoder (5m 29s)
- Defining an autoencoder with PyTorch (5m 42s)
- Setting up a training loop (9m 47s)
- Inference with an autoencoder (4m 16s)
- Look ma, no features! (8m 21s)
- Adding probability to autoencoders (VAE) (4m 49s)
- Variational inference: Not just for autoencoders (7m 20s)
- Transforming an autoencoder into a VAE (13m 26s)
- Training a VAE with PyTorch (13m 33s)
- Exploring latent space (11m 37s)
- Latent space interpolation and attribute vectors (12m 30s)

Lesson 4
- Topics (58s)
- Generation as a reversible process (4m 55s)
- Sampling as iterative denoising (4m 9s)
- Diffusers and the Hugging Face ecosystem (6m 51s)
- Generating images with diffusers pipelines (28m 20s)
- Deconstructing the diffusion process (19m 9s)
- Forward process as encoder (16m 47s)
- Reverse process as decoder (7m 18s)
- Interpolating diffusion models (9m 26s)
- Image-to-image translation with SDEdit (8m 4s)
- Image restoration and enhancement (11m 23s)

Lesson 5
- Topics (50s)
- The natural language processing pipeline (13m)
- Generative models of language (9m 31s)
- Generating text with transformers pipelines (15m 5s)
- Deconstructing transformers pipelines (8m 15s)
- Decoding strategies (13m 6s)
- Transformers are just latent variable models for sequences (12m 16s)
- Visualizing and understanding attention (24m 21s)
- Turning words into vectors (10m 44s)
- The vector space model (7m 14s)
- Embedding sequences with transformers (10m 11s)
- Computing the similarity between embeddings (7m 48s)
- Semantic search with embeddings (6m 32s)
- Contrastive embeddings with sentence transformers (6m 46s)

Lesson 6
- Topics (51s)
- Components of a multimodal model (5m 24s)
- Vision-language understanding (9m 33s)
- Contrastive language-image pretraining (6m 8s)
- Embedding text and images with CLIP (14m 7s)
- Zero-shot image classification with CLIP (3m 36s)
- Semantic image search with CLIP (10m 40s)
- Conditional generative models (5m 26s)
- Introduction to latent diffusion models (8m 42s)
- The latent diffusion model architecture (5m 50s)
- Failure modes and additional tools (6m 40s)
- Stable diffusion deconstructed (11m 30s)
- Writing your own stable diffusion pipeline (11m 16s)
- Decoding images from the stable diffusion latent space (4m 32s)
- Improving generation with guidance (9m 12s)
- Playing with prompts (30m 14s)

Lesson 7
- Topics (46s)
- Methods and metrics for evaluating generative AI (7m 5s)
- Manual evaluation of stable diffusion with DrawBench (13m 56s)
- Quantitative evaluation of diffusion models with human preference predictors (20m 1s)
- Overview of methods for fine-tuning diffusion models (9m 34s)
- Sourcing and preparing image datasets for fine-tuning (7m 41s)
- Generating automatic captions with BLIP-2 (8m 28s)
- Parameter-efficient fine-tuning with LoRA (11m 50s)
- Inspecting the results of fine-tuning (5m 2s)
- Inference with LoRAs for style-specific generation (12m 22s)
- Conceptual overview of textual inversion (8m 14s)
- Subject-specific personalization with DreamBooth (7m 43s)
- DreamBooth versus LoRA fine-tuning (6m 28s)
- DreamBooth fine-tuning with Hugging Face (14m 11s)
- Inference with DreamBooth to create personalized AI avatars (14m 21s)
- Adding conditional control to text-to-image diffusion models (4m 7s)
- Creating edge and depth maps for conditioning (15m 35s)
- Depth and edge-guided stable diffusion with ControlNet (17m 10s)
- Understanding and experimenting with ControlNet parameters (8m 32s)
- Generative text effects with font depth maps (2m 49s)
- Few-step generation with adversarial diffusion distillation (ADD) (7m 2s)
- Reasons to distill (6m 9s)
- Comparing SDXL and SDXL Turbo (11m 49s)
- Text-guided image-to-image translation (16m 52s)
- Video-driven frame-by-frame generation with SDXL Turbo (13m)
- Near real-time inference with PyTorch performance optimizations (11m 18s)