From the course: Programming Generative AI: From Variational Autoencoders to Stable Diffusion with PyTorch and Hugging Face
Embedding text and images with CLIP
- [Instructor] In our lesson on encoding text, at the very end, I presented the SentenceTransformers library. Internally, most of those models are trained with an objective similar to the one I just presented with CLIP: a contrastive loss, where the model isn't predicting a label, another word, or an image, but is instead predicting which item in a group is most similar, or how similar two separate things are when presented together. That SentenceTransformers embedding model, though, was really text-to-text: we were embedding sentences and comparing them against other sentences. CLIP, on the other hand, as we just saw, is the first multimodal model we'll encounter in these lessons, embedding text and images in a shared embedding space. And that joint embedding space is very powerful since, as we'll see in this worked example, we can actually use it to do semantic…
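To make the contrastive objective concrete, here is a minimal sketch in PyTorch. The random tensors below are stand-ins for the outputs of CLIP's image and text encoders (the real features come from a vision transformer and a text transformer, and the temperature is a learned parameter in CLIP; both are simplified here for illustration). Each image in the batch is trained to "pick out" its own caption among all captions in the batch, and vice versa:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for encoder outputs: a batch of 4 matched
# image-caption pairs, each embedded as a 512-d feature vector.
torch.manual_seed(0)
image_features = torch.randn(4, 512)
text_features = torch.randn(4, 512)

# Project onto the unit hypersphere so dot products are cosine similarities.
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)

# Pairwise similarity matrix: entry (i, j) compares image i to text j.
# CLIP scales this by a learned temperature; fixed here for illustration.
temperature = 0.07
logits = image_features @ text_features.T / temperature

# Contrastive loss: the matching pairs lie on the diagonal, so each row
# (image -> texts) and each column (text -> images) is a classification
# problem whose correct class is its own index. CLIP averages both
# directions with cross-entropy.
targets = torch.arange(4)
loss = (F.cross_entropy(logits, targets)
        + F.cross_entropy(logits.T, targets)) / 2
print(loss.item())
```

With random features the loss is just noise; training drives the diagonal entries of `logits` above the off-diagonal ones, which is exactly what makes the shared embedding space usable for zero-shot classification and semantic search later in this chapter.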
Contents
- Topics (51s)
- Components of a multimodal model (5m 24s)
- Vision-language understanding (9m 33s)
- Contrastive language-image pretraining (6m 8s)
- Embedding text and images with CLIP (14m 7s)
- Zero-shot image classification with CLIP (3m 36s)
- Semantic image search with CLIP (10m 40s)
- Conditional generative models (5m 26s)
- Introduction to latent diffusion models (8m 42s)
- The latent diffusion model architecture (5m 50s)
- Failure modes and additional tools (6m 40s)
- Stable diffusion deconstructed (11m 30s)
- Writing your own stable diffusion pipeline (11m 16s)
- Decoding images from the stable diffusion latent space (4m 32s)
- Improving generation with guidance (9m 12s)
- Playing with prompts (30m 14s)