From the course: Advanced Guide to ChatGPT, Embeddings, and Other Large Language Models (LLMs)
Case study: Visual QA—Setting up a model
- In our case, we're going to build our very own visual question-answering system. VQA, for short, is a type of architecture that takes in two kinds of data. Our system will accept both an image and a piece of text, usually a question about the image. It will process the image and the text in tandem, combine those two pieces of information, and use cross attention to hand that combined representation over to our decoder, in this case the open-source GPT-2, to start answering the question given the image. This kind of visual Q&A is seen as a precursor to some of the more, frankly, interesting tasks that researchers and practitioners expect from transformer-based architectures. It's all well and good to have text-to-text models like GPT and T5 (and BERT to a degree, although it doesn't output text), but the idea is that we can start to cross data modalities, basically saying it doesn't have to be just text. What if there's also an accompanying…
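The cross-attention step described above can be sketched with toy tensors in PyTorch. This is a minimal illustration, not the course's actual model: the `CrossAttentionFusion` module name and the tensor dimensions are made up, standing in for the real outputs of a vision transformer (image patch embeddings) and a text encoder (question token embeddings).

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Toy cross-attention block: text tokens attend to image patch embeddings."""
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_states, image_states):
        # Queries come from the text (the question); keys and values come
        # from the image, so each text token gathers visual context.
        attended, _ = self.attn(text_states, image_states, image_states)
        return self.norm(text_states + attended)

# Stand-ins for real encoder outputs (e.g., a ViT's 196 patch embeddings
# plus a [CLS] token, and a 12-token question embedding).
image_states = torch.randn(1, 197, 64)
text_states = torch.randn(1, 12, 64)

fused = CrossAttentionFusion()(text_states, image_states)
print(fused.shape)  # torch.Size([1, 12, 64])
```

Note the asymmetry: the output keeps the text sequence length, because the question tokens are the queries; in a full VQA model this fused representation is what a decoder like GPT-2 would attend over when generating the answer.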
Contents
- Module 3: Advanced LLM usage introduction (3m 22s)
- Topics (45s)
- The vision transformer (2m 33s)
- Using cross attention to mix data modalities (3m 16s)
- Case study: Visual QA—Setting up a model (20m 41s)
- Case study: Visual QA—Setting up parameters and data (18m 56s)
- Introduction to reinforcement learning from feedback (12m 46s)
- Aligning FLAN-T5 with reinforcement learning from feedback (21m 37s)