From the course: Prompt Engineering with Gemini
Image recognition and augmentation with Gemini - Gemini Tutorial
- [Instructor] Gemini is a multimodal LLM, meaning you can input text, audio, images, or videos. Let's learn how we can leverage images and text together. I'm going to analyze an instruction manual about making a hot beverage. We can find it in our exercise files.

So I'm in chapter four under 04_01, and here I have a prompt. Let's copy that in. It asks, "How much water do I need based on this image? How hot should it be? Only use information from the image." Next, I have this coffee_diagram.jpeg, so let's open it up, copy it, and paste it into Gemini. We can also upload it with the add files button. Let's go ahead and hit enter.

So here we get an answer. "Based on the image, for the first method shown, you need 200 milliliters of water at 92 degrees Celsius. For the second method, you need 150 to 200 milliliters of water at 85." And looking at this, this looks great, actually. We have the first method here, which has the temperature and the volume, and the second method as well. Now,…
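The walkthrough above uses the Gemini chat interface, but the same image-plus-text prompt could also be sent programmatically. Here is a minimal sketch, assuming the public Gemini API REST `generateContent` endpoint and an API key. The model name, the `GEMINI_API_KEY` variable mentioned in the note below, and the helper names `build_request_body` and `ask_gemini` are illustrative assumptions, not part of the course files. Only `coffee_diagram.jpeg` and the prompt text come from the lesson.

```python
import base64
import json
import urllib.request

# Assumption: the model name is illustrative; use whichever Gemini model you have access to.
MODEL = "gemini-2.5-flash"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

# The prompt from the lesson, restricting the answer to what the image shows.
PROMPT = (
    "How much water do I need based on this image? "
    "How hot should it be? Only use information from the image."
)


def build_request_body(image_bytes: bytes, prompt: str,
                       mime_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: pair one inline image with one text prompt."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"inline_data": {"mime_type": mime_type, "data": encoded}},
                {"text": prompt},
            ]
        }]
    }


def ask_gemini(image_path: str, prompt: str, api_key: str) -> str:
    """Hypothetical helper: send the image and prompt, return the first text reply."""
    with open(image_path, "rb") as f:
        body = build_request_body(f.read(), prompt)
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumes the response contains at least one candidate with a text part.
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

With a key stored in an environment variable such as `GEMINI_API_KEY`, calling `ask_gemini("coffee_diagram.jpeg", PROMPT, os.environ["GEMINI_API_KEY"])` (after `import os`) should return an answer along the lines of the one quoted in the transcript, though the exact wording will vary between runs and models.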
Contents
- Image recognition and augmentation with Gemini (1m 45s)
- Creative image generation with Gemini (3m 5s)
- Generating short videos with Veo 3 (1m 37s)
- Analyzing a multimodal document with Gemini (2m 24s)
- Searching and summarizing a YouTube video with Gemini (3m 4s)
- Challenge: Comparing two world wonders (26s)
- Solution: Comparing two world wonders (3m 37s)