From the course: Prompt Engineering with Gemini


Image recognition and augmentation with Gemini


- [Instructor] Gemini is a multimodal LLM, meaning you can input text, audio, images, or videos. Let's learn how we can leverage images and text together. I'm going to analyze an instruction manual about making a hot beverage. We can find it in our exercise files. So I'm in chapter four under 04_01, and here I have a prompt, let's copy that in, which asks, "How much water do I need based on this image? How hot should it be? Only use information from the image." Next, I have this coffee_diagram.jpeg, so let's open it up, copy it, and paste it into Gemini. We can also upload it with the add files button. Now let's go ahead and hit enter. So here we get an answer. "Based on the image, for the first method shown, you need 200 milliliters of water at 92 degrees Celsius. For the second method, you need 150 to 200 milliliters of water at 85." And this looks great, actually. We have the first method here, which has the temperature and the volume, and the second method as well. Now,…
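The walkthrough above uses the Gemini web app, but the same pairing of a text prompt with an image could also be expressed as an API request. The following is a minimal sketch, assuming the Gemini REST API's `generateContent` body layout (a `contents` list whose `parts` hold a `text` entry and a base64-encoded `inline_data` entry). The helper name `build_image_prompt_request` is hypothetical, the image bytes are a placeholder rather than the real coffee_diagram.jpeg, and nothing is actually sent over the network.

```python
import base64
import json


def build_image_prompt_request(prompt: str, image_bytes: bytes,
                               mime_type: str = "image/jpeg") -> dict:
    """Hypothetical helper: build a request body with one text part and one image part."""
    # Inline image data has to be base64-encoded so it can travel inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# The prompt from the exercise files, as read out in the video.
PROMPT = (
    "How much water do I need based on this image? "
    "How hot should it be? Only use information from the image."
)

# Placeholder bytes (assumption): in practice, read coffee_diagram.jpeg from disk.
placeholder_image = b"\xff\xd8\xff\xe0 placeholder, not a real JPEG"

request_body = build_image_prompt_request(PROMPT, placeholder_image)
payload = json.dumps(request_body)  # JSON string that would be POSTed to the API
```

Keeping "Only use information from the image" in the text part mirrors the instructor's prompt: it asks the model to ground its answer in the diagram rather than in general knowledge about brewing.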
