From the course: OpenAI API and MCP Development
Images API: Processing text and images
Imagine being able to describe a picture in natural language and instantly watch it appear on your screen. No camera, no paintbrush, no Photoshop. Just words. OpenAI lets you go from words to images instantly and easily, with one single text input, by using the DALL·E models. For our next example, we want to build an AI-powered application capable of generating content but also of creating images from scratch, given one text input. The image generation endpoint provided by OpenAI allows you to create an original image from a text prompt. So let's find out how to generate and edit images. Here, for example, you could use a model like GPT-4.1 mini, which can process text to generate content, but instead we actually want to use a DALL·E model. So let's go to the API reference to see how we can create an image with this API request. This is how you define a request, with four parameters: the model, the prompt, the number of images (so you can generate as many images as you want in one single generation task), and the size, or resolution. And you'll see that for the model parameter, you can use either DALL·E 2 or DALL·E 3. We're going to use DALL·E 3 for the image generation app that we want to build. So let's try it out. Let's find out how to use DALL·E in our next application. We go back here to define our next request. I'm going to copy from line 5 to line 14, like this, and add these lines to create our app. And here I'm going to replace the model with DALL·E 3. All right, but that's not all. We also want to be able to save the image, because we don't have an actual application with a user interface. So we're going to save the image to the files directory first and then access it there. So I'm going to change this file name to image.png.
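As a rough sketch of the request described above, here are the four parameters collected into a payload. The prompt text is just an example; sending the request would go through the OpenAI Python client, e.g. `client.images.generate(**params)`.

```python
# The four image-generation parameters discussed above. This only builds
# the payload; the actual call would be client.images.generate(**params)
# with an OpenAI client and a valid API key.
params = {
    "model": "dall-e-3",          # the API also accepts "dall-e-2"
    "prompt": "A whimsical fantasy scene with a dragon and a knight",
    "n": 1,                       # number of images (DALL·E 3 supports n=1;
                                  # DALL·E 2 supports up to 10 per request)
    "size": "1024x1024",          # size / resolution of the output
}

print(sorted(params))
```

Note that `n` lets you request several variations in one generation task with DALL·E 2, while DALL·E 3 returns one image per request.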
And we use base64. That's going to allow us to convert from a string to raw binary image data. First, we want to extract the image data. I'm going to do that here, at line 28. So first, I'm going to pull out the image data, making sure that I access the b64_json field here. Then we convert from text to bytes, which represents the actual image file. We just want to make sure that we actually have b64_json before processing, so this check is just for safety, a sanity check. And make sure that you have the base64 import added to your project. Also, in order to be able to work with base64, you need to actually specify it in the generation task. This is possible with OpenAI. Whenever you define the request (let's go back to the API reference), you'll see that in the list of parameters there is one called response_format, and you can set it to b64_json. This one is important: it is required to make sure that you get back base64-encoded data rather than a URL. Let's go back and add response_format to our request, like this. OK, so now we're good to go. I'm just going to change the inputs that we provide to the language model. I'm going to change the tone to something more whimsical, and ask it to generate a fantasy story with a whimsical tone, for example about a dragon and a knight. And then, based on that, we also want to generate an image to illustrate this story, by taking the output text, the response of the previous content generation task, and using it as the prompt to then generate an image. So let's try that now. I'm going to run the app with python main.py. Just keep an eye on the left and on the terminal: you're going to see the content being generated, our story, and next the image, which will be saved in the files directory.
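The decode-and-save step above can be sketched like this. The function name and the files/image.png path are placeholders matching the walkthrough; a real b64_json value would come from the API response when response_format="b64_json" is set.

```python
import base64
from pathlib import Path

def save_b64_image(b64_json: str, path: str = "files/image.png") -> Path:
    """Decode the API's b64_json string into raw binary image data
    and write it to disk (the sketch from the walkthrough above)."""
    image_bytes = base64.b64decode(b64_json)   # text -> actual image bytes
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create files/ if missing
    out.write_bytes(image_bytes)
    return out

# Stand-in payload for demonstration; a real response carries PNG bytes.
fake_b64 = base64.b64encode(b"not-a-real-png").decode()
saved = save_b64_image(fake_b64, "files/image.png")
print(saved)
```

The round trip (encode, then decode on save) is what lets the API ship binary image data inside a JSON response.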
Here we go. Actually, we've got an error, my bad. Let's go check it out. Oh, looks like we hit the safety system: "image descriptions generated from your prompt may contain text that is not allowed by our safety system", which is just linked to the API. So I'm going to change this to a hardcoded prompt here, instead of using something that was generated by the previous request. Maybe it doesn't like the word "fantasy" or "whimsical", so just in case, I'm going to change it to something simple like "magical". OK, we're going to play it safe. So you may run into situations like this sometimes with the API. Let's try that now and generate a two-sentence magical story with an image to illustrate it. Oh, my bad, I also need to print my story. So let's try that again; my mistake, I think I had erased that line here. So you can actually see the image, but the story was skipped. Let's try one last time, one final attempt. Then we'll be able to read the story, "Once upon a time, a powerful dragon...", and see the image, which is connected to that story. Excellent.
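Putting the whole flow together, here is a minimal sketch of the two-step pipeline from this demo: generate a short story, print it, then feed the story text back as the image prompt. The function name and prompt wording are illustrative; the client argument stands in for openai.OpenAI(), so the wiring can be shown without a live API key.

```python
def story_then_image(client, topic: str):
    """Sketch of the demo's pipeline: the text response of the first
    generation task becomes the prompt for the image generation task."""
    story = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": f"Write a two-sentence magical story about {topic}.",
        }],
    ).choices[0].message.content
    print(story)  # print the story so it shows up in the terminal

    image_b64 = client.images.generate(
        model="dall-e-3",
        prompt=story,                  # previous output reused as the prompt
        n=1,
        size="1024x1024",
        response_format="b64_json",    # get base64 data back, not a URL
    ).data[0].b64_json
    return story, image_b64
```

Hardcoding or simplifying the image prompt, as done in the video after the safety-system error, just means replacing `prompt=story` with a plain string.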