From the course: OpenAI API and MCP Development
Audio: Building a voice-enabled chatbot
So this is the final result of the previous challenge, which was to create a multimodal Q&A assistant capable of handling different types of generative tasks, including generating text content and generating images from scratch. You can see that we can switch between the different features. We have this nice, clean UI that we built using Streamlit. On the left, we have settings that allow users to switch between different models, to select one that is faster, or one that is better at more complex tasks like generating code, for example. You can also select from a list of models to generate images, and you can add DALL-E 3 if you'd like to complete this list.

So let's try this one out for now. We're going to ask it to create a short story, a one-sentence story about a princess and a bird. We're going to stay in the magical and fictional category. So let's hit Send and see the result: "In a kingdom where silence reigned, a lonely princess discovered her voice when a feathered songbird taught her the melody of the winds."

Now, to complete the task, let's also generate an image for the same story. I'm going to type "Generate an image about this story," so that we have a story and an image that corresponds to it. Here we go. Right below, we have an image that corresponds to the story that was just generated. That looks very nice.

Now let's make the demonstration even more interesting. What we want is to transform this AI assistant into a speaking assistant. We want to add voice features that give it a clear and natural voice instantly, with no recording studio required. So let's go check out the OpenAI documentation. Here we can discover that the API provides a wide range of audio capabilities. One of the most exciting is text-to-speech, which turns text into lifelike spoken audio. And we can test one right below.
So this one is called Alloy. And this is... "The sun rises in the east and sets in the west." Let me stop it. So this is a voice that sounds very natural, like a human, and it was generated using AI. You have a list of voices to select from. You can even control the tone, for example by applying some emotion to the voice. Several output formats are supported, and also several languages.

So let's look at the examples that we can apply to our next demo. What we want is to build upon the solution to the last challenge and make it even more interesting by creating a speaking AI assistant. So let's go back to the corresponding project.

For now you have this starter project that we're going to develop. Right at the top, line 2, a few helper functions were added to the scope of this project. So let's take a look. Here you see that we had already set up the helper function to generate content using the chat completions endpoint, line 20. Then the helper to generate images from a text input, line 36. And finally, below line 47, this is where you find the helper function that uses the audio API to convert text to speech. This function is called tts2mp3file, and we're going to use this one. What we get in return is a file path that we can then use to listen to the audio.

You'll see that we have this directory. This is where we're going to save all the audio files that we create. For now, I'm just going to delete it, because the same function recreates the directory if it doesn't exist. So if it is missing, line 64, it's going to be recreated. All right, I have just deleted it, just for the demo. It's going to be recreated instantly. So I'm going to go right below here, to the chat tab.
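To make the helper described above more concrete, here is a minimal sketch of what a text-to-MP3 function could look like. It is not the course's actual code: the function name `tts_to_mp3_file`, the `tts` directory, the `temp.mp3` file name, and the default model and voice are all assumptions chosen for illustration. The client is passed in as a parameter so the function can also be tried with a stub.

```python
from pathlib import Path

# Assumed defaults for illustration; the course project may use other values.
DEFAULT_TTS_MODEL = "gpt-4o-mini-tts"
DEFAULT_VOICE = "alloy"


def tts_to_mp3_file(text, client=None, model=DEFAULT_TTS_MODEL,
                    voice=DEFAULT_VOICE, out_dir="tts", filename="temp.mp3"):
    """Convert text to speech and return the path of the saved MP3 file."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Recreate the output directory if it was deleted or never existed.
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename

    # The speech endpoint returns binary audio (MP3 by default).
    response = client.audio.speech.create(model=model, voice=voice, input=text)
    target.write_bytes(response.read())
    return str(target)
```

Because the file name is fixed, each call overwrites the previous audio, which matches the idea of keeping only one temporary file.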
And we're going to test this text-to-speech feature with the content generation task. This is where I'm going to use this function. I'm going to pass the completion, meaning the written content that was generated. We can specify the model, so I'm just going to add the TTS model. And then here, I'm going to specify the file that I'd like to create, like this. I'd like the file name to always be the same, so that each run replaces the same file, because this is only a temporary file. For the name of the model, I'm going to go back to the documentation and copy it here. This is the model I want to use to convert text to speech.

All right, so let's try that. Once it is done, we get back a file path. That's going to be a temporary file path, so I'm going to name it that way. Then we're going to use this feature from Streamlit that gives us a player to play the audio. We pass it the temporary file name, and the format is going to be MP3.

So let's try that. Let's start the app again. Let's try something very quick, which is to write a one-sentence story about a princess, about a young princess, I'm going to say. We're going to hit Send. All right, so we have the text first. And now we have the same text as audio that we can listen to, so I'm going to hit Play. "A young princess discovered that her laughter could summon the stars, igniting a forgotten magic that would change her world forever." Excellent. That reminds us of a previous demonstration of the audio features, also using OpenAI. You'll also see that in this TTS directory, you have this file, which is stored.
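The player step above can be sketched roughly as follows. This assumes the MP3 has already been written to disk by a text-to-speech helper; `show_audio_player` is a hypothetical name, and the `st` parameter exists only so the function can be tried without Streamlit installed. Reading the file into bytes, rather than passing the path, is a design choice here so that a reused file name always shows the latest audio.

```python
from pathlib import Path


def show_audio_player(mp3_path, st=None):
    """Render a Streamlit audio player for an MP3 file; return its size in bytes."""
    if st is None:
        # Imported lazily; pass a stand-in object to test without Streamlit.
        import streamlit as st

    audio_bytes = Path(mp3_path).read_bytes()
    # st.audio accepts raw bytes; the format argument is the MIME type.
    st.audio(audio_bytes, format="audio/mpeg")
    return len(audio_bytes)
```

In the app, this would be called right after the generated text is displayed, for example `show_audio_player(temp_file_path)`.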
That file corresponds to the written text that was converted into audio, now available as this audio file. Instead of having to press Play, what we would like is for the audio to play automatically. This is another function that I have made available in the handlers file, where you also find the other handler and helper functions. You just need to use this one, which is enable audio autoplay. Instead of displaying the audio player, we're going to keep it hidden, just by using HTML. This is already set up in the background, so you don't need to bother with this part. What we want is to call this function so that it opens the file and plays it automatically as soon as the text is generated. So instead of having this, I'm just going to call autoplay directly and pass the completion as a parameter. That's all there is to it. So let's try that.

And we're going to do the same whenever we generate an image. We're going to take the text from the user input and read out, "Here is an image for" followed by whichever request was made by the user. So whenever you generate an image, you're also going to have something played out loud.

Okay, so let's run it again and try another story. Let's say we want to write a one-sentence story, because we want to keep it short, about a knight and a princess in a forest. We're going to be precise and detailed about the location. Okay, let's hit Send. All right. "As the moonlight filtered through the ancient trees, the brave knight bravely rescued the princess from the clutches of a fierce dragon, their laughter echoing through the enchanted forest as they made their escape." All right. And this is also possible with the images feature. Also, as a reminder, you have access to different voices.
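The hidden autoplay player mentioned above is not shown in the transcript, but one common way to get that effect is to embed the MP3 as a base64 data URI in an `<audio autoplay>` tag. The sketch below is an assumption about how such a helper might work, and `build_autoplay_html` is a hypothetical name rather than the function from the handlers file.

```python
import base64
from pathlib import Path


def build_autoplay_html(mp3_path):
    """Return an HTML snippet that plays the MP3 automatically with no visible player."""
    encoded = base64.b64encode(Path(mp3_path).read_bytes()).decode("ascii")
    # No "controls" attribute, so the browser renders nothing for this element.
    return (
        '<audio autoplay style="display:none">'
        f'<source src="data:audio/mpeg;base64,{encoded}" type="audio/mpeg">'
        "</audio>"
    )
```

In a Streamlit app, the snippet could be rendered with `st.markdown(build_autoplay_html(path), unsafe_allow_html=True)`. Keep in mind that some browsers restrict autoplay until the user has interacted with the page.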
So if you'd like a voice that sounds more like a male voice, you can always switch here and select this one, for example. You can try them all and select the one that sounds best to you. Then let's go back. You can replace the voice here, in the handlers, or, since I have also made it an option, you can specify it whenever you call this function. When you call AutoPlay, it runs TTS in this part, so this is where you specify the voice. So you can always replace it here and update it, which makes the voice very easy to switch. So that's one option if you'd like your assistant to sound like a male voice.
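As a rough illustration of making the voice switchable, the sketch below validates a requested voice and builds the keyword arguments for a speech request. The helper names `resolve_voice` and `speech_request` are hypothetical, the default model is an assumption, and the voice list is only a subset; the OpenAI documentation lists the full set of voices.

```python
# A subset of voice names; check the documentation for the complete list.
KNOWN_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")


def resolve_voice(requested=None, default="alloy"):
    """Return a normalized voice name, falling back to the default when none is given."""
    if not requested:
        return default
    voice = requested.strip().lower()
    if voice not in KNOWN_VOICES:
        raise ValueError(f"Unknown voice: {requested!r}")
    return voice


def speech_request(text, voice=None, model="gpt-4o-mini-tts"):
    """Build keyword arguments for a text-to-speech call (model name is an assumption)."""
    return {"model": model, "voice": resolve_voice(voice), "input": text}
```

With the OpenAI Python client, the result could be passed along as `client.audio.speech.create(**speech_request(completion, voice="onyx"))`, so that switching voices only means changing one argument.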