From the course: OpenAI API for Python Developers
Whisper Audio API: Transcribe audio samples
- [Instructor] Now let's do our first demonstration with audio transcriptions. Here we have a Streamlit app with an option to upload one file, and when you click the submit button, it runs the process of transcribing your file. Let's look at our project. Mine is already up and running; you start yours with streamlit run main.py, and mine is visible at this local address right here. You can then click to grab your file. That's going to be an English testimonial. Let's play it. - [Customer] Our experience with Connecticut was nothing but positive. - [Instructor] What we want is to get the transcript for this English testimonial. For now, if we click submit, nothing happens. You'll see that as part of the project there is also this speech_to_text module, which you'll find under utils. This is where you set up and initialize the model, and then run the transcription. Let's go find the documentation for the Whisper model. The installation instructions are also provided in the README file: you run this command to install the Whisper package, and this additional dependency is required as well. The next step is to initialize the language model you want. Here you have the list of available models, and we're going to use this one for our example. Let's find the code snippets; for our example, that's going to be right here. First, you import whisper, load the language model, and call transcribe. Let's copy that and go back to our project. I'm going to add it right here; the import is already at the top of the file. So the first step is to initialize the language model, and next we call the transcribe method.
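The steps just described can be sketched as a small helper. This is a minimal sketch, not the course's exact file: the path utils/speech_to_text.py and the "base" model size are assumptions from the demo, while whisper.load_model and model.transcribe are the open-source Whisper package's documented calls.

```python
# utils/speech_to_text.py -- minimal sketch of the transcription helper.
# Assumes the open-source whisper package and its audio dependencies
# are installed (see the README); "base" is one of the model sizes.

def get_transcript(result: dict) -> str:
    # transcribe() returns a dict; the "text" key holds the transcript.
    return result["text"].strip()

def speech_to_text(audio_path: str) -> str:
    import whisper  # imported lazily so this sketch loads without whisper

    model = whisper.load_model("base")     # initialize the language model
    result = model.transcribe(audio_path)  # run the transcription
    print(result)                          # inspect the full result object
    return get_transcript(result)
```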
And finally, we print the result object. That's the first step. And of course, I have a typo here to fix: that's going to be results. We also need to replace this argument with audio_path, and it's here that we define the audio_path, which will be a temporary file path. So we call speech_to_text and pass temp_file_path, which is whichever file path is created after you upload your file. Let's try that with a quick first demonstration: upload this English testimonial audio sample and click submit. Here we can see that the file is being transcribed, and when it finishes, you can go back and find the result. It's actually this big object, and inside it you'll find the text key, which corresponds to the transcript of the audio sample. The next step is to display it on the user interface. Let's go back and change a few things. Now, instead of just printing, we return the result object, and as we've just seen, that's going to be its text key. Back in the app, we store that as transcript. We'll also print a success message, just to indicate that it completed as expected, so I'm going to say something like file transcribed successfully. Next, I'll add a divider to create some space. And then we write the text, printing it in blue with this syntax, so the transcript appears in a blue color. Finally, I'm going to remove this line; I don't really need it.
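On the Streamlit side, the display step might look like the sketch below. The :blue[...] syntax is Streamlit's markdown text-coloring feature, and st.success and st.divider are real Streamlit calls; the function names here are illustrative, not taken from the course files.

```python
# Sketch of the display step in main.py. The success message, divider,
# and blue transcript mirror the demo; helper names are illustrative.

def format_blue(text: str) -> str:
    # Streamlit renders :blue[...] inside markdown as blue text.
    return f":blue[{text}]"

def show_transcript(transcript: str) -> None:
    import streamlit as st  # lazy import: this is a sketch, not the course file

    st.success("File transcribed successfully")  # confirm completion
    st.divider()                                 # add some space
    st.markdown(format_blue(transcript))         # print the transcript in blue
```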
The other step is to also let the user listen to the uploaded file, so we can both read and listen to the audio sample. Let's try that next. Let's take this English testimonial again; that should run smoothly. All right, now we can see that it was successfully transcribed: we can read the transcript, and we also have the option to listen to it. Excellent, this is working perfectly. Let's try another sample, this French testimonial. Let's listen to it, then process the file and see what happens. All right, we can see that it was also successfully transcribed, but notice that the transcript is in French, so it did the job correctly. Excellent. But what we would actually like is to be able to transcribe and also translate. So let's see next how to use the translation endpoint, which lets us transcribe an audio sample and also translate it into English.
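One detail worth spelling out from the steps above: whisper's transcribe() expects a file path, so the bytes from the Streamlit upload are typically written to a temporary file first, and the same upload can be handed to st.audio for playback. A sketch under those assumptions (the helper names are hypothetical):

```python
import tempfile

def save_upload(uploaded_bytes: bytes, suffix: str = ".mp3") -> str:
    # transcribe() needs a file path, so write the uploaded bytes to a
    # temporary file and return its path (the temp_file_path in the demo).
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded_bytes)
        return tmp.name

def show_player(uploaded_file) -> None:
    import streamlit as st  # lazy import: sketch only

    # st.audio renders an inline player, so the uploaded sample
    # can be replayed alongside its transcript.
    st.audio(uploaded_file)
```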