From the course: OpenAI API for Python Developers
Whisper Audio API: Speech-to-text
- [Narrator] Whisper is a speech recognition system that OpenAI released in September 2022 as open source software. It is a general-purpose, multitasking model trained on a large dataset of audio samples, and it can perform multilingual speech recognition, speech translation, and even language identification. Let's listen to one Whisper example, and we'll see the transcript below. (computer narrator reads passage indistinctly) So that speaker is talking very fast. Now let's listen to a French example, and I'm going to reveal the English transcript here. Let's listen. (computer narrator speaks in French)
The speech-to-text API provides two endpoints: transcription and translation. Transcription converts audio into text in whatever language the audio was recorded in. The other option is translation, which transcribes audio in a foreign language, like Italian or Spanish, and translates it into English.
In the starter project, you'll find instructions in the README file to help you get started smoothly. You need to install the Whisper model and the other required packages listed there, and you may also need to install ffmpeg, which the Whisper library requires. I've included installation instructions for different operating systems. The project also comes with sample audio files in different formats, WAV and MP3.
In the documentation, you'll see that there is a limit on file uploads: each file can be at most 25 megabytes. The file types currently supported include mp3, mp4, and wav. For the audio samples, we're using mp3 files plus one file with the wav extension. If you want to handle and process larger files, I recommend that you look at this Whisper model.
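The two endpoints and the upload limits described above can be sketched roughly as follows. This is a minimal sketch, not the course's own code: it assumes the v1 `openai` Python package (`pip install openai`), the hosted `whisper-1` model name, and that "25 megabytes" means 25 * 1024 * 1024 bytes. The helper names (`check_audio_file`, `transcribe`, `translate_to_english`) and the sample path are hypothetical, and the extension set lists only the types named in the lesson.

```python
from pathlib import Path

# Assumption: the 25 MB limit from the lesson, read as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
# Only the file types named in the lesson; the API may accept others.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".wav"}


def check_audio_file(path) -> Path:
    """Raise ValueError if the file type or size would be rejected on upload."""
    audio = Path(path)
    if audio.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {audio.suffix!r}")
    if audio.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{audio.name} is larger than 25 MB")
    return audio


def transcribe(client, path, model="whisper-1") -> str:
    """Transcription endpoint: text in the language the audio was recorded in."""
    audio = check_audio_file(path)
    with audio.open("rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text


def translate_to_english(client, path, model="whisper-1") -> str:
    """Translation endpoint: transcribes foreign-language audio into English text."""
    audio = check_audio_file(path)
    with audio.open("rb") as audio_file:
        response = client.audio.translations.create(model=model, file=audio_file)
    return response.text


# Example usage (needs the openai package and an OPENAI_API_KEY environment variable):
#   from openai import OpenAI
#   client = OpenAI()
#   print(transcribe(client, "audio/sample.mp3"))            # hypothetical path
#   print(translate_to_english(client, "audio/sample.mp3"))  # hypothetical path
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before spending API credits.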
So we're actually going to use the Whisper model. You'll find instructions on how to install it, and the exact same instructions are in the README file. Basically, you have different models available to choose from, depending on the size of the files that you want to process. We're going to use this one for our next example. So with that, let's begin our demonstration with the OpenAI Audio API. You'll be amazed by how simple it is to integrate this AI-driven feature into your application.
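For the locally installed Whisper library mentioned above, a run might look something like the sketch below. It assumes the open source `openai-whisper` package (`pip install -U openai-whisper`) and an ffmpeg executable on the PATH. The `base` model name is one of the sizes that package publishes and is only an assumed default here, since the transcript does not say which model the course picks. The function names and the sample path are hypothetical.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def transcribe_locally(audio_path, model_name="base") -> str:
    """Transcribe a file with the open source Whisper library, running locally."""
    audio = Path(audio_path)
    if not audio.is_file():
        raise FileNotFoundError(f"Audio file not found: {audio}")
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg was not found on the PATH; install it first.")

    # Imported here so the checks above still run when the package is missing.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(str(audio))   # dict that includes the "text" key
    return result["text"]


# Example usage (hypothetical path):
#   print(transcribe_locally("audio/sample.wav"))
```

Because this runs on your own machine, the file never goes through the API upload, which is why it can suit the larger files the narrator mentions.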