From the course: OpenAI API: Building Front-End Voice Apps with the Realtime API and WebRTC
Testing the Realtime API in the Playground
- [Instructor] Before we dive into code, I recommend playing around with the Realtime API in OpenAI's Playground. You can find it at platform.openai.com. Here you get to experience and experiment with the API, to understand what it's like to talk to the system instead of using text, and to see how the different features respond.

Now, before you start working, I recommend setting up a project for the work you're doing. You do that by going up to the top corner, selecting the projects tab, and then either selecting an existing project you're already working on or creating a new one. That way you can save settings for that project, and you can also see how much you're spending in each individual project. Later on, when we start creating API keys, we'll assign those keys to the same project so everything is contained.

In the Playground, you get direct access to all the main features of the Realtime API, and we can play around with them to see how they work in real life. That all starts down at the bottom. First, make sure the system has detected a working microphone on your computer. When it has, click Start session. This creates a new session between your computer and the AI. And when I now unmute the microphone, I can give it a command: Hey, can you write me a haiku about a duck?

- [Narrator] Certainly, here's a haiku for you. Winter's chill descends, a duck glides on still water...

- [Instructor] So what you're seeing here is the system responding exactly like the text-based system. You'll also notice that I started talking while the system was still speaking, and it detected that and stopped. This is the interruption feature you get with the Realtime API. You can talk to the system in a very natural way, and it can detect that you're talking and stop to listen to what you're saying. Now, of course, that interruption was meaningless, so I need to give it something to actually do. Let's see, what is five plus 11?

- [Narrator] Five plus 11 equals 16. If you have more questions or need more math help, just let me know.

- [Instructor] Okay, that's one thing; here's something else. The Realtime API takes both voice input and text input. At any time, you can activate the text input and put in some information.

- [Narrator] Adding 23 to 16 gives us 39. Need help with anything else?

- [Instructor] And as you can see, it responds in both voice and text. So when you set this up in code, you can instruct the system: I want text input and voice output, or voice input and text output, or text in and text out. It's up to you. And lastly, look at this.

- [Narrator] We were talking about a duck. Anything else you'd like to?

- [Instructor] So what you're seeing here is that the Realtime API works like the Assistants API in that it has memory inside the session. The entire session is stored in the system, and it's able to look back at what happened previously, whether that came from voice or from text. So unlike the regular Chat Completions API, where you have to supply the conversation history and context yourself, in the Realtime API the context is kept for you. But that also means that if you close out a session and then start a new session and say, "What animal were we talking about again?"

- [Narrator] Could you give me a bit more context?

- [Instructor] The system has no idea what I'm talking about, because we're in a new session and it's not able to look back on the memory of previous sessions.
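As a note on the modality choice mentioned above, here is a minimal sketch of how that might look when you set it up in code, assuming you already have an open WebRTC data channel (`dc`) to the Realtime API. The event shape follows the Realtime API's session.update client event, but treat the exact fields here as illustrative rather than definitive.

```typescript
// Minimal sketch: choosing output modalities for a Realtime session.
// Assumes `dc` is an already-open RTCDataChannel to the Realtime API.
function setOutputModalities(dc: RTCDataChannel, speakResponses: boolean): void {
  const event = {
    type: "session.update",
    session: {
      // ["text"] = text-only responses; ["text", "audio"] = voice plus text.
      modalities: speakResponses ? ["text", "audio"] : ["text"],
    },
  };
  dc.send(JSON.stringify(event));
}
```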
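Similarly, the "activate the text input" step shown in the Playground maps to sending a typed message into the live session. This is a sketch under the same assumptions (an open data channel `dc`); the event names follow the Realtime API's conversation.item.create and response.create client events.

```typescript
// Minimal sketch: injecting a typed user message into an ongoing Realtime session.
function sendTextTurn(dc: RTCDataChannel, text: string): void {
  // Add the typed message to the conversation; the session keeps it in memory
  // for the rest of the session, just like spoken turns.
  dc.send(JSON.stringify({
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }],
    },
  }));

  // Ask the model to respond; with audio enabled, it answers in both voice and text.
  dc.send(JSON.stringify({ type: "response.create" }));
}
```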