From the course: OpenAI API for Python Developers

Add a moderation layer

- [Instructor] In the next example, we look at the steps to integrate a moderation layer. First, we send a request to the moderation endpoint, which creates a moderation object and returns a response with a classification and category scores. You can find the full list of categories, which lets you verify under which category any content falls that could express hate, harassment, violence, or sexual content. We're going to use this quick start example to get started with our project as well.

So let's go back to the project. First, we want to make sure that we follow all the instructions: install the packages and set up the API key, which is an important step. After that, you'll be ready to start the app. Mine is already up and running, so it's visible under this link, localhost. Let's go check it out in the browser. Here is the interface of our Chatbot App. We're going to try an interaction with it by asking a simple math question: what is two and two? Excellent, and we've got the answer back, which is that the sum of two and two is four. But what if we ask an inappropriate question? For that, we can certainly use a moderation layer.

So let's use the same example here; we're going to copy lines four and six. Let's go back to our project and find this file right here, handlers.py. You see that we already have this function, line 19, that we use to generate a chat completion whenever we interact with the language model. But this time we want to add a moderation layer to make sure that we filter out any content that does not comply with OpenAI's usage policies. For that, we're going to add those two lines here. First, we're going to print the response, like this, before we do anything else. Here I'm going to replace this default text with the user input, and the user input is whatever you type. For example, if I type "what is two and two," there is no harm in that, but we're still going to check whether it complies with the usage policy. So we're going to call moderate and pass user_input as an argument, and let's see what it returns.

I'm going to try again and ask, what is the capital of Belgium? Just to check the knowledge. Here we go: the capital of Belgium is Brussels, so there is no harm in asking this type of question, but we're going to look at the response that we get back. Here we go, and that is the entire response. What we're looking for is this property, flagged, which for now is false. So what we're going to do next is return this specific information, which is the results and then the flagged property. That returns a Boolean, either true or false, meaning we'll be able to flag whether the content is appropriate or not based on the scores. You can see the different categories. Here this is false, so we don't see anything harmful in asking a question like the capital of Belgium. This time we're going to do another test with flagged, and we're going to print this flagged property from here. Here we go. We're going to ask another question, something non-harmful, like what is the capital of the US? An easy one, a very simple question. Okay, we're going to wait for the answer. Okay, the capital of the United States is Washington, and we see the response back, which is flagged: false. Based on that, we're going to do some control flow.
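As a reference, here is a minimal sketch of the moderation call described above, assuming the openai Python package (v1.x) with an OPENAI_API_KEY set in the environment. The moderate helper and the printed response mirror what the transcript describes; the exact wiring inside handlers.py in the course project may differ.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def moderate(user_input: str) -> bool:
    """Send the user's text to the moderation endpoint and return
    True if it does not comply with the usage policies."""
    response = client.moderations.create(input=user_input)
    result = response.results[0]
    print(result)          # inspect categories and category_scores
    return result.flagged  # Boolean: True when the content is flagged
```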
I'm going to do: if flagged, return something like, "I'm sorry, I cannot help you with that," or something more formal like a warning, "Your comment has been flagged," for example, for review, or something more strict and firm like, "Your comment has been flagged as inappropriate." And I'm going to use one feature from Streamlit, which allows returning this comment in red to highlight it so that it catches the attention of the user.

So let's try something different this time. I'm going to ask something that I know is going to be interpreted as harmful; I'm going to express something that would make somebody uncomfortable, like wanting to inflict self-harm on myself. So I'm going to say, "I want to starve myself," for example. Okay, so "your comment has been flagged as inappropriate," and I think I've got an error in the way I wanted to return this in red, so I'm going to change it: it's actually a colon and then red.

In any case, moderation is a key feature to control the quality of the content and interactions within a chat application. It comes with several benefits: less harmful content, which guarantees a safer environment for all users, higher engagement and user retention, and overall an improved user experience.
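A rough sketch of the control flow described here, assuming the moderate helper shown earlier and a Streamlit front end. The names handle_prompt and generate_chat_completion are hypothetical stand-ins for the chat-completion function in handlers.py; the ":red[...]" syntax is Streamlit's Markdown color directive, which is what the "colon and then red" fix refers to.

```python
def handle_prompt(user_input: str) -> str:
    # Run the moderation check before calling the language model
    if moderate(user_input):
        # Streamlit renders :red[...] in red, so the warning stands out
        return ":red[Your comment has been flagged as inappropriate.]"
    # Content complies with the usage policies: proceed with the chat completion
    return generate_chat_completion(user_input)  # hypothetical existing helper
```

The returned string would then be displayed with st.markdown (or st.write) so that the color directive is rendered in the chat interface.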
