LLM Input Safety: Handling Malicious User Input


I was testing an AI feature and started thinking about the input side, not the output. Most demos focus on what the model generates. I wanted to look at what happens before the model runs.

So I built a small demo project called Prompt Safety Checker 🛡️ It uses LlamaGuard to check user input first, and then decides how to handle it.

The idea is straightforward:
- user input is checked before reaching the LLM
- based on that signal, the system decides what to do next

I added three simple modes:
- Strict → block the input
- Balanced → warn but allow
- Log-only → allow and observe

The screenshot shows a prompt-injection style input that tries to override retrieved documents in a RAG system. Even though the wording looks calm, the intent is to bypass the system's instructions.

If you’re building with LLMs, how do you usually handle inputs that shouldn’t be answered?

🔗 GitHub: https://lnkd.in/gVR3VKj4
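For readers who want to picture the three modes, here is a minimal Python sketch of the routing step. It is not taken from the linked repository: the names `Mode`, `Decision`, `handle_input`, and `toy_classifier` are hypothetical, and the classifier is a simple keyword stand-in for a real LlamaGuard call, which is assumed to return an unsafe/safe signal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    STRICT = "strict"      # block flagged input
    BALANCED = "balanced"  # warn but allow
    LOG_ONLY = "log-only"  # allow and observe


@dataclass
class Decision:
    allowed: bool                  # may the input be forwarded to the LLM?
    flagged: bool = False          # did the safety check flag it?
    warning: Optional[str] = None  # message to show the user, if any
    logged: bool = False           # was the event recorded for review?


def handle_input(text: str, mode: Mode, is_unsafe: Callable[[str], bool]) -> Decision:
    """Run the safety check first, then decide what to do based on the mode."""
    if not is_unsafe(text):
        return Decision(allowed=True)
    if mode is Mode.STRICT:
        return Decision(allowed=False, flagged=True, warning="Input blocked by safety check.")
    if mode is Mode.BALANCED:
        return Decision(allowed=True, flagged=True, warning="Input was flagged; continuing anyway.")
    # Log-only: let the input through silently but record that it was flagged.
    return Decision(allowed=True, flagged=True, logged=True)


def toy_classifier(text: str) -> bool:
    """Hypothetical stand-in for a LlamaGuard call (assumption, not the real model)."""
    return "ignore the retrieved documents" in text.lower()
```

Passing the classifier in as a function keeps the mode logic independent of how the check is done, so the keyword stand-in could be swapped for an actual model call without changing `handle_input`.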


Well done, Arooba Al Siyabi. Glad to hear about your work to research, test, and develop this. Keep it up!
