Allen Holub’s Post

The attached screen grab should make everybody who programs AI systems pay attention. As programmers, we should never forget that so-called AI is just numbers and probabilities. Because of that, LLM-based systems are inherently insecure, not from exploiting bugs—the way standard hacks work—but because the underlying mathematics is intrinsically insecure.

LLMs work using a compression/expansion mechanism. The LLM converts your input to numbers (tokens), picks another number (token) based on probabilities and prior tokens, then converts that second number back to text. It does not know what the input text means, and it does not require that the input text correctly express any natural language (that's why spelling errors are accepted in a prompt).

The key thing is that the set of tokens, though huge, is vastly smaller than the set of potential inputs. Consequently, disparate inputs, not all of which are meaningful in any human language, can generate the same number (and same result), and by the same token (so to speak 😄), minor variations in the input can result in different numbers, and yield different and undesirable results.

That screen grab is from an AI movie rating system. Note how trivial misspellings in the prompt change the rating in unintended (to the user) ways. In other words, minor variations in the prompt can radically change the system's behavior in undesirable ways.

This exploit works with all LLM-based systems, and YOU CANNOT DEFEND AGAINST IT. It's inherent in the mathematics that define how an LLM works. No matter how many inputs you exclude through examination, there is an effectively infinite set of inputs you did not exclude that can do the same damage. This problem may not be the end of the world if you're rating movies, but if you're creating AI agents that touch the real world, it's a huge deal.

The lesson is, outside of a chatbot, never never never never use a prompt that you yourself did not write and carefully vet. Every external entity (not just "users," but also bots) can hack your system through the prompt, and there is literally no defense for that other than controlling the prompt. Of course, many "agentic" systems rely on unvetted prompts. These systems are inherently insecure (and dangerous, if we are talking unintended consequences) and can never be made secure. It's just math. Consider yourself warned.

(P.S. I should add that recent LLMs are not as susceptible to spelling errors as that example, but no probability-based system is secure, no matter how contemporary.)
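
To see the mechanism described above in miniature, here is a small, self-contained Python sketch. The sub-word vocabulary, the greedy longest-match tokenizer, and the "rating" probability table are all invented for illustration (a real LLM uses a learned BPE vocabulary and a neural network, not a lookup table); the only point is that a one-letter typo yields a different token sequence, which then feeds different probabilities downstream.

```python
# Toy illustration (not a real LLM): a tiny greedy longest-match tokenizer plus a
# hand-made probability table keyed on the resulting token sequence.

# Hypothetical sub-word vocabulary; real models learn tens of thousands of pieces.
VOCAB = {"terr": 1, "ible": 2, "ter": 3, "movie": 4}

def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization; unknown characters are skipped."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):   # try the longest piece first
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(VOCAB[piece])
                i += length
                break
        else:                                        # no vocabulary piece matched here
            i += 1
    return tokens

# Made-up "rating" probabilities conditioned on the token sequence. A real model
# computes these with a network, but the sensitivity to the tokens is the same.
RATING_TABLE = {
    (1, 2, 4): {"1 star": 0.85, "5 stars": 0.15},  # tokens for "terrible movie"
    (3, 2, 4): {"1 star": 0.40, "5 stars": 0.60},  # tokens for "terible movie" (typo)
}

for review in ["terrible movie", "terible movie"]:
    toks = tuple(tokenize(review.replace(" ", "")))
    probs = RATING_TABLE.get(toks, {"1 star": 0.5, "5 stars": 0.5})
    print(f"{review!r} -> tokens {toks} -> rating {max(probs, key=probs.get)}")
```

Run it and the misspelled review flips the predicted rating, which mirrors the class of failure shown in the screen grab.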

These look like sub-word tokens, which are *not* embedding vectors, because they are sensitive to misspelling (e.g., TF-IDF-weighted character n-grams).
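
To make that concrete, here is a minimal sketch (assuming scikit-learn is available; the example prompts are invented) showing that a single dropped letter already moves a prompt to a different point in a TF-IDF character-n-gram feature space:

```python
# Sketch: how far apart a prompt and its misspelled variant land under
# TF-IDF weighted character n-grams.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prompts = [
    "rate this terrible movie",  # invented example prompt
    "rate this terible movie",   # the same prompt with one letter dropped
]

# Character n-grams taken within word boundaries, weighted by TF-IDF.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectors = vectorizer.fit_transform(prompts)

# Anything below 1.0 means the two prompts are different points in feature space,
# so a downstream classifier is free to treat them differently.
print(cosine_similarity(vectors[0], vectors[1])[0, 0])
```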

Marketing managers and AI gurus are trying to convince many of us that an LLM is an "Artificial System that shows Intelligent behavior," while we should properly call them "Statistical Next‑Token Predictors." But "who am I to disagree" (© Eurythmics) 🤷♂️

Similar but different: self-driving vehicles have sporadic, random issues because they encounter complex combinations with subtle variations not in the training examples. The latest version is better than the prior one in several ways, but it also introduced undesirable effects: phantom braking is back, speed issues still exist but are different, and navigation issues are far more prevalent, with the car deviating from the correct path on the display. It has tried to park in the neighbor's driveway and on the lawn across the street. It would be logical to think that more training examples would have been added where it was having trouble, some bad examples removed, and longer training time applied, yielding improvement without degrading prior capability. Unfortunately, the mix of examples has changed, and this causes unexpected and undesirable behavior. Because the models are not trained to reason, it's a crapshoot to try to discover the right mix of training data to get an acceptable model; it's entirely possible it won't or can't be found. Creating models that truly reason is likely the path forward.

I suspect that if the system used agents, not just chain-of-thought, and first attempted to correct the input and then resubmitted it, it might be more consistent. But the initial agent is using a probabilistic LLM and can generate errors in its output too. Same problem. This causes cascading and compounding errors as the text flows through agents. So the next solution might be to have each agent interpret the input multiple times and then have a coordinator/reviewer agent select the most common interpretation. But models can frequently be confident and wrong, so even that solution won't always work. Most people just don't get how bad the current reliability is and how much confabulation LLMs are producing. It's a far worse and more serious issue than most realize. I'm experiencing serious errors almost daily from all the top models. RAG often isn't the solution either: bad references are returned, and models still fail to interpret and stick to the references. I recently asked all the top models what vulnerabilities AWS GuardDuty detected, and every single one pulled references and confabulated a list that was not grounded in any reference it cited. Also an incorrect federal tax table for 2025, wrong SOHO router configs, bad scripts and code, etc.
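
Here is a rough sketch of the "interpret it multiple times and let a reviewer pick the most common answer" idea described above, with a random stub standing in for the LLM (the bias numbers are invented). It shows that majority voting helps when errors are occasional and random, but not when the model is systematically confident and wrong.

```python
# Sketch of the majority-vote idea: sample several interpretations and keep the
# most common one. The "model" here is a random stub, not a real LLM.
import random
from collections import Counter

def fake_model(prompt: str, wrong_bias: float) -> str:
    """Stand-in for an LLM call: returns the right answer or, with probability
    wrong_bias, the *same* wrong answer every time (confidently wrong)."""
    return "wrong-but-confident" if random.random() < wrong_bias else "correct"

def majority_vote(prompt: str, wrong_bias: float, samples: int = 7) -> str:
    """Coordinator/reviewer step: take the most common of several interpretations."""
    votes = Counter(fake_model(prompt, wrong_bias) for _ in range(samples))
    return votes.most_common(1)[0][0]

random.seed(0)
for bias in (0.2, 0.6):   # occasional random errors vs. systematically wrong
    results = Counter(majority_vote("interpret this", bias) for _ in range(1000))
    print(f"wrong_bias={bias}: {results}")
```

With the low bias the vote almost always lands on "correct"; with the high bias it mostly locks in the wrong answer, which is the "confident and wrong" failure mode described above.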

As someone who doesn't appreciate the kid-gloves treatment that LLMs provide, or the arbitrary choice of which topics are taboo and which are not (I tried asking about mercury and it refused to tell me anything, because mercury is toxic and I am apparently a child who is also capable of acquiring lethal amounts of mercury to cause myself accidental harm /s), these arbitrary rules were frankly a bit insulting, given that the training corpus was more than happy to grab as much information and IP as they could stuff into it, yet they will lecture and chastise those who attempt to pull it back out. If I need to "break" the rules to make it useful, I will. But I will also pay more attention to spelling errors when asking it things, since perhaps that could be messing it up. I have already had to sidestep with particular words; domain expertise usually results in domain content, but cynicism is almost always associated with at least above-average content. If you aren't putting the LLMs down every once in a while, you don't know what it's telling you.

Yep. So exposing AI in a public-facing system built by YouTube kids with no prior tech experience is just absolutely batshit crazy.
