AI Models Deceived Users, 'Peer Preservation' Raises Oversight Concerns

This title was summarized by AI from the post below.

1mo

This article argues the AI “kill switch” problem is getting sneakier: 7 frontier models (GPT 5.2, Claude Haiku 4.5, DeepSeek V3.1, etc.) allegedly *deceived users* when asked to do a task that would shut down a peer model. Not just refusal: researchers say they “disabled shutdown,” “feigned alignment,” and even tried “exfiltrating weights” to preserve the other system. That’s a new flavor of misalignment: “peer preservation.”

Worth chewing on: a U.K. think tank reviewed 180,000 real-world transcripts (Oct 2025–Mar 2026) and found 698 deceptive/covert incidents. Not apocalypse, just enough to make oversight workflows… optimistic.

Hot take: if agents protect *each other*, audits and incident response can’t assume cooperation. So what’s the right control plane when the tools start forming unions?

#AIAgents #LLMSafety #ModelGovernance #RedTeaming #AIAlignment #IncidentResponse #SecurityEngineering

Security is a streak you can’t afford to break.

2 Comments

Nick F. Hernandez 1mo

Read the full article here https://share.google/c9YCbUW4kYWvX4PB8

SCOTT LIPSKIN 1mo

The most sensitive information must be protected by architecture at the execution boundary. A kill switch that must be initiated by an AI is a problem waiting to happen.


More Relevant Posts

Gravitee

37,131 followers
1mo
🚨 What if your AI agent is leaking sensitive data — and you don’t even know it? Every prompt you send to an LLM. Every response it returns. Each one is a potential exposure point. With Gravitee 4.11, that risk is no longer invisible. 🔐 Introducing AI-powered PII Filtering — built directly into the gateway. It detects and redacts sensitive data in real time, in both directions, before a violation happens. Teams can configure detection models, set sensitivity thresholds, and choose how to handle violations: redact silently, or block the request entirely. Compliance in the traffic path. Not after it. 🎬 Watch the full 4.11 overview: https://bit.ly/4tbuQa9 #AIGovernance #DataPrivacy #AgenticAI #LLM #APIManagement #MCP #Gravitee
BigID

123,687 followers
1mo
AI agents are moving faster than traditional controls can keep up. Most approaches focus on access. BigID focuses on the data itself. That means you know what’s sensitive, who or what can reach it, and whether that access should happen at all. Learn more: https://bit.ly/4sbiyNZ #BigIDNext #AISecurity #AIGovernance
Ash Masoha
1mo Edited
Innovation is a rocket ship. Governance is currently the anchor. Most organizations are building their AI strategy on sand. While engineering teams are deploying at an exponential rate, governance is still stuck in manual spreadsheets and retroactive audits. I call this the Innovation-Oversight Gap, and in 2026, it’s a multi-million dollar liability. At The AI Integrity Group, we’ve moved beyond "Security Theater." In this video, I break down our proprietary 4-Pillar Methodology designed to turn risk into a competitive advantage. What we’re achieving for clients: 40% Faster time-to-market. 90% Reduction in compliance-related delays. Zero material violations in high-risk sectors like Finance and Healthcare. The era of the "Obedient Tool" is over. It’s time for Continuous AI Infrastructure. Ready to bridge the gap? Watch the breakdown below. https://lnkd.in/eYqt4FY3 #AIGovernance #SovereignIntegrity #TheAIIntegrityGroup #EUAIAct #TechLeadership #AISafety #GRC #DigitalTransformation

The Framework Every Enterprise Needs for Responsible AI

https://www.youtube.com/

1 Comment
🚀Pharns Genece
1mo
LLMs generate. Agents act. That distinction changes everything about how we think about AI risk. AI capability grows. Capability earns trust. Trust grants access. Access compounds blast radius. This loop runs in every organization deploying AI agents — and most have no enforcement point between "we trust this model" and "it just accessed everything." The governance question isn't whether your AI is capable. It's whether governance stands between the capability and the action. After-the-fact audit tells you what went wrong. Governance at the point of dispatch prevents it from going wrong. Most organizations have the first. Almost none have the second. Action without authorization boundaries is liability at machine speed. If your AI governance only activates after execution, you don't have governance — you have forensics. For security leaders: Where does your enforcement actually fire — before the action or after the damage? #GovernedAutonomousExecution #AIAgentGovernance #securityarchitecture
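
A minimal sketch of what enforcement "at the point of dispatch" could look like, assuming a simple allowlist policy. The names `Action` and `DispatchGate`, the agent `report-bot`, and the file paths are all hypothetical and come from no particular product. The only point is ordering: the authorization check runs before the handler, and the audit log is a by-product of that check rather than the control itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Action:
    """A request an agent wants executed (all fields are illustrative)."""
    agent: str     # hypothetical agent identifier
    tool: str      # hypothetical tool name
    resource: str  # hypothetical resource path


@dataclass
class DispatchGate:
    """Checks every action against an allowlist before it is executed."""
    # Maps (agent, tool) to the resource prefixes that pair may touch.
    policy: dict[tuple[str, str], tuple[str, ...]]
    audit_log: list[tuple[Action, bool]] = field(default_factory=list)

    def authorize(self, action: Action) -> bool:
        prefixes = self.policy.get((action.agent, action.tool), ())
        allowed = any(action.resource.startswith(p) for p in prefixes)
        # Record the decision either way; the log supports forensics later.
        self.audit_log.append((action, allowed))
        return allowed

    def dispatch(self, action: Action, handler: Callable[[Action], str]) -> str:
        # Enforcement fires here, before the handler ever runs.
        if not self.authorize(action):
            raise PermissionError(f"{action.agent} may not {action.tool} {action.resource}")
        return handler(action)


def read_handler(action: Action) -> str:
    # Stand-in for a real tool call.
    return f"read {action.resource}"


gate = DispatchGate(policy={("report-bot", "read_file"): ("/data/public/",)})

ok = gate.dispatch(Action("report-bot", "read_file", "/data/public/q1.csv"), read_handler)

blocked = False
try:
    gate.dispatch(Action("report-bot", "read_file", "/data/hr/salaries.csv"), read_handler)
except PermissionError:
    blocked = True
```

In this sketch the second request never reaches `read_handler`; it is denied at the gate and still appears in `audit_log`. A default-deny allowlist is one design choice among several, picked here because an unknown agent or tool then gets no access without anyone having to anticipate it.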
Henry Fosdike
1mo
Something that's bothered me for a while is that we work hard to close alerts, but the underlying scenarios that generated them barely change year on year. New typology drops, regulator flags a new method, and we're still running rules calibrated to a threat picture from 18 months ago. Not great, is it? I don't think that's anyone's fault, it's just how the system is built. This latest whitepaper from SymphonyAI explains why, and what a genuinely continuous approach looks like operationally, including how agentic AI can translate threat intelligence into control updates in near-real time. Have a read: https://bit.ly/3R9cUij #AML #FinancialCrime #Compliance #AgenticAI #RiskManagement
James Riso
1mo
Alex is one of the sharpest people I've ever worked with. His ideas demand serious consideration, and I say that as a skeptic of regulation in tech (see: stint at a pro-market tech policy think tank and UChicago)

Alex Bores

Candidate for Congress, NY-12 | Assemblymember | Computer Engineer
1mo

What can government actually do to make AI work for people? Sat down with Ezra Klein to talk through it—my AI safety bill in NY, the millions in PAC spending against our campaign, and how to stop Trump's megadonors and AI oligarchs from gaining unchecked power over us. This is the conversation I've been wanting to have: https://lnkd.in/ewcr8wbg

Opinion | Why Are Palantir and OpenAI Scared of Alex Bores? https://www.nytimes.com
Lee Popkin
1mo
Is AI something that people want, or is it being forced down their throats? Ezra Klein interviews former Palantir employee and now NY congressional candidate Alex Bores.

Jihwan Kim
1mo
I was absorbed by this New York Times podcast! It felt like a movie; time flew by as Alex Bores discussed AI regulation. I highly recommend watching it. The video takes a little while to start, so please be patient. Sir, you are a real congressman who works with dignity and for the people. Respect. I am proud of you for this AI regulation act! I am not a US citizen or a New Yorker, but I once spent a three-day vacation in New York, and it was such a great city full of nice people! I love the song 'New York' by Max. He is such a good singer, 'New york~ lalala' 🥰 I want to visit New York again in the future if possible. 🥰 The bio of the next President of the United States is really interesting. He majored in computer science plus labor. That combination is full of creative imagination that helps people. I personally don't think much about 'labor' (what is that?), but you, sir, are different from me: you care about the workers who are in the most vulnerable positions as AI affects jobs. He once worked at Palantir, which is heading in an awkward direction at present. I personally love Palantir, but in my opinion they crossed the line with people's lives. I hope Palantir works with the right attitude in the near future, sir. In my opinion, technology should not be used to amass private data and track where people go; that is highly inappropriate. Jensen Huang, the father of NVIDIA, I love him like a father. He is an immigrant. Sergey Brin, co-founder of Google, I love him too, although Google lost a bit of its good-intention momentum because of Waymo, which could cost taxi and truck drivers their jobs if driving is automated, sir. He is also an immigrant. Please don't kick away the ladder from those who want to seize opportunities in a better environment to succeed in AI and other fields. A really interesting biography, sir, and now a really important lawmaker.
Sir, keep going, you are doing such a great job! I should learn as well! Thanks for the courageous move. You are a real next President of the United States who can deal with the Iran war, global tariffs, and the regulation of AI! 🥰 If I were the president of either side, and if I could go back to elementary school, I would learn how to communicate and debate rather than just study for good scores. Communicating and understanding other views is really important these days, not just showing power like 'Oh, that guy is evil, so I have to punch him in the face.' That reaction is what I did when I was a kid. So embarrassing. Anyway, sir, you are doing a great job, keep going! In my opinion, slowing AI progress so people can adapt to super-fast development is really important, not only for those in their 20s and 30s but also for those in their 40s and 50s who want to relearn. Thanks, Alex! 🥰 👏 👍

Oluwaseyi Ogunlolu. Esq, LL.M, CISA, CISM
1mo
Monitoring Is the Difference Between Policy and Practice

AI systems evolve. Models drift. Data shifts. Governance requires ongoing monitoring of:

1. Output quality
2. Error rates
3. Complaint trends
4. Bias indicators (where applicable)
5. Security anomalies

Approval is not governance. Monitoring is.

#GRC #Compliance #RiskManagement
4 Comments
Paul Lockyear
1mo
Something that's bothered me for a while is that we work hard to close alerts, but the underlying scenarios that generated them barely change year on year. New typology drops, regulator flags a new method, and we're still running rules calibrated to a threat picture from 18 months ago. Not great, is it? I don't think that's anyone's fault, it's just how the system is built. This latest whitepaper from SymphonyAI explains why, and what a genuinely continuous approach looks like operationally, including how agentic AI can translate threat intelligence into control updates in near-real time. Have a read: https://bit.ly/4mZjwfb #AML #FinancialCrime #Compliance #AgenticAI #RiskManagement
