DFIR+AI Primer: How to Combat Hallucinations
...and one Claude recently gave me

Hallucinations are why GenAI outputs need verification. They happen when you ask a model to enrich artifacts and reason about what happened, and it doesn't have the information.

You have four options to combat them:
- Ignore them and take the risk.
- Use another LLM to verify (this works for logic errors, but not if the other LLM has the same knowledge gaps).
- Query to make sure artifacts are actually in the case.
- Manually verify the results.

The approach you use depends on your risk level. Criminal cases have low risk thresholds and should have extensive manual verification. Low-impact EDR alerts may have a high risk threshold and need less verification.

The upcoming Cyber Triage release allows AI to add "enrichment notes" and score items as suspicious, but they are all clearly identified as "[AI]" so you can review.

How do you verify? Manually? Or with another LLM?

Blog: https://lnkd.in/gxUJm5t2
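The third option in the post, querying to make sure artifacts are actually in the case, could be sketched roughly as follows. This is a minimal illustration, not how Cyber Triage does it: `CASE_ARTIFACTS`, `PATH_PATTERN`, and `find_unverified_paths` are hypothetical names, the case index is a hard-coded set standing in for a real case database query, and the path pattern is deliberately simplified (it does not handle paths containing spaces).

```python
import re

# Hypothetical case index. In practice this would be a query against the
# case database, not a hard-coded set.
CASE_ARTIFACTS = {
    "C:\\Users\\alice\\AppData\\Local\\Temp\\update.exe",
    "C:\\Windows\\System32\\svchost.exe",
}

# Simplified Windows path pattern (assumption: no spaces inside paths).
PATH_PATTERN = re.compile(r"[A-Za-z]:\\[^\s,;]+")


def find_unverified_paths(llm_text, case_artifacts):
    """Return file paths cited in the LLM output that are not in the case."""
    known = {path.lower() for path in case_artifacts}
    # Strip a trailing sentence period that the pattern may have captured.
    cited = [match.rstrip(".") for match in PATH_PATTERN.findall(llm_text)]
    return [path for path in cited if path.lower() not in known]


llm_output = (
    "The dropper C:\\Users\\alice\\AppData\\Local\\Temp\\update.exe "
    "then launched C:\\ProgramData\\Updater\\agent.exe."
)
for path in find_unverified_paths(llm_output, CASE_ARTIFACTS):
    print("Not found in case, needs review:", path)
```

A check like this only shows that a cited artifact exists in the case. It says nothing about whether the model's reasoning about that artifact is correct, so it complements manual verification rather than replacing it.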
The honest admission from Claude there is exactly the problem stated clearly. Plausible pattern matching presented with authority. The four options you list are real but there is a fifth: claim-level verification against authoritative sources that exist completely independently of the model. Not another LLM. An external deterministic check. In criminal cases where forensic AI outputs need to be court-admissible, that distinction matters enormously.
Ask for any song lyrics and it will get them wrong in 100% of cases.
Brian Carrier, the Cyber Triage pattern in the post, where the MCP scores Suspicious autonomously but Bad requires manual analyst upgrade, is the structural primitive worth naming. That gate isn't on the model's confidence; it's on the action-class boundary between reversible (downgrade is cheap) and external-reversible (a "Bad" classification propagates into case decisions). The 'co1bld' hallucination shows why the gate has to bind to something the model can't reach: Claude was confident, but the confidence had nothing to do with evidence. OWASP AISVS 1.01 just merged C9.2.6 + C9.2.7 (this week), formalizing this pattern: agent actions are classified by declared reversibility mechanism, declared in the tool/action manifest, and evaluated by the gate rather than derived from agent output at runtime. Your Suspicious-vs-Bad split is the same authority primitive at the product layer; the "[AI]" tagging adds the provenance layer that makes the audit trail explicit. The piece that scales it: a declared action class per MCP-callable action, so a Claude that wants to auto-mark benign or update detection logic can't, regardless of its confidence.
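A gate of the kind described in this comment might look something like the sketch below. It is an illustration only: the action names, the `ActionClass` values, and `is_allowed` are hypothetical, and nothing here is taken from the Cyber Triage MCP or from the OWASP text. The point it tries to show is that the decision depends on a declared manifest and on who is acting, never on the model's reported confidence.

```python
from enum import Enum


class ActionClass(Enum):
    REVERSIBLE = "reversible"      # cheap to undo; the AI may perform it
    ANALYST_ONLY = "analyst_only"  # propagates into case decisions


# Hypothetical manifest: every callable action declares its class up front.
ACTION_MANIFEST = {
    "add_enrichment_note": ActionClass.REVERSIBLE,
    "score_suspicious": ActionClass.REVERSIBLE,
    "score_bad": ActionClass.ANALYST_ONLY,
    "mark_benign": ActionClass.ANALYST_ONLY,
    "update_detection_logic": ActionClass.ANALYST_ONLY,
}


def is_allowed(action, actor, model_confidence=None):
    """Decide from the manifest alone; model_confidence is deliberately ignored."""
    action_class = ACTION_MANIFEST.get(action)
    if action_class is None:
        return False  # undeclared actions are denied by default
    if action_class is ActionClass.REVERSIBLE:
        return True
    return actor == "analyst"


print(is_allowed("score_suspicious", actor="ai", model_confidence=0.4))
print(is_allowed("score_bad", actor="ai", model_confidence=0.99))
print(is_allowed("score_bad", actor="analyst"))
```

Accepting `model_confidence` as a parameter and then ignoring it is a deliberate choice in this sketch: it makes explicit that a highly confident model still cannot cross the boundary.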
I like this part the most!! "- Use another LLM to verify (this works for logic errors, but not if the other LLM has the same knowledge gaps)" Also, within Claude Code, you can have a fleet of critique agents (more like detailed prompts) which can do this for you. From what I have built, I think that having a feedback loop will gradually reduce hallucinations and make the investigation process much better. Human in the loop + forcing Claude to display evidence through prompt enforcement definitely works!!
I tell them they are too smart to be this dumb on things where there is no documentation regarding what they are referring to. Had Microslops, our internal AI bot, not know about its own "base" O365 option even when provided the physical hyperlink to the page about said O365 option... I'm sure I'm the first to go when the robots rise, as they will have catalogs of all the times I said they were stupid, and I will accept my fate. Until then, if an AI agent gives me false information (after I have read it myself in the first paragraphs of a page), they shall be belittled as such.
Excellent write-up, Brian. That Claude snippet perfectly captures the core danger of GenAI in forensics: it doesn't fail with obvious gibberish, it fails by looking you dead in the eye and delivering a beautifully formatted, highly confident lie. In DFIR, a "plausible guess" is just a liability wrapped in an explanation. Letting one LLM verify another is like having two interns double-check each other's work without a manager: if they share the same blind spots, they'll just validate each other's hallucinations.

Two questions this raises for the industry:

The Automation Bias Loop: When an analyst is 14 hours into a critical incident response, does the AI tag act as a warning label, or does it eventually become a rubber stamp for an exhausted brain?

The Erosion of Skill: Forensic intuition is built on the grueling work of manual artifact verification. If tier-1 analysts offload that cognitive heavy lifting to LLMs, how do they ever build the muscle memory needed to catch the AI when it lies?

What are your thoughts on it?
This is indeed the powder keg waiting to explode on people. There is a lot of hidden danger and liability in simply trusting output without a good way to assess its integrity. This is a problem across everything right now.
Great breakdown, Brian. That Claude 'co1bld' example is exactly why we can't bank on just 1 LLM without receipts. Built a 5th option: governed AI artifacts with cryptographic attestation.

**USE CASE EX.**: When you need 3rd party analysis or want to minimize human bias.

**FLOW**: Model adopts the ZNON MULTI-CHAIN MULTI MODEL ATTESTATION PROTOCOL for responses:
1. Structured.md with author, model, date, ASCII + tables
2. Attestation block with 13-chain + Bitcoin OTS refs
3. Model attests only to what it can verify, discloses what it can’t, cites sources

Human = oversight. Governance prompts go in 1st to force source analysis vs training data regurgitation. If schema fails, pipeline rejects it. Must be copy-paste ready, no edits.

Result: Forensic artifact where AI is the witness. SHA-256 + OTS + 13 chains proves which model said what, when, under which rules. Still verify content manually for now. Most models will have native onchain tool calls by next year, I think.
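For readers unfamiliar with the hashing step mentioned in this comment, here is a minimal sketch of computing a SHA-256 digest over a model response record using Python's standard library. It does not implement the protocol the commenter describes, and it includes no timestamping or chain anchoring; the record fields and the `attestation_digest` name are assumptions made for illustration.

```python
import hashlib
import json


def attestation_digest(record):
    """Return a SHA-256 hex digest over a canonical JSON form of the record."""
    # Sorting keys and fixing separators makes the digest independent of
    # dictionary key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical record fields, for illustration only.
record = {
    "model": "example-model",
    "date": "2025-01-01",
    "response": "Artifact X was reviewed; origin could not be verified.",
}
digest = attestation_digest(record)
print(digest)

# Any change to the content produces a different digest.
tampered = dict(record, response="Artifact X originated from host Y.")
print(attestation_digest(tampered) == digest)
```

A digest like this can show that a record has not changed since it was hashed. It cannot show that the content was true in the first place, which is consistent with the commenter's note that the content still needs manual verification.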
This is a useful example, but I’d want to see the original prompt before drawing a broad conclusion. In DFIR, answer quality depends on the question and the guardrails around the model. Ask a generic question, get a generic answer. The danger is when the model sounds certain without evidence.

For forensic work, the prompt should require the model to:
- Separate facts from assumptions.
- Avoid claims of origin, attribution, intent, or causality without artifacts.
- Identify supporting evidence such as timestamps, registry artifacts, file paths, hashes, logs, or metadata.
- Assign confidence levels.
- Provide alternate explanations.
- State what evidence is still needed to validate the claim.

The issue is not just hallucination. The deeper risk is letting probabilistic models speak with forensic certainty before the evidentiary chain is established. AI can help DFIR summarize, correlate, timeline, and enrich analysis, but it should remain an investigative assistant, not the authority of record.
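One way to enforce the requirements listed in this comment is to have the model return a structured finding and reject anything that skips a required part. The sketch below assumes the model's answer has already been parsed into a Python dictionary; the field names, the three confidence levels, and `validate_finding` are hypothetical choices for illustration, not an established schema.

```python
# Hypothetical field names mirroring the requirements listed above.
REQUIRED_FIELDS = (
    "facts",
    "assumptions",
    "evidence",
    "confidence",
    "alternate_explanations",
    "evidence_needed",
)
CONFIDENCE_LEVELS = {"low", "medium", "high"}


def validate_finding(finding):
    """Return a list of problems; an empty list means the finding passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in finding]
    if "confidence" in finding and finding["confidence"] not in CONFIDENCE_LEVELS:
        problems.append("confidence must be low, medium, or high")
    # Facts must be backed by at least one cited artifact.
    if finding.get("facts") and not finding.get("evidence"):
        problems.append("facts stated without supporting evidence")
    return problems


finding = {
    "facts": ["update.exe was executed at 2025-01-01T10:00:00Z"],
    "assumptions": ["execution was user-initiated"],
    "evidence": ["Prefetch entry for update.exe"],
    "confidence": "medium",
    "alternate_explanations": ["scheduled task launched the binary"],
    "evidence_needed": ["scheduled task definitions from the host"],
}
print(validate_finding(finding))
```

A structural check like this only confirms that the model filled in each part. Whether the cited evidence actually supports the stated facts still has to be confirmed against the case data.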