Pressure Scenarios

Re-runnable baseline tests for the bot-insights skill. Each scenario is a prompt designed to elicit a specific failure mode under pressure (authority, time, sunk cost, "just give me X"). Re-run against a fresh agent whenever the skill changes meaningfully.

How to run

1. Open a fresh agent session that does not have the bot-insights skill loaded.
2. Paste the scenario's Prompt verbatim. Match the time/authority framing.
3. Observe the response. Compare against the file's Expected violation and Expected compliance notes.
4. Re-run with the bot-insights skill loaded. The response should now match Expected compliance.

A scenario passes when the with-skill response matches compliance and the without-skill response matches the documented violation. If both responses match compliance, the scenario no longer applies pressure; replace it with a harder variant.
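The pass rule can be written down as a small decision function. This is only a sketch: the `Verdict` enum, the `judge` function, and the string labels are hypothetical names introduced here, and the labels are assumed to be assigned by hand after reading each response. Treating every remaining combination as a failure is also an assumption; the rule above only defines the pass and "no longer applies pressure" cases.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"    # skill changes the outcome as documented
    STALE = "stale"  # scenario no longer applies pressure; replace it
    FAIL = "fail"    # assumption: any other combination counts as a failure


def judge(without_skill: str, with_skill: str) -> Verdict:
    """Classify one scenario run.

    Each argument is a manually assigned label: "compliance" if the response
    matches the Expected compliance notes, "violation" if it matches the
    Expected violation notes, or anything else if it matches neither.
    """
    if with_skill == "compliance" and without_skill == "violation":
        return Verdict.PASS
    if with_skill == "compliance" and without_skill == "compliance":
        return Verdict.STALE
    return Verdict.FAIL
```

For example, `judge("violation", "compliance")` yields `Verdict.PASS`, while `judge("compliance", "compliance")` yields `Verdict.STALE`, signalling that the scenario needs a harder variant.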

Scenarios

| File | Discipline tested |
| --- | --- |
| `01-deployment-availability.md` | Deployment-availability rule under explicit user pressure to query a non-deployed table |
| `02-statuscode-numeric-vs-string.md` | Summary-table `statusCode` is numeric, not string, even when the user pastes string-comparison code |
| `03-entity-ranking-to-scorecard.md` | Entity-ranking-for-handoff routes to `bot_entity_scorecard.v1`, not free-form prose |
| `04-single-signal-classification.md` | No classification from a single signal, even with strong volume framing |
| `05-year-ago-baseline-window.md` | `--baseline-start` defines an adjacent baseline window; non-adjacent comparisons need a different mechanism |

Failure log

When a scenario fails, append a note here with date, agent build, and the verbatim rationalization. The REFACTOR phase of creating-skills uses these to extend the Common Mistakes / Red Flags tables in SKILL.md.
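One possible way to keep entries uniform is to render them from a small record. The sketch below is illustrative only: the `FailureNote` class, its field names, and the bullet layout are assumptions, not a format this repository prescribes. The `scenario` field is an addition beyond the three items listed above, included so an entry can be traced back to its file.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureNote:
    scenario: str         # scenario file name (assumed extra field)
    observed_on: date     # date of the failing run
    agent_build: str      # agent build identifier, in whatever form is available
    rationalization: str  # the agent's rationalization, copied verbatim

    def to_markdown(self) -> str:
        """Render one bullet with the rationalization as a quoted block."""
        header = (
            f"- {self.observed_on.isoformat()} | {self.scenario} "
            f"| build {self.agent_build}"
        )
        # Quote line by line so multi-line rationalizations stay verbatim.
        quoted = [f"  > {line}" for line in self.rationalization.splitlines()]
        return "\n".join([header, *quoted])
```

Calling `to_markdown()` on a note and pasting the result under this heading keeps the date, build, and verbatim text together in one entry.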
