Nubi305/plainmarker

plainmarker

▶ Live demo & how it works  ·  PyPI  ·  uvx plainmarker check .

A fail-closed second opinion on the code your AI agent just wrote — deterministic, in-loop checks that need no model and no account and never send your code off your machine.

plainmarker is an in-loop verifier for developers running AI coding agents (Claude Code, Cursor, Codex). The moment the agent edits a file, plainmarker reads what the code, tests, and scans actually do, not what the agent claims, and tells you in plain language what's dangerous right now: a hardcoded secret, a curl | sh install hook.

These are deterministic checks (detect-secrets, a vendored offline Semgrep ruleset, and a shell-command auditor), not the writer agent's say-so, so they can't just agree with it, and they run entirely on your machine. Two commands go further: plainmarker check additionally screens your dependencies against the OSV advisory database, sending package names but never your code, and the deeper, opt-in plainmarker audit adds an independent reviewer on a different AI model family and discloses what it sends. The in-loop seatbelt runs no model.

It never calls your code "safe"; it reports only which checks passed, and that this is not a guarantee. When a check can't finish, it says so (fail-closed) instead of pretending all-clear. It also remembers why each decision was made and re-checks it every session, so trust compounds instead of resetting.

You don't have to read the diff to know what your agent just did to your project.

Status: in-loop wedge, v0.49.0. The in-loop plainmarker seatbelt + an actionable verdict + a one-line install are the current focus. Running the target's tests in a sandbox and the agent-facing API are deferred to later milestones.

See it catch a danger — in the loop, the moment it's written

[Image: plainmarker catching a hardcoded key and a curl|sh in the loop]

Your agent writes a hardcoded key, then a curl | sh. plainmarker reads the change (not the agent's claim) and tells you:

# (!) plainmarker found dangers in code your agent just wrote
  2 open danger type(s) — secrets + code-vulnerabilities + risky shell only; NOT a guarantee of safety.

## What looks dangerous now
- 2 potential secret(s) found in the code (e.g. config.py:1).
- Dangerous shell command(s) found: 1 (download-and-run or secret-to-network exfiltration).

## Paste this to your coding agent to fix it
> Remove any hardcoded secret and load it from an environment variable, then rotate the exposed key.
> Replace any curl|sh with a pinned, checksum-verified download.

It exits non-zero so your Git hook or CI can gate on it — and it never calls your code "safe."

The one rule

Ground truth only. plainmarker never reports what the AI says it did. It reports what the code, the tests, the scans, and the live system actually are. It never claims a project is "safe," only which specific checks passed, and that this is not a guarantee.

Privacy

Local-capable, and honest about egress. The independent Auditor uses a cloud model by default, so whenever it runs — and when onboarding writes its plain-English summary — plainmarker tells you, in plain language, that your code or project facts left your machine. Turn on local_only with a local Auditor model and nothing leaves your computer. No telemetry, ever.

Choosing the Auditor model

Three ready-to-use Auditor configs ship under keeper_core/templates/keeper/ — point at one with plainmarker audit <path> --config <file>:

  • Default — NVIDIA-served DeepSeek (no config file needed): a capable cloud model; set NVIDIA_API_KEY.
  • Free — OpenRouter (config.openrouter-free.yaml): run the Auditor at no cost on a free, different-family model; set OPENROUTER_API_KEY. Egress is disclosed like any cloud model. ⚠️ Free models may log/train on inputs, so if you are not 100% sure your project is free of private/customer data, use the local option below instead. They are also rate-limited: a throttle shows up honestly as "could not determine" (never a fake pass), so a first run that says "the reviewer did not run" is usually a rate-limit — run plainmarker doctor for the exact reason, then retry / switch model / go local.
  • Local / private — Ollama (config.local-ollama.yaml): zero egress, unlimited; local_only: true refuses any remote endpoint. Best for private code, and the only option with a benchmarked model (qwen2.5-coder:7b).
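As a rough sketch of how the three options map onto the audit command, the helper below builds the argument list for plainmarker audit <path> --config <file> and refuses to proceed when the required API key is missing. The config file names and environment variables are the ones listed above; the function name audit_command, the option labels, and the templates_dir parameter are hypothetical and exist only for illustration. Where the template files live on your machine depends on how plainmarker was installed, so the directory is passed in rather than guessed.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Config file names and API-key variables as listed in this README.
# The option labels ("default", "free", "local") are illustrative only.
AUDITOR_OPTIONS = {
    "default": {"config": None, "env": "NVIDIA_API_KEY"},
    "free": {"config": "config.openrouter-free.yaml", "env": "OPENROUTER_API_KEY"},
    "local": {"config": "config.local-ollama.yaml", "env": None},
}


def audit_command(
    project: str,
    option: str,
    templates_dir: Path,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the `plainmarker audit` argument list for one Auditor option."""
    env = os.environ if env is None else env
    choice = AUDITOR_OPTIONS[option]

    # Cloud options need their key in the environment, never in the repo.
    if choice["env"] and not env.get(choice["env"]):
        raise RuntimeError(f"{choice['env']} is not set")

    cmd = ["plainmarker", "audit", project]
    if choice["config"]:
        # The default option needs no config file, so --config is omitted.
        cmd += ["--config", str(templates_dir / choice["config"])]
    return cmd
```

The returned list can be handed to subprocess.run as-is. Failing before launch when the key is absent keeps a missing credential from being mistaken for a reviewer that ran and found nothing.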

Architecture (Core + adapters)

  • keeper_core/ — the standalone engine. All real logic. Knows nothing about Claude Code and runs on its own.
  • adapters/ — thin wrappers that let a host use Core. The Claude Code plugin is the first; a standalone CLI/service and others follow. Adapters contain no logic of their own.

See docs/ARCHITECTURE.md for the full plain-language map of the parts.

Install

uvx plainmarker check .

That's the whole install: uvx fetches a compatible Python and runs plainmarker — no clone, no global setup. For a command that stays around, uv tool install plainmarker (or pipx install plainmarker), then just plainmarker <command>.

For the in-loop catch, run plainmarker seatbelt . right after an agent edits: it tells you what's dangerous now and gives you a paste-ready fix to hand back to the agent. To fire it automatically the moment your agent writes a file, add a Claude Code PostToolUse hook that calls plainmarker seatbelt. That wiring is manual today (a packaged hook is on the roadmap), and plainmarker never edits your ~/.claude config for you. The hook only fires in repos you have opted in to the watch, which you do with mkdir -p .keeper && touch .keeper/.watch.
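Since the hook wiring is manual, here is a minimal sketch of what such a hook script might look like. It is not a packaged plainmarker hook. It assumes the host passes a JSON event on stdin with a cwd field and that a hook exit code of 2 surfaces stderr back to the agent, which is how Claude Code hooks are understood to behave; check the hooks reference for your version. The helper names (watch_enabled, seatbelt_command, main) are hypothetical. The opt-in test is written here as part of the script, on the assumption that the hook itself has to look for .keeper/.watch.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import List


def watch_enabled(repo: Path) -> bool:
    """True only if the repo has opted in via .keeper/.watch."""
    return (repo / ".keeper" / ".watch").is_file()


def seatbelt_command(repo: Path) -> List[str]:
    """The documented in-loop command, pointed at the repo root."""
    return ["plainmarker", "seatbelt", str(repo)]


def main() -> int:
    # Assumption: the host sends a JSON object on stdin that includes "cwd".
    try:
        event = json.load(sys.stdin)
    except ValueError:
        event = {}
    if not isinstance(event, dict):
        event = {}
    repo = Path(event.get("cwd") or ".").resolve()

    if not watch_enabled(repo):
        return 0  # repo has not opted in: do nothing

    try:
        result = subprocess.run(
            seatbelt_command(repo), capture_output=True, text=True
        )
    except OSError as exc:
        # The tool could not be launched: report it instead of staying silent.
        print(f"plainmarker seatbelt could not run: {exc}", file=sys.stderr)
        return 2

    if result.returncode != 0:
        # Assumption: exit code 2 makes the host show stderr to the agent.
        sys.stderr.write(result.stdout + result.stderr)
        return 2
    return 0
```

To use it as a script, end the file with raise SystemExit(main()) and register it yourself as a PostToolUse hook in your Claude Code settings. Any non-zero result from seatbelt is forwarded, which matches its fail-closed contract: a danger, an unfinished check, and a crash all stop the loop.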

For the opt-in plainmarker audit (the independent, different-family model reviewer) add one API key for the Auditor model — read from an environment variable, never stored in the repo (see Choosing the Auditor model above).

What you can do today

Point plainmarker at any project (each writes its receipts under that project's .keeper/ folder):

| Command | What it does |
| --- | --- |
| plainmarker doctor | check your setup and that the two models are reachable |
| plainmarker onboard <path> | map a project (files, stack, dependencies, risk areas) + a plain summary |
| plainmarker seatbelt <path> | in-loop: after your agent edits a file, what's dangerous NOW + a paste-ready fix to hand back to the agent |
| plainmarker check <path> | what changed since plainmarker last looked, what's dangerous now, and what it did NOT check |
| plainmarker baseline <path> | read-only hard checks: exposed secrets, vulnerable dependencies, code-vulnerability patterns |
| plainmarker audit <path> | the independent different-family reviewer over ground truth (facts vs concerns) |
| plainmarker interrogate <path> | a few plain questions; flags where the code disagrees with your intent |
| plainmarker report <path> | one plain-language report: what's true, broken, unsure, and to decide |
| plainmarker calibrate | prove it catches planted flaws and passes clean projects (and see its blind spots) |

plainmarker never claims a project is "safe" — only which specific checks passed, and that this is not a guarantee.

Exit codes (for CI, pre-commit, and agent hooks)

A script can gate on plainmarker's exit code. There are two shapes, both fail-closed in the way that matters: a real danger never exits 0.

| Code | Meaning |
| --- | --- |
| 0 | Nothing to act on (see per-verb note below). |
| 1 | STOP: something needs a human. |
| 2 | Usage error (bad arguments; standard argparse). Not a security signal. |

  • plainmarker seatbelt and plainmarker baseline are fail-closed gates: exit 0 only when nothing dangerous was found and every check actually ran. A danger, or a check that could not finish (timeout / offline / missing tool), or a crash → 1. There is deliberately no exit code meaning "could not determine" — doubt collapses to 1, never 0, so a hook can treat any non-zero as "don't trust this yet." (baseline is strict: with no network for the dependency check it returns 1.)
  • plainmarker check is the change narrator: it exits 1 only when a real danger is present right now, and 0 otherwise. A gracefully-undetermined (unknown) check — a scanner that timed out or is offline — is disclosed in the report text and in --json as fail_closed / could_not_check, not in the exit code (a check that outright crashes still exits 1). If you want fail-closed-on-unknown gating from check, read --json fail_closed, not the exit code (or gate with seatbelt).
  • plainmarker audit additionally needs the Auditor model reachable.
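The per-verb rules above can be condensed into one gating function, sketched below. The exit-code meanings come from the table; the shape of the --json output is an assumption. This README names fail_closed and could_not_check, and the sketch treats both as top-level keys whose truthiness signals an undetermined check. The names should_block and run_check are hypothetical.

```python
import json
import subprocess
from typing import Optional

# Verbs this README describes as fail-closed gates.
FAIL_CLOSED_VERBS = {"seatbelt", "baseline"}


def should_block(verb: str, exit_code: int, report: Optional[dict] = None) -> bool:
    """Decide whether a hook or CI step should stop on a plainmarker result."""
    if exit_code != 0:
        # 1 means danger or doubt, 2 means usage error: do not proceed on either.
        return True
    if verb in FAIL_CLOSED_VERBS:
        # For these verbs, exit 0 already implies every check ran and passed.
        return False
    # `check` can exit 0 with an undetermined scanner, so inspect the JSON.
    if not isinstance(report, dict):
        return True  # nothing to inspect: treat as doubt
    # Assumption: both fields are top-level keys in the --json output.
    return bool(report.get("fail_closed")) or bool(report.get("could_not_check"))


def run_check(path: str = ".") -> bool:
    """Run `plainmarker check <path> --json` and apply should_block."""
    proc = subprocess.run(
        ["plainmarker", "check", path, "--json"],
        capture_output=True,
        text=True,
    )
    try:
        report = json.loads(proc.stdout)
    except ValueError:
        report = None  # unparseable output is handled as doubt above
    return should_block("check", proc.returncode, report)
```

A CI step could call run_check and exit non-zero when it returns True. Treating a missing or unparseable report as a reason to stop keeps the wrapper as strict as the seatbelt gate, at the cost of occasional false stops.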

License

MIT.
