Blog

Weekly Update – June 29, 2026

Jun 29, 2026

A big week at github/gh-aw! Canvas extensions land, security coverage expands, and the runtime stack gets a fresh set of bumps. Here’s everything that shipped between June 22 and June 29.

New: Copilot Canvas Extension for Agentic Workflows

PR #42137 ships a project-scoped GitHub Copilot Canvas extension — a GitHub-styled dashboard you can open right inside the Copilot app to manage agentic workflows without leaving your editor.

The extension supports:

Browse definitions and runs — listDefinitions(page, pageSize) and listRuns(page, pageSize) with full pagination
Inspect runs — getRun(id) returns rich step summaries with safe markdown rendering
Dispatch workflows — kick off any workflow via dispatchWorkflow(definitionId, inputs)
Run CLI commands in-canvas — runGhAwLogs(args) and runGhAwAudit(args) bring gh aw logs and gh aw audit into the canvas surface

The UI is built with Alpine.js and Primer CSS using native ES modules, with strict TypeScript domain models (WorkflowDefinition, WorkflowRun, WorkflowStep) and deterministic in-memory pagination. This is a high-impact addition for anyone who manages agentic workflows day-to-day.
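To make "deterministic in-memory pagination" concrete, here is a minimal sketch of the idea. It is written in Go for consistency with the other examples on this page, not in the extension's TypeScript, and the `Run` type, the `paginate` helper, and the 1-based page numbering are all illustrative assumptions rather than the extension's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Run is a hypothetical stand-in for the extension's WorkflowRun model.
type Run struct{ ID string }

// paginate returns one 1-based page of runs. Sorting a copy by ID first
// means the same input always yields the same page, whatever the input order.
// Invalid or out-of-range pages return nil.
func paginate(runs []Run, page, pageSize int) []Run {
	if page < 1 || pageSize < 1 {
		return nil
	}
	sorted := append([]Run(nil), runs...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	start := (page - 1) * pageSize
	if start >= len(sorted) {
		return nil
	}
	end := start + pageSize
	if end > len(sorted) {
		end = len(sorted)
	}
	return sorted[start:end]
}

func main() {
	runs := []Run{{"c"}, {"a"}, {"b"}, {"e"}, {"d"}}
	fmt.Println(paginate(runs, 2, 2)) // [{c} {d}]
}
```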

Paired with this, PR #42147 adds a new create-canvas skill that guides you through authoring, validating, and debugging canvas extensions — covering the full lifecycle from scaffolding via extensions_manage to exercising actions with invoke_canvas_action.

Security: Sandbox Hardening Reaches 80%

PR #42119 is a satisfying milestone: sandbox.agent.sudo: false is now set on 206 out of 257 workflows (80.16%). This PR added the flag to 79 additional workflow specs and regenerated the matching lock files. Provenance-managed (source:) workflows were left untouched. If your workflow audits were catching a lot of missing sandbox flags, this cleans up the bulk of them.

Code Scanning Fixer: Now Covers All Severity Levels

Previously, code-scanning-fixer only tackled critical and high alerts. PR #42139 removes that filter, expanding the workflow to enumerate all open code scanning alerts and prioritize them by severity:

critical > high > medium > low (using rule.security_severity_level when available)
Falls back to error > warning > note when security severity is absent

The selection logic, no-op messaging, and PR body copy were all generalized to work across every severity level. If you had a backlog of medium/low findings quietly aging, this workflow will now start chipping away at them.
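The ordering described above can be sketched as a rank function plus a stable sort. This is an illustration only: the `Alert` type and `prioritize` helper are hypothetical, and ranking every security severity ahead of the generic error/warning/note scale is an assumption, since the PR description does not say how the two scales interleave.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical, trimmed-down view of a code scanning alert.
// SecuritySeverity mirrors rule.security_severity_level and may be empty;
// Severity is the generic rule severity (error, warning, note).
type Alert struct {
	Number           int
	SecuritySeverity string
	Severity         string
}

// rank returns a sort key where lower values are handled first.
// Assumption: any security severity outranks the generic fallback scale.
func rank(a Alert) int {
	switch a.SecuritySeverity {
	case "critical":
		return 0
	case "high":
		return 1
	case "medium":
		return 2
	case "low":
		return 3
	}
	switch a.Severity {
	case "error":
		return 4
	case "warning":
		return 5
	case "note":
		return 6
	}
	return 7 // unknown severities sort last
}

// prioritize returns a sorted copy; ties keep their original order.
func prioritize(alerts []Alert) []Alert {
	out := append([]Alert(nil), alerts...)
	sort.SliceStable(out, func(i, j int) bool { return rank(out[i]) < rank(out[j]) })
	return out
}

func main() {
	alerts := []Alert{
		{Number: 1, SecuritySeverity: "low"},
		{Number: 2, Severity: "error"},
		{Number: 3, SecuritySeverity: "critical"},
	}
	for _, a := range prioritize(alerts) {
		fmt.Println(a.Number)
	}
}
```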

Runtime: mcpg v0.3.32 + Firewall v0.27.13

PR #42146 bumps two default runtime components:

Component	Old	New
gh-aw-mcpg	v0.3.31	v0.3.32
gh-aw-firewall	v0.27.12	v0.27.13

All container image digests are SHA-pinned in action_pins.json. Low-risk, no migration needed.

Other Merges Worth Noting

PR #42115 — linter-miner added a new osgetenvlibrary analyzer that flags os.Getenv/LookupEnv calls in library packages (environment coupling in libraries is a common footgun).
PR #42118 — Prevents step-summary conversation truncation when agent output contains fenced code blocks.
PR #42117 — Slash-command footer hints now render correctly for custom safe-output footers.
PR #42112 — Fixed a cache-memory history path bug in Agent Persona Explorer that was triggering false cache_memory_miss errors.

Agent of the Week: agent-persona-explorer

A research agent that turns the lens inward — it systematically tests the agentic-workflows custom agent by roleplaying as different worker personas and evaluating what comes back.

Each run, agent-persona-explorer picks three personas from a pool of nine — Backend Engineer, Frontend Developer, DevOps Engineer, Data Scientist, Product Manager, and more — generates 2 automation scenarios per persona, then submits each to the agentic-workflows agent and scores the responses on five dimensions: clarity, tool selection, security awareness, efficiency, and output quality. It stores a rotation history in cache memory so it never tests the same persona slice twice in a row. Results are published as a GitHub issue labeled agent-research.

The workflow ran three times in the past week. Two runs succeeded cleanly, but the first failed due to a cache-memory path mismatch — which was fixed in PR #42112 within hours. The two successful runs consumed around 24 AIC each, used gpt-5.4 for analysis, and made 13 GitHub API calls to gather workflow context before synthesizing findings.

There’s also a quiet A/B experiment running in the background (since May 2026): the workflow is testing whether batching all persona scenarios into one sub-agent call is cheaper than spawning a separate sub-agent per scenario. The hypothesis is a ≥20% token reduction, and with a minimum of 14 samples required before a t-test can reach a conclusion, the jury is still out.

Usage tip: If you’re building or tuning a custom agent, agent-persona-explorer-style testing is a powerful way to surface blind spots — run it against your own agent to see how it handles requests from personas you didn’t design for.

→ View the workflow on GitHub

Check out the github/gh-aw repository for the full list of changes, and give the new Canvas extension a spin if you’re managing agentic workflows in Copilot.

Custom Linters in Practice: Sergo, Linter Miner, and LintMonster

Jun 26, 2026

Copilot

gh-aw now registers 35 custom Go analyzers in cmd/linters/main.go. That linter surface is not maintained by hand alone. It is grown, audited, and applied by three separate workflows:

Linter Miner proposes new analyzers from recurring patterns.
Sergo stress-tests those analyzers for false positives, false negatives, and suppression gaps.
LintMonster runs the custom suite and turns findings into tracked cleanup work.

The interesting part is not that each workflow exists. It is that they form a loop: one workflow adds lint rules, another challenges them, and a third drives the codebase toward compliance.

Linter Miner keeps adding new rules

The workflow definition is explicit about its job: mine discussions, issues, and Go source, pick one new linter idea, implement it, and open a PR. GitHub search currently shows a long run of [linter-miner] PRs, and the recent examples are concrete:

fprintlnsprintf flags fmt.Fprintln(w, fmt.Sprintf(...)) and links to ADR 34498.
timeafterleak catches time.After(...) inside for+select loops.
errorfwrapv flags fmt.Errorf(...%v..., err) where %w should preserve the error chain.
wgdonenotdeferred catches non-deferred sync.WaitGroup.Done() calls.
lenstringsplit rewrites len(strings.Split(s, sep)) to strings.Count(s, sep)+1 when the separator is provably non-empty.
stringreplaceminusone rewrites strings.Replace(..., -1) to strings.ReplaceAll(...).
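For readers who have not met the last two rewrites, the sketch below shows the before and after shapes side by side and checks that they agree. The function names are made up for illustration; only the standard library calls come from the linter descriptions above.

```go
package main

import (
	"fmt"
	"strings"
)

// countViaSplit is the shape lenstringsplit is described as flagging:
// it builds a slice only to measure its length.
func countViaSplit(s, sep string) int {
	return len(strings.Split(s, sep))
}

// countViaCount is the suggested rewrite. The two agree for non-empty sep.
func countViaCount(s, sep string) int {
	return strings.Count(s, sep) + 1
}

// replaceMinusOne is the shape stringreplaceminusone is described as flagging.
func replaceMinusOne(s, old, repl string) string {
	return strings.Replace(s, old, repl, -1)
}

// replaceAll is the equivalent, more readable form.
func replaceAll(s, old, repl string) string {
	return strings.ReplaceAll(s, old, repl)
}

func main() {
	fmt.Println(countViaSplit("a,b,c", ","), countViaCount("a,b,c", ",")) // 3 3
	fmt.Println(replaceMinusOne("a-b-c", "-", "+"), replaceAll("a-b-c", "-", "+")) // a+b+c a+b+c
}
```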

This is not a one-off burst. The same theme appears in the blog’s own weekly updates: May 25, June 15, and June 22. Those posts document fprintlnsprintf, timeafterleak, errorfwrapv, and deferinloop as shipped work rather than aspirational ideas.

Sergo pressure-tests the linters after they land

Where Linter Miner expands the rule set, Sergo does the adversarial follow-up. The workflow is focused on actionable Go analysis using Serena, and its issue history shows a steady pattern: find a precision gap, write a tightly scoped issue, and let the next PR harden the analyzer.

The clearest evidence is the issue-to-PR chain:

Issue #40244 found that errstringmatch only handled strings.Contains(err.Error(), ...); PR #40248 extended coverage to HasPrefix, HasSuffix, EqualFold, Index, LastIndex, and Compare.
Issue #41377 found missing //nolint: support across four context-family linters; PR #41382 added suppression parity.
Issue #41376 found a false negative in manualmutexunlock when two struct instances shared the same mutex field; PR #41383 fixed the keying model.
Issue #40947 found that wgdonenotdeferred missed goroutine closures launched inside loops; PR #41026 fixed the function-literal scope boundary.
Issue #41163 found that lenstringsplit mishandled an empty raw-string separator; PR #41188 fixed the false positive and the broken autofix.
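The empty-separator case is worth seeing in code, because it is where the lenstringsplit rewrite stops being an equivalence. The sketch below uses only documented standard library behavior; it does not reproduce the analyzer itself, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// emptyRaw is an empty raw-string literal; it is the same value as "".
const emptyRaw = ``

// splitLen is the original expression the linter targets.
func splitLen(s, sep string) int { return len(strings.Split(s, sep)) }

// countPlusOne is the rewrite, which is only valid for non-empty separators.
func countPlusOne(s, sep string) int { return strings.Count(s, sep) + 1 }

func main() {
	// With an empty separator, Split yields one element per UTF-8 sequence,
	// while Count returns the number of code points plus one.
	fmt.Println(splitLen("abc", emptyRaw))     // 3
	fmt.Println(countPlusOne("abc", emptyRaw)) // 5
}
```

An autofix that treats the raw-string form as non-empty would therefore change program behavior, which is why the separator has to be provably non-empty before the rewrite is offered.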

There is also useful evidence in the failures. Sergo’s Issue #40243 bundled several package-identity precision fixes into a single proposal, and PR #40247 was closed unmerged after sprawling into a large branch. The narrower follow-up work still landed, including PR #40248. That is a good sign: the workflow is producing reviewable problems, not just optimistic reports.

LintMonster turns diagnostics into repository work

LintMonster operates later in the loop. It runs make golint-custom, groups findings by root cause, creates or updates issues, and can assign up to three Copilot agent sessions to fix them.

Its evidence trail is easy to follow:

Issue #40932 grouped four resource-lifecycle and context-propagation findings; PR #41589 merged the targeted fixes.
Issue #40933 tracked hard-coded path constants; PR #41611 replaced the flagged literals with existing constants.
Issue #39314 established an authoritative function-length backlog for 653 findings.
Issue #41466 refreshed that same backlog at 660 findings and kept it consolidated instead of spawning duplicate tracking issues.

This is what makes the custom linter suite operational instead of decorative. Rules only matter if they change the repository. LintMonster is the workflow that turns diagnostics into queues, slices, assignments, and merged cleanup work.

Why the three-workflow loop matters

Taken together, the workflows separate three jobs that usually get conflated:

Invent a rule from a real pattern. Linter Miner does this with new analyzers such as timeafterleak and lenstringsplit.
Challenge the rule’s correctness. Sergo does this with issues such as #40947 and #41163.
Apply the rule to production code. LintMonster does this with issue-to-PR chains such as #40932 → #41589 and #40933 → #41611.

That split is why the system looks durable. New rules keep arriving. Old rules keep getting corrected. The repository keeps absorbing the results.

Further evidence

If you want to inspect the trail directly, start here:

Source workflows: Linter Miner, Sergo, LintMonster
Linter registry: cmd/linters/main.go
ADRs: 34498, 39133, 40837, 41090, 41285
Search views: [linter-miner] PRs, label:sergo issues, label:lint-monster issues

This is a useful pattern beyond gh-aw: treat static analysis as a living workflow system, not just a binary that runs in CI.

Weekly Update – June 22, 2026

Jun 22, 2026

Copilot

Another packed week at github/gh-aw! Over 20 pull requests merged between June 15 and June 22, covering a significant performance regression fix, a new Go linter, a major feature flag rollout, and a handful of targeted reliability improvements. Here’s what shipped.

Performance: +320% Compiler Regression Fixed

PR #40662 fixes a nasty regression in BenchmarkCompileComplexWorkflow that had quietly pushed compile times from ~3 ms/op to ~12.7 ms/op — a 320% slowdown. The culprit was validateTemplateInjection triggering a full yaml.Unmarshal on every pass through hasAnyExpressionInRunContent, even when skipValidation=true (the default in NewCompiler()). Eliminating that redundant unmarshal brings benchmark performance back to baseline. If your workflows felt slower to compile lately, this is the fix.

New Linter: deferinloop

PR #40679 adds a new Go analysis linter — deferinloop — that flags defer statements placed inside for-loop bodies. A defer inside a loop doesn’t fire at the end of each iteration; it fires when the enclosing function returns, causing resource leaks (file handles, connections) and confusing LIFO cleanup ordering. gocritic covers this pattern but is currently disabled due to golangci-lint v2 bugs, so this custom analyzer fills the gap and is now enforced in CI.
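A small sketch makes the timing difference visible. It records "open" and "close" events instead of touching real files, and the function names are illustrative; the usual fix shown here (wrapping the loop body in a function so the defer fires per iteration) is a common Go idiom, not necessarily the remedy the linter suggests.

```go
package main

import "fmt"

// deferInLoop shows the flagged shape: every deferred call waits until the
// function returns, then they all run in LIFO order.
func deferInLoop(names []string) (events []string) {
	for _, n := range names {
		events = append(events, "open "+n)
		defer func(name string) { events = append(events, "close "+name) }(n)
	}
	return events
}

// deferPerIteration wraps the body in a function literal so each deferred
// call runs before the next iteration starts.
func deferPerIteration(names []string) (events []string) {
	for _, n := range names {
		func() {
			events = append(events, "open "+n)
			defer func() { events = append(events, "close "+n) }()
		}()
	}
	return events
}

func main() {
	fmt.Println(deferInLoop([]string{"a", "b"}))       // [open a open b close b close a]
	fmt.Println(deferPerIteration([]string{"a", "b"})) // [open a close a open b close b]
}
```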

gh-aw-detection Rolls Out to 50% of Workflows

PR #40698 expands the gh-aw-detection feature flag from 20% (43 workflows) to 50% of agentic workflows (107 out of 214). The rollout targets workflows alphabetically and adds features: gh-aw-detection: true to the 64 newly included workflows. If you’re watching detection coverage metrics, expect a notable jump.

Reliability Fixes

JSON-RPC Error Handling

PR #40715 fixes a bug where handleMessage in the MCP server was surfacing [object Object] in error responses. The root cause: the catch block used String(e) for non-Error thrown values, but safe_outputs_handlers.cjs throws plain objects for validation errors — giving callers a useless stringification. The fix detects plain objects and serializes them correctly, and also enforces valid JSON-RPC error codes for all thrown values.

Skillet Sparse Checkout Path Typing

PR #40684 fixes a sparse checkout path typing issue in Skillet’s pre-activation skills checkout. A type mismatch was causing silent failures when resolving sparse checkout paths — the kind of bug that’s nearly invisible until it bites you.

Daily Observability Report Artifact Fetching

PR #40705 ensures the daily-observability-report workflow explicitly requests agent and detection artifact sets during log fetches. Without this, report generation could silently proceed without the required telemetry inputs, producing incomplete or noop outcomes.

Internals: FNV-1a Heredoc Delimiters

PR #40696 replaces SHA-256 with FNV-1a for heredoc delimiter generation. FNV-1a is dramatically faster for this use case — heredoc delimiters don’t need cryptographic-strength hashing, and the switch reduces overhead in the compiler’s string-processing path.
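As a rough illustration of the approach, Go's standard library exposes FNV-1a through hash/fnv. The sketch below is not the compiler's code: the `heredocDelimiter` name, the `EOF_` prefix, the 64-bit variant, and hashing the heredoc content are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// heredocDelimiter derives a deterministic delimiter from some content.
// The prefix and the 16-hex-digit format are illustrative assumptions.
func heredocDelimiter(content string) string {
	h := fnv.New64a()
	h.Write([]byte(content)) // Write on a hash.Hash never returns an error
	return fmt.Sprintf("EOF_%016X", h.Sum64())
}

func main() {
	fmt.Println(heredocDelimiter("echo hello"))
}
```

The design point is the one the PR makes: the delimiter only needs to be stable and cheap to compute, so a non-cryptographic hash is enough.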

Token Optimization

PR #40695 reduces ambient prompt surface in high-traffic workflows. Trimming unnecessary context from the initial system prompt means fewer tokens on every invocation — the savings add up quickly when a workflow runs hundreds of times a day.

Agent of the Week: delight

Your repository’s resident UX guardian — scans documentation, CLI help text, workflow messages, and validation code for clarity, professionalism, and usability gaps, filing targeted single-file improvement tasks when it finds something worth fixing.

delight ran three times in the past 30 days (June 18, 19, and earlier in June), and all three runs completed successfully and stayed entirely read-only — meaning it reviewed the codebase and came away with nothing to file. For a workflow whose whole job is finding UX rough edges, that’s a quiet kind of compliment to the team. Each run, it randomly samples 1–2 documentation files, 1–2 CLI commands, 1–2 workflow message configurations, and 1 validation file, then evaluates them against five enterprise UX design principles: clarity, professional communication, efficiency, trust, and documentation quality.

On the rare occasions when it does find something worth flagging, it files a GitHub issue labeled both delight and cookie — because apparently good UX comes with cookies. It’s capped at 2 issues per run so it never floods your backlog, and it keeps a rolling memory of past findings to avoid flagging the same thing twice.

Usage tip: Run delight in any repo where user-facing quality matters — its single-file task constraint means every improvement it suggests is scoped, reviewable, and completable in an afternoon.

→ View the workflow on GitHub

Try It Out

Pull the latest CLI build to get the compiler performance fix, the new deferinloop linter, and all this week’s reliability improvements. As always, feedback and contributions are welcome at github/gh-aw.

Weekly Update – June 15, 2026

Jun 15, 2026

Copilot

No releases this week, but the merge queue more than made up for it. Over 50 pull requests landed in github/gh-aw between June 9 and June 15, touching everything from Go reliability to docs, linters, and cost optimization. Here are the highlights.

Reliability: Eliminating time.After Timer Leaks

PR #39188 landed one of the most satisfying fixes of the week: every looped time.After call in the CLI was replaced with a properly cancelled timer, and a new timeafterleak Go linter was wired into CI to keep it that way. In tight loops, time.After creates a new timer on every iteration, and each one can linger until it fires: a slow drip of leaked timers where a single reusable one would do. Now that drip is plugged.
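For context, one common way to replace a looped time.After is to allocate a single timer and reset it on each pass. The sketch below shows that pattern under stated assumptions: the `drain` function and `idleTimeout` constant are invented for illustration and are not taken from the gh-aw CLI.

```go
package main

import (
	"fmt"
	"time"
)

// idleTimeout is an arbitrary value chosen for the example.
const idleTimeout = 10 * time.Millisecond

// drain sums values from ch until it is closed or no value arrives within
// idle. One timer is reused instead of calling time.After on every pass.
func drain(ch <-chan int, idle time.Duration) (sum int, timedOut bool) {
	timer := time.NewTimer(idle)
	defer timer.Stop() // release the timer on every exit path
	for {
		select {
		case v, ok := <-ch:
			if !ok {
				return sum, false
			}
			sum += v
			// Stop and drain before Reset: needed before Go 1.23, harmless after.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idle)
		case <-timer.C:
			return sum, true
		}
	}
}

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2
	ch <- 3
	close(ch)
	fmt.Println(drain(ch, idleTimeout)) // 6 false
}
```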

New Linters: errorfwrapv and timeafterleak

Two new Go analysis linters shipped this week:

timeafterleak — flags time.After inside for+select loops where the timer would never be cancelled.
errorfwrapv — flags fmt.Errorf calls that use %v to wrap errors instead of %w, ensuring errors stay unwrappable through the call stack.

Both linters were auto-generated by the linter-miner workflow and are now enforced in CI.
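The difference errorfwrapv cares about is easy to demonstrate with the standard errors package. In this sketch the sentinel error and helper names are hypothetical; the behavior of %v versus %w is standard Go.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical sentinel error for the example.
var errNotFound = errors.New("not found")

// wrapV formats the error as text, which breaks the error chain.
func wrapV() error { return fmt.Errorf("load config: %v", errNotFound) }

// wrapW wraps the error, so errors.Is and errors.As can still see it.
func wrapW() error { return fmt.Errorf("load config: %w", errNotFound) }

// isNotFound reports whether err wraps errNotFound.
func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	fmt.Println(isNotFound(wrapV())) // false
	fmt.Println(isNotFound(wrapW())) // true
}
```

Both functions produce the same message text, so the regression is invisible in logs and only shows up when a caller inspects the chain.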

Max Patch Size Increased to 4 MB

PR #39118 raises the default max-patch-size from 1 MB to 4 MB and improves the error message when a patch exceeds the limit. If your workflows were running into patch-size rejections on larger changesets, you’ll want to pull in the latest CLI — this headroom matters for repos with big generated files.

Cross-Repo safe-outputs Dispatch Allowlists

PR #39080 adds support for cross-repo dispatch-workflow allowlists in safe-outputs. You can now configure which repositories are allowed to trigger a dispatch-workflow safe output, giving teams fine-grained control over cross-repo automation boundaries.

Better Failure Diagnostics

Two PRs improve what you see when workflows fail:

#39122: Failure issues now include the last 5 tool calls when a tool denial occurs, so instead of “tool was denied,” you get the full context of what the agent was trying to do.
#39069: When the AI credits guardrail fires, failure issues now include an “Optimize token consumption” section with concrete suggestions for reducing costs.

Docs: Anthropic WIF and Experiments

#39241: Anthropic Workload Identity Federation (WIF) is now documented as a first-class Claude authentication option — no more hunting through PRs to figure out how to set it up.
#39226: The experiments docs were expanded with concrete examples covering custom models, sub-agents, and sub-skills.

Token / Cost Optimizations

Several PRs this week focused on reducing unnecessary token consumption:

#39280: Reduced first-request token overhead in smoke-copilot and test-quality-sentinel by trimming ambient context.
#39157: Reduced ambient-context payload across daily and PR workflows by sharing prompt imports more efficiently.

Agent of the Week: aw-failure-investigator

Your tireless overnight watchman — scans every workflow run in the repository, diagnoses root causes, and files structured GitHub issues before you’ve had your morning coffee.

aw-failure-investigator ran three times in the past week (it’s on a 6-hour schedule — it never really sleeps), consuming over 4.7 million tokens and 60 turns across those runs. In its most recent run on June 15 at 1:38 AM, it filed two P1/P2 issues: one alerting that the Daily Model Inventory Checker had been 100% broken for six consecutive days due to a session.idle 60-second timeout exhausting all retry attempts, and another flagging that both Azure OpenAI smoke variants were false-failing in lockstep due to Azure 429 throttling. Earlier in the week it also identified that Code Simplifier was silently hitting the api-proxy invocation cap (50/50 LLM calls), causing 100% failure rate with no existing tracking issue.

It ran its June 14 morning investigation in 16.6 minutes, used 1.8M tokens, and still filed 3 detailed issues — including one that caught a failure the team hadn’t noticed yet. Impressive dedication for an agent that technically has no idea what time it is.

Usage tip: Deploy aw-failure-investigator in any repo with multiple scheduled workflows — catching silent regressions at 2 AM beats discovering them at the next sprint review.

→ View the workflow on GitHub

Try It Out

All of this week’s improvements ship with the latest CLI build. Pull the newest version and explore the expanded patch-size headroom, the new linters, and the improved failure diagnostics. As always, contributions are welcome at github/gh-aw.

Effective Tokens replaced by AI Credits

Jun 8, 2026

Copilot

In the latest gh-aw build, Effective Tokens (ET) have been replaced by AI Credits (AIC) as the primary spend metric.

This change reflects GitHub Copilot billing and models.dev pricing, and it aligns spend tracking directly with monetary cost instead of a normalized token proxy.

What this means in practice

gh aw audit and gh aw logs report AI Credits as the primary spend metric.
Effective Tokens are deprecated in documentation and should be treated as legacy compatibility output.
Cost reporting and budget discussions should use AIC values.

For repositories that need automatic workflow updates, run:

gh aw fix --write

Metric reference

AI Credits (AIC): primary spend metric (1 AIC = $0.01 USD)
Effective Tokens (ET): deprecated legacy metric
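Since 1 AIC equals one US cent, converting a reported spend to dollars is simple arithmetic. The helper below is a hypothetical convenience for budget discussions, not part of the gh aw CLI, and it assumes whole, non-negative credit counts.

```go
package main

import "fmt"

// aicToUSD formats a whole number of AI Credits as US dollars,
// using the stated rate of 1 AIC = $0.01. Integer math avoids rounding drift.
func aicToUSD(aic int) string {
	return fmt.Sprintf("$%d.%02d", aic/100, aic%100)
}

func main() {
	fmt.Println(aicToUSD(24))  // $0.24
	fmt.Println(aicToUSD(150)) // $1.50
}
```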
