title	Compression Engines
version	3.8.2
lastUpdated	2026-06-17

Compression Engines

OmniRoute compression is built around engine contracts. A mode can run one engine directly (caveman or rtk) or a deterministic stacked pipeline that executes multiple engines in order.

Modes

Mode	Engine path	Intended input
`off`	none	Exact prompt preservation
`lite`	Caveman lite helpers	Low-risk always-on cleanup
`standard`	Caveman	Natural-language prompt condensation
`aggressive`	Caveman + history/tool summarizers	Long chat sessions
`ultra`	Caveman + pruning helpers	Context-limit recovery
`rtk`	RTK	Terminal, shell, build, test, and git output
`stacked`	Pipeline, default `rtk -> caveman`	Mixed tool logs and prose, max savings

Engine Registry

The registry lives in open-sse/services/compression/engines/registry.ts. Engines expose a shared contract:

id: stable engine id such as caveman or rtk
apply(text, config): legacy execution path used by stacked pipelines
compress(input, config): primary execution path returning text + stats
getConfigSchema(): returns the JSON-Schema-like shape of valid config
validateConfig(config): returns { valid, errors[] }

Registration uses registerCompressionEngine(engine) (or registerEngine for advanced cases), which calls assertValidEngine() and validateConfig(defaultConfig) before accepting. Use unregisterCompressionEngine(id) to remove an engine at runtime.
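As a rough illustration of the contract and registration flow described above, the sketch below models an engine and a minimal in-memory registry. Only the member names (id, apply, compress, getConfigSchema, validateConfig) and the function names registerCompressionEngine / unregisterCompressionEngine come from this document. The type names, the stats fields, the defaultConfig shape, and the toy whitespace-demo engine are assumptions for illustration; the real registry.ts may differ.

```typescript
// Hypothetical shapes; the real types in registry.ts may differ.
type EngineConfig = Record<string, unknown>;

interface ValidationResult {
  valid: boolean;
  errors: string[];
}

interface CompressionResult {
  text: string;
  stats: { originalChars: number; compressedChars: number };
}

interface CompressionEngine {
  id: string;
  defaultConfig: EngineConfig;
  apply(text: string, config: EngineConfig): string;
  compress(input: string, config: EngineConfig): CompressionResult;
  getConfigSchema(): Record<string, unknown>;
  validateConfig(config: EngineConfig): ValidationResult;
}

const engines = new Map<string, CompressionEngine>();

// Rejects engines with an empty id or an invalid default config.
function registerCompressionEngine(engine: CompressionEngine): void {
  if (!engine.id) throw new Error("engine id is required");
  const check = engine.validateConfig(engine.defaultConfig);
  if (!check.valid) {
    throw new Error(`invalid default config: ${check.errors.join("; ")}`);
  }
  engines.set(engine.id, engine);
}

// Returns true when an engine with that id was registered and is now removed.
function unregisterCompressionEngine(id: string): boolean {
  return engines.delete(id);
}

// Toy engine that collapses runs of whitespace. Not one of the built-in engines.
const whitespaceEngine: CompressionEngine = {
  id: "whitespace-demo",
  defaultConfig: { trim: true },
  apply(text, config) {
    return this.compress(text, config).text;
  },
  compress(input, config) {
    const collapsed = input.replace(/\s+/g, " ");
    const text = config.trim ? collapsed.trim() : collapsed;
    return { text, stats: { originalChars: input.length, compressedChars: text.length } };
  },
  getConfigSchema() {
    return { type: "object", properties: { trim: { type: "boolean" } } };
  },
  validateConfig(config) {
    const errors = typeof config.trim === "boolean" ? [] : ["trim must be a boolean"];
    return { valid: errors.length === 0, errors };
  },
};

registerCompressionEngine(whitespaceEngine);
```

Validating the default config at registration time means a misconfigured engine is rejected before any request reaches it, which is the property the registration paragraph describes.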

strategySelector.ts registers the built-in engines before compression runs. This lets preview, runtime compression, stacked mode, tests, and future engines use the same execution path.

MCP description compression (related)

A separate registry compresses MCP tool description metadata at the registry level; see open-sse/mcp-server/descriptionCompressor.ts and MCP-SERVER.md. It reuses Caveman rules but operates on tool metadata, not request payloads.

Caveman

Caveman mode focuses on semantic condensation of normal prose:

preserves code blocks, URLs, JSON, paths, and structured data
removes filler, hedging, repeated context, and verbose connective phrasing
supports language-aware file rule packs in open-sse/services/compression/rules/
remains available through the legacy standard, aggressive, and ultra modes

The dashboard surface is Dashboard -> Context & Cache -> Caveman.

Caveman upstream reports ~75% fewer output tokens, 65% average output savings in benchmarks (range 22-87%), and ~46% savings from its input-compression tool. OmniRoute uses the Caveman input-side figure (~46%) when documenting stacked prompt/context savings; Caveman output mode remains a separate response-behavior feature.

RTK

RTK mode focuses on command and tool output:

detects output classes such as git status, git branch, git diff, Vitest/Jest/Pytest, Cargo/Go tests, TypeScript/Vite/Webpack builds, ESLint, npm audit/installs, Docker logs, shell find/grep, stack traces, and generic logs
applies 49 JSON filters from open-sse/services/compression/engines/rtk/filters/
supports the RTK-style declarative pipeline: ANSI stripping, replace, match-output short-circuit, strip/keep lines, per-line truncation, head/tail/max-line truncation, and on-empty fallback
supports trust-gated project filters in .rtk/filters.json and global filters in DATA_DIR/rtk/filters.json
strips ANSI sequences, progress noise, repeated lines, and unhelpful boilerplate
preserves actionable failures, warnings, summaries, changed files, and tail context
can optionally retain redacted raw output for recovery/debugging through authenticated management routes
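A few of the declarative stages listed above (ANSI stripping, strip-lines, head/tail truncation, and the on-empty fallback) can be sketched as one small function. The DemoFilter field names and the omission marker are hypothetical and do not reflect the real RTK filter JSON schema; see RTK_COMPRESSION.md for the actual format.

```typescript
// Hypothetical filter shape; the real RTK filter JSON schema may differ.
interface DemoFilter {
  stripAnsi: boolean;
  stripLines: RegExp[]; // drop lines matching any of these (use non-global regexes)
  head: number;         // lines kept from the start when truncating
  tail: number;         // lines kept from the end when truncating
  onEmpty: string;      // fallback when nothing meaningful survives
}

// Matches CSI escape sequences such as colour codes (ESC [ ... letter).
const ANSI_CSI = /\u001b\[[0-9;?]*[A-Za-z]/g;

function applyDemoFilter(output: string, filter: DemoFilter): string {
  const clean = filter.stripAnsi ? output.replace(ANSI_CSI, "") : output;
  const kept = clean
    .split("\n")
    .filter((line) => !filter.stripLines.some((re) => re.test(line)));

  // On-empty fallback: every surviving line is blank (or none survived).
  if (kept.every((line) => line.trim() === "")) return filter.onEmpty;

  // Head/tail truncation with a marker for the omitted middle.
  if (kept.length <= filter.head + filter.tail) return kept.join("\n");
  const omitted = kept.length - filter.head - filter.tail;
  return [
    ...kept.slice(0, filter.head),
    `... ${omitted} lines omitted ...`,
    ...kept.slice(kept.length - filter.tail),
  ].join("\n");
}
```

Keeping the tail as well as the head matters for build and test output, where the actionable summary usually sits at the end of the log.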

The dashboard surface is Dashboard -> Context & Cache -> RTK.

Operational details for custom filters, trust, verify, and raw-output recovery live in RTK_COMPRESSION.md.

RTK upstream reports 60-90% savings for command-output compression. Its README example shows a 30-minute Claude Code session going from ~118,000 tokens to ~23,900, or 79.7% saved.

LLMLingua-2 (Semantic Pruning)

LLMLingua-2 mode performs semantic token pruning on prose using a small ONNX token classifier, complementing the rule-based Caveman and RTK engines:

compresses prose in non-system messages only; fenced code blocks and other preserved constructs are never altered
runs the @atjsh/llmlingua-2 backend (ONNX via @huggingface/transformers) in a worker thread, so model inference never blocks the request event loop
is stackable (stackPriority 35): in a stacked pipeline it runs after the structural engines (CCR, session-dedup, headroom, Caveman) but before ultra, since semantic pruning is most effective on already-structurally-compressed text — e.g. rtk -> caveman -> llmlingua
fail-opens on any error (missing optional deps, worker spawn, model load, inference, or timeout) → the original text is returned unchanged, never an error
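The fail-open behaviour in the last bullet can be sketched as a wrapper that races the pruning call against a deadline and returns the original text on any failure. The function name failOpen, the prune callback, and the timeout parameter are illustrative assumptions; the actual engine runs inference in a worker thread and may handle errors differently.

```typescript
// Runs `prune` with a deadline; any throw, rejection, or timeout yields the original text.
async function failOpen(
  text: string,
  prune: (input: string) => Promise<string>,
  timeoutMs: number,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
  });
  try {
    return await Promise.race([prune(text), deadline]);
  } catch {
    return text; // fail open: the caller never sees the error
  } finally {
    clearTimeout(timer); // avoid keeping the process alive after the race settles
  }
}
```

The trade-off is visibility: a caller cannot tell "compressed nothing" from "failed silently" by the return value alone, which is why the activation check later in this section looks at whether real prose actually shrinks.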

Engine location: open-sse/services/compression/engines/llmlingua/. The dashboard surface is Dashboard -> Context & Cache -> LLMLingua.

Models

The default model is TinyBERT (atjsh/llmlingua-2-js-tinybert-meetingbank, ~57 MB, fast). A higher-accuracy BERT-base model (Arcoldd/llmlingua4j-bert-base-onnx, ~710 MB) is available via the engine config model field. @huggingface/transformers downloads the selected model lazily from the HuggingFace Hub into ${DATA_DIR}/models/llmlingua on the first call (modelStore.ts); a modelPath config override points it at a local copy instead (offline / air-gapped installs).

Optional dependencies & on-demand install

The LLMLingua runtime stack is optional. Three packages are declared as optionalDependencies in package.json and kept external by the production build (scripts/build/prepublish.ts does not bundle them):

Package	Version (pin)	Notes
`@atjsh/llmlingua-2`	`2.0.3`	Entry package; declares the others as peers
`@tensorflow/tfjs`	`4.22.0`	Heaviest dep — dominates the ~800 MB footprint
`js-tiktoken`	`^1.0.20`	Tokenizer

@huggingface/transformers is pinned at 3.5.2 as a regular dependency (shared with the local embeddings path), so it always ships — only the three packages above are prunable. A standard npm install (dev) installs them automatically.

Why on-demand: the npm-published package, the standalone bundle, and the Docker image ship without these deps to stay slim. When they are absent, the worker's dependency gate (a @atjsh/llmlingua-2 resolve probe in worker.ts) fails and the engine fail-opens silently — selecting LLMLingua becomes a no-op (text returned unchanged, no error logged). To activate it in a pruned environment, install the optional stack:

# pin to the versions declared in package.json optionalDependencies
npm install @atjsh/llmlingua-2@2.0.3 @tensorflow/tfjs@4.22.0 "js-tiktoken@^1.0.20"

Roughly 800 MB in total: the TensorFlow.js and transformers runtimes dominate, and the TinyBERT model adds ~57 MB, downloaded at first use (not via npm).

Per environment:

Dev / npm install — installed automatically unless you passed --omit=optional (or --no-optional). No action needed.
Global npm (npm i -g omniroute) / standalone — run the install command above inside the installed package directory, or reinstall without omitting optional deps.
Docker — add the install command in a derived image layer; the published image ships slim by design.
VPS (PM2) — install into the app's node_modules, then restart the process so the worker re-probes the gate.

Verify it is active: with LLMLingua selected, real prose actually shrinks (the engine stops fail-opening), and the first request triggers the model download into ${DATA_DIR}/models/llmlingua. The gate intentionally probes only @atjsh/llmlingua-2 — the other peers are ESM-only and require.resolve throws on them even when present — so the worker still fail-opens if any peer is genuinely missing at import() time.
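A resolve probe of the kind described here can be approximated with Node's createRequire. The helper name canResolve and the choice to anchor resolution on process.cwd() are assumptions for illustration; the actual gate in worker.ts may be implemented differently.

```typescript
import { createRequire } from "node:module";
import { join } from "node:path";

// Anchor resolution on the working directory rather than import.meta.url.
// The file name is a placeholder; only its directory matters for resolution.
const localRequire = createRequire(join(process.cwd(), "noop.js"));

// Returns true when `specifier` resolves from the working directory.
function canResolve(specifier: string): boolean {
  try {
    localRequire.resolve(specifier);
    return true;
  } catch {
    return false;
  }
}

const llmlinguaAvailable = canResolve("@atjsh/llmlingua-2");
console.log(
  llmlinguaAvailable
    ? "LLMLingua entry package found"
    : "LLMLingua entry package missing: the engine would fail open",
);
```

Running this from the application directory is a quick way to check a pruned install before selecting LLMLingua; a true result covers only the entry package, so a missing peer can still surface later at import() time.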

Stacked Pipelines

Stacked mode runs pipeline steps in order. The default is:

rtk -> caveman

Use this for coding-agent sessions where a prompt combines command output with human or assistant prose. RTK reduces noisy tool logs first, then Caveman compresses remaining natural language.

Pipeline steps are configured with stackedPipeline in compression settings or through compression combos.
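To make the ordering concrete, here is a minimal sketch of a pipeline runner that threads text through steps in sequence and records per-step sizes. The Step and StepReport shapes and the two toy steps are hypothetical stand-ins, not the real RTK or Caveman engines.

```typescript
// Hypothetical step shape: an id plus a pure text transform.
interface Step {
  id: string;
  apply: (text: string) => string;
}

interface StepReport {
  id: string;
  before: number; // characters entering the step
  after: number;  // characters leaving the step
}

// Runs steps strictly in array order; each step sees the previous step's output.
function runPipeline(text: string, steps: Step[]): { text: string; reports: StepReport[] } {
  const reports: StepReport[] = [];
  let current = text;
  for (const step of steps) {
    const next = step.apply(current);
    reports.push({ id: step.id, before: current.length, after: next.length });
    current = next;
  }
  return { text: current, reports };
}

// Toy stand-in for a log-oriented step: drops "[n/m]" progress lines.
const rtkDemo: Step = {
  id: "rtk-demo",
  apply: (t) => t.split("\n").filter((l) => !/^\s*\[\d+\/\d+\]/.test(l)).join("\n"),
};

// Toy stand-in for a prose-oriented step: removes a few filler words.
const cavemanDemo: Step = {
  id: "caveman-demo",
  apply: (t) => t.replace(/\b(please|basically|just)\s+/gi, ""),
};

const input = "[1/2] compiling\n[2/2] linking\nerror: missing symbol\nCould you please just fix the build";
const result = runPipeline(input, [rtkDemo, cavemanDemo]);
console.log(result.text);
```

Running the log-oriented step first means the prose step spends no effort on lines that would have been dropped anyway.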

When both engines reduce the same eligible payload, savings compound:

combined = 1 - (1 - RTK savings) * (1 - Caveman input savings)
average  = 1 - (1 - 0.80) * (1 - 0.46) = 89.2%
range    = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6%
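The compounding formula can be checked with a few lines of TypeScript. The function name combinedSavings is illustrative; the inputs are the upstream figures quoted in this document, expressed as fractions.

```typescript
// Each stage keeps (1 - savings) of what it receives, so the kept fractions multiply.
function combinedSavings(...stageSavings: number[]): number {
  for (const s of stageSavings) {
    if (s < 0 || s > 1) throw new RangeError(`savings must be within 0..1, got ${s}`);
  }
  const remaining = stageSavings.reduce((left, s) => left * (1 - s), 1);
  return 1 - remaining;
}

const percent = (x: number): string => `${(x * 100).toFixed(1)}%`;

console.log(percent(combinedSavings(0.8, 0.46))); // 89.2%
console.log(percent(combinedSavings(0.6, 0.46))); // 78.4%
console.log(percent(combinedSavings(0.9, 0.46))); // 94.6%
```

These figures assume both engines act on the same eligible payload. When RTK and Caveman touch disjoint parts of a prompt, the overall saving is a size-weighted mix of the two rather than the compounded value.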

MCP Accessibility Tree Filter

The MCP accessibility-tree smart filter is a post-execution compression layer that runs on MCP tool results, not on prompts or context. It targets the verbose accessibility-tree and browser snapshot payloads returned by tools like Playwright, computer-use, and browser-automation MCP servers.

What it does

Noise stripping — removes empty generic/text entries (- generic:, - text: "")
Sibling collapse — when ≥ collapseThreshold (default 30) consecutive lines are structural repeats, collapses them into the first collapseKeepHead (default 10) lines + a count summary + the last collapseKeepTail (default 5) lines
Ref preservation — [ref=eXX] anchors required by Playwright/computer-use are never touched
Hard truncation — if the text after collapse still exceeds maxTextChars (default 50,000), truncates with a navigation hint so the agent can continue working
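The sibling-collapse step can be sketched as a single pass over the lines. The notion of a "structural repeat" used here (lines that match once quoted text and ref anchors are masked) and the wording of the summary line are assumptions; the real rule in collapseRepeated.ts may differ. Kept lines are emitted verbatim, so their [ref=...] anchors are not rewritten.

```typescript
interface CollapseOptions {
  collapseThreshold: number;
  collapseKeepHead: number;
  collapseKeepTail: number;
}

// Assumption: two lines repeat structurally when they are equal after
// masking quoted text and [ref=...] anchors.
function shapeOf(line: string): string {
  return line.replace(/"[^"]*"/g, '""').replace(/\[ref=[^\]]+\]/g, "[ref]");
}

function collapseRepeated(lines: string[], opts: CollapseOptions): string[] {
  const out: string[] = [];
  const keep = opts.collapseKeepHead + opts.collapseKeepTail;
  let i = 0;
  while (i < lines.length) {
    // Extend j to the end of the run of lines sharing the shape of lines[i].
    const shape = shapeOf(lines[i]);
    let j = i + 1;
    while (j < lines.length && shapeOf(lines[j]) === shape) j++;
    const run = lines.slice(i, j);

    if (run.length >= opts.collapseThreshold && run.length > keep) {
      out.push(...run.slice(0, opts.collapseKeepHead));
      out.push(`... ${run.length - keep} similar lines collapsed ...`);
      out.push(...run.slice(run.length - opts.collapseKeepTail));
    } else {
      out.push(...run); // short runs pass through untouched
    }
    i = j;
  }
  return out;
}
```

With the defaults from this document (threshold 30, head 10, tail 5), a run of 40 matching list items becomes 16 lines: ten from the head, one summary, and five from the tail.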

Engine location

open-sse/services/compression/engines/mcpAccessibility/
  index.ts            ← smartFilterText() entry point
  collapseRepeated.ts ← sibling-collapse algorithm
  constants.ts        ← DEFAULT_MCP_ACCESSIBILITY_CONFIG

Configuration

Controlled by compression.mcpAccessibility in global settings (migration 056). Default config:

{
  "enabled": true,
  "maxTextChars": 50000,
  "collapseThreshold": 30,
  "collapseKeepHead": 10,
  "collapseKeepTail": 5,
  "minLengthToProcess": 2000
}

The filter is only applied to tool-result payloads whose type is "text" and whose length exceeds minLengthToProcess. It does not affect prompt compression or request payloads.
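The gating rule above amounts to a small predicate over the config and a tool-result part. The interface names and the ToolResultPart shape are assumptions for illustration; the config values mirror the defaults listed in this section.

```typescript
// Field names follow the default config in this document; the type name is an assumption.
interface McpAccessibilityConfig {
  enabled: boolean;
  maxTextChars: number;
  collapseThreshold: number;
  collapseKeepHead: number;
  collapseKeepTail: number;
  minLengthToProcess: number;
}

const DEFAULT_CONFIG: McpAccessibilityConfig = {
  enabled: true,
  maxTextChars: 50000,
  collapseThreshold: 30,
  collapseKeepHead: 10,
  collapseKeepTail: 5,
  minLengthToProcess: 2000,
};

// Hypothetical shape of one content part in an MCP tool result.
interface ToolResultPart {
  type: string;
  text?: string;
}

// True only for enabled configs and text parts strictly longer than minLengthToProcess.
function shouldFilter(
  part: ToolResultPart,
  config: McpAccessibilityConfig = DEFAULT_CONFIG,
): boolean {
  return (
    config.enabled &&
    part.type === "text" &&
    typeof part.text === "string" &&
    part.text.length > config.minLengthToProcess
  );
}
```

The length floor keeps small tool results out of the filter entirely, so short responses are returned byte-for-byte.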

Expected savings

60–80% on browser snapshot tool results, depending on page complexity. The collapse algorithm is O(n) in line count and adds negligible latency.

This filter vs the compression engines above

Aspect	Caveman / RTK / Stacked	MCP accessibility filter
Target	Request prompts / context	MCP tool results
Trigger	Compression mode setting	`compression.mcpAccessibility.enabled`
Scope	All SSE messages	Tool results only
Ref anchors	N/A	Preserved unconditionally

Compression Combos

Compression combos are named compression profiles that can be assigned to routing combos:

compression_combos: stores mode, pipeline, RTK config, language config, and default marker
compression_combo_assignments: maps a compression combo to a routing combo
runtime integration resolves an assigned compression combo before generic combo overrides
analytics include compression_combo_id and engine

Dashboard surface: Dashboard -> Context & Cache -> Compression Combos.

API Surface

Route	Purpose
`/api/settings/compression`	Global compression settings (includes `mcpAccessibility` config)
`/api/compression/preview`	Preview any compression mode
`/api/compression/language-packs`	List available Caveman language packs
`/api/context/caveman/config`	Caveman settings alias
`/api/context/rtk/config`	RTK defaults and settings
`/api/context/rtk/filters`	RTK filter catalog
`/api/context/rtk/test`	RTK preview/test endpoint
`/api/context/rtk/raw-output/[id]`	Authenticated redacted raw-output recovery
`/api/context/combos`	Compression combo CRUD
`/api/context/combos/[id]/assignments`	Routing-combo assignment CRUD
`/api/context/analytics`	Compression analytics alias

Management routes require management authentication or API-key policy checks.

MCP Tools

Compression exposes five MCP tools:

Tool	Scope	Purpose
`omniroute_compression_status`	`read:compression`	Settings, analytics, cache stats
`omniroute_compression_configure`	`write:compression`	Update global settings
`omniroute_set_compression_engine`	`write:compression`	Set mode and optional pipeline
`omniroute_list_compression_combos`	`read:compression`	List compression combos
`omniroute_compression_combo_stats`	`read:compression`	Read combo/engine analytics

Known limitations

LLMLingua-2 (SLM) requires co-located optional deps. The worker only runs in a production build when @atjsh/llmlingua-2 and its peers are co-located into dist/node_modules (see scripts/build/colocateOptionals.mjs, #4286). Without them the engine fail-opens (returns the original text). Worker resolution no longer depends on import.meta.url, which breaks in the standalone bundle; it anchors on the runtime cwd / argv[1] instead.
Caveman language packs de / fr / ja are partial. They ship context + filler + structural rules but no dedup / ultra packs, so ultra intensity is no stronger than full for those languages (they use only their own rules — there is no silent fall-back to the English dedup/ultra rules, which would mangle foreign text). en / es / id / pt-BR are complete. Contributions of dedup.json + ultra.json for the partial packs are welcome.
Stacked telemetry only lists engines that compressed. A stacked-pipeline step whose engine ran but produced 0% savings returns stats:null and so does not appear in engineBreakdown, which makes it indistinguishable from a step that was skipped. Distinguishing "ran, 0%" from "skipped" would require a breakdown-model change and is deferred.

Validation

The focused gates for this area are:

node --import tsx/esm --test tests/unit/compression/rtk-*.test.ts tests/unit/compression/pipeline-integration.test.ts tests/unit/compression/context-compression-api.test.ts
node --import tsx/esm --test tests/unit/compression/*.test.ts tests/golden-set/*.test.ts tests/integration/compression-pipeline.test.ts tests/unit/api/compression/compression-api.test.ts
node --import tsx/esm --test tests/unit/compression/mcpAccessibility*.test.ts
npm run typecheck:core
