ai-tool-radar · lab recommendation review

Headroom — should we add it?

chopratejas/headroom · context-compression proxy for LLM agents · assessed 2026-06-21

Verdict · Conditional yes

Worth adopting via the CLI/proxy path, with telemetry left off (the default since v0.27.0).

The lab's top-rated find (8/10, top pick across all 3 audits) holds up under a source-level check. It compresses the bloat that actually fills your context — big Bash logs, tool dumps — for real, benchmarked token savings, works with Claude Code with zero code, and never transmits your prompt content. The old telemetry knock is gone: as of v0.27.0 telemetry is opt-in (off by default). Smoke-tested 2026-06-21 — installs clean, ~25% saved on a JSON sample, precision guardrails confirmed.

8/10

Lab rating · top pick ×3

47–92%

Token reduction (observed)

Apache-2.0

License

~45k ★

GitHub stars

0.x beta

v0.27.0 · ships ~every 4 days

Opt-in

Telemetry (v0.27.0+)

1 · What the lab suggested

Headroom is the single most consistent recommendation the audit has ever produced. It was rated 8/10 — TOP PICK on its first assessment (Jun 16) and has held that spot across all three audit runs.

The lab's reasoning: token/context efficiency is the highest-leverage axis for how you work — you ship through AI agents at high autonomy, so anything that cuts tokens-per-task beats a net-new capability. Its security pass was clean (trivy: 0 CVEs, 0 secrets; only a benign root-user note in test-fixture Dockerfiles).

One honest gap in the lab's verdict

The lab rated Headroom before it adopted its "dig into the source for phone-home behavior" rule (that discipline was learned a few days later, on a different tool). So the most important question for a tool that sits in the middle of your traffic — does it send anything out? — was never actually checked by the audit. This report checked it directly against the source. (Answer in §5.)

2 · What it actually is

A local HTTP proxy (Python core with a bundled Rust engine) that sits between your agent and the model. Before your tool outputs, logs, files, and conversation history reach the LLM, it compresses them — then lets the model pull the originals back on demand if it needs them.

The one path that matters for you

The docs list ~8 ways to install/integrate (proxy, Python SDK, TypeScript SDK, LangChain, Vercel middleware, MCP server, Docker…). Ignore the menu. For Claude Code there is exactly one:

# install once, isolated (keeps heavy ML deps out of your global env)
pipx install --python python3.13 "headroom-ai[proxy]"

# telemetry is opt-in (off by default) since v0.27.0; pin it off explicitly anyway, then wrap Claude Code through the proxy
export HEADROOM_TELEMETRY=off
headroom wrap claude

headroom wrap claude starts the local proxy and points Claude Code at it via ANTHROPIC_BASE_URL — no code, no MCP. headroom unwrap claude reverses it cleanly.
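Since everything hinges on Claude Code being pointed at a local address, a quick loopback check on whatever base URL ends up configured is cheap insurance. The helper below is a hypothetical sketch, not part of Headroom. It assumes the proxy listens on plain http on a loopback host; where wrap writes the value (shell environment or project settings file) and which port it uses are not stated here, so the function just takes a URL as its argument.

```shell
# Hypothetical helper (not a Headroom command): succeeds only when the
# given base URL points at a loopback host over plain http.
is_loopback_url() {
  case "$1" in
    *@*) return 1 ;;                                   # reject userinfo tricks like http://127.0.0.1@evil.example
    http://127.0.0.1|http://127.0.0.1[:/]*) return 0 ;;
    http://localhost|http://localhost[:/]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check the value visible in the current shell, if any.
if is_loopback_url "${ANTHROPIC_BASE_URL:-}"; then
  echo "base URL is loopback"
else
  echo "base URL is not loopback (or not set in this shell)"
fi
```

If the value lives in a settings file rather than the environment, paste it in as the argument instead of reading the variable.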

The other integration paths (and who they're for)

Path	For	You?
`headroom wrap claude`	Running an agent CLI (Claude Code, Codex)	Yes
`headroom proxy`	Generic OpenAI-compatible client, any language	wrap does this
Python / TS SDK, `withHeadroom()`	Code you write that calls the API	Later
LangChain / Vercel middleware	Apps in those frameworks	n/a
MCP server	MCP clients	No — MCP
Docker image	Containerized deploy	No

The SDK paths are a separate, later option: if you ever want to cut tokens inside your own apps — WarRoom Discord bots, a Supabase pipeline that calls Claude — that's where withHeadroom(new Anthropic()) would live. Different decision from "use it with Claude Code."

3 · Plausibility — High

Fits your CLI-over-MCP rule. The wrap/proxy path is a genuine CLI — you never touch the MCP server.
Installs cleanly on your machine. pipx with a pinned interpreter is the documented isolated path; prebuilt wheels mean the Rust engine is bundled — nothing separate to manage.
Most valuable exactly where you live. You run Opus 4.8 on a 1M-token context. More context = more tokens you're paying for and more cache churn — which is precisely what Headroom trims.
Reversible. unwrap removes only Headroom's keys; Codex configs even get a byte-for-byte backup.

4 · Benefits

Workload	Before → after	Saved
Code search (100 results)	`17,765 → 1,408`	92%
Incident / log debugging	`65,694 → 5,118`	92%
GitHub issue triage	`54,174 → 14,761`	73%
Codebase exploration	`78,502 → 41,254`	47%
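The Saved column can be re-derived from the before/after token counts as (1 − after / before) × 100. A minimal sketch using awk; pct_saved is a hypothetical helper name, and the same function works on the numbers from your own trial.

```shell
# Percent of tokens saved, rounded to a whole number.
pct_saved() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (1 - after / before) * 100 }'
}

pct_saved 17765 1408     # code search            -> 92
pct_saved 65694 5118     # incident / log debug   -> 92
pct_saved 54174 14761    # GitHub issue triage    -> 73
pct_saved 78502 41254    # codebase exploration   -> 47
```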

Direct cost cut on the real bloat sources — Bash/test logs, JSON dumps, big tool outputs.
Accuracy preserved in the maintainers' benchmarks (GSM8K identical, Δ±0.000) — compression is reversible via cache, so the model can fetch an original when it matters.
Spans your whole agent stack — same proxy wraps Claude Code and Codex.

5 · Risks & the telemetry verdict

Resolved as of v0.27.0 (verified in the installed source)

Telemetry is now opt-in — off by default, fail-closed ("nothing is collected or sent unless you opt in"). Even when on, the beacon never sends your prompts or tool outputs — only aggregate stats: a session ID, a hashed hostname, versions, OS/arch, tokens saved, compression %, model names, latency. No prompt content, no file paths, no API keys. The earlier knock (default-on in v0.26.0) no longer applies to current releases.

So the agentsview-style "default-on telemetry a scan missed" concern is closed on v0.27.0. If you ever deliberately opt in, export HEADROOM_TELEMETRY=off turns it back off (or --no-telemetry / --stateless).

Egress path	When active	Carries your content?
Anonymous beacon	Opt-in (v0.27.0+)	No — aggregate stats + hashed host
Daily PyPI update check	Default-on	No — version ping (`HEADROOM_UPDATE_CHECK=off`)
Enterprise license reporter	Only if `HEADROOM_LICENSE_KEY` set	No — token counts + key
Cloud-compression middleware	Only if you wire it + set a key	Can — but off by default
Langfuse tracing	Only if you opt in with your keys	To your own account
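The three environment-driven rows of the table can be checked before each session. This is a hypothetical preflight, not a Headroom command: it only inspects the variables named above in the current shell and assumes "off" is the value that disables each one, as the report uses it. It cannot see the cloud-compression or Langfuse paths, which depend on how you wire them.

```shell
# Hypothetical preflight: flag any egress-related variable from the table
# that is not in its quietest state. Returns non-zero if anything is flagged.
egress_preflight() {
  rc=0
  [ "${HEADROOM_TELEMETRY:-}" = "off" ] ||
    { echo "HEADROOM_TELEMETRY is not pinned to off"; rc=1; }
  [ "${HEADROOM_UPDATE_CHECK:-}" = "off" ] ||
    { echo "HEADROOM_UPDATE_CHECK is not off (daily PyPI version ping stays on)"; rc=1; }
  [ -z "${HEADROOM_LICENSE_KEY:-}" ] ||
    { echo "HEADROOM_LICENSE_KEY is set (license reporter becomes active)"; rc=1; }
  return "$rc"
}

# Usage: run before wrapping.
egress_preflight || echo "review the items above before running headroom wrap"
```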

The proxy reads everything in plaintext — that's its job — but the Rust engine doesn't phone home, and no default path transmits content. (Minor smell: the anon DB key is hardcoded and string-split to dodge secret scanners. Benign payload, but worth knowing.)

The other risks

Risk	Severity	Detail / mitigation
Lossy on Bash output	Watch	File tools (Read/Glob/Grep/Write/Edit) are never compressed — exact content preserved. But Bash output IS compressed by default. Carrier data pulled via a shell command is in scope; use the `preserve_fields` whitelist or start in `audit` mode.
Long-Opus-turn timeout	Relevant to you	Open issue #1261: proxy 502 on long Opus turns. You run Opus 4.8 — this is your exact failure surface. Trial before trusting it on big runs.
0.x churn	Watch	Beta, a release roughly every 4 days, ~380 open issues/PRs. It rewrites agent config files — lean on `unwrap`/backups.
Brittle ML deps	Minor	The `[all]` extra pulls torch/transformers/onnx with tight version pins. The `[proxy]` extra avoids most of it.
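Because the tool rewrites agent config files, taking your own copy before the first wrap keeps the rollback independent of unwrap. A minimal sketch with two hypothetical helpers; the settings path in the usage comment is the project-local file mentioned in this report, and whether unwrap restores it byte-for-byte is not stated for Claude Code, so treat a mismatch as a prompt to diff rather than proof of a problem.

```shell
# Hypothetical helper: copy a config file aside (timestamped) before a tool
# rewrites it, and print the backup path. Fails if the file does not exist.
backup_config() {
  src=$1
  [ -f "$src" ] || { echo "nothing to back up: $src" >&2; return 1; }
  dest="$src.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Succeeds only if the file is byte-identical to its backup.
unchanged_since_backup() {
  cmp -s "$1" "$2"
}

# Usage (run inside the trial project):
#   bak=$(backup_config .claude/settings.local.json)
#   ... wrap, trial, unwrap ...
#   unchanged_since_backup .claude/settings.local.json "$bak" || diff "$bak" .claude/settings.local.json
```

If the settings file does not exist yet, backup_config returns non-zero; in that case the clean state after unwrap is simply "no Headroom keys left in the file".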

6 · Overlaps with your stack

Nothing you run does in-flight output compression — so Headroom fills a real gap rather than duplicating a tool.

Existing tool	Relationship
claude-mem	Complementary — different layer. Memory persists work across sessions; Headroom compresses traffic within a turn.
repomix / agy	Complementary — one-shot pack of a codebase vs. continuous in-flight compression of live outputs.
agentsview (also a lab pick)	Pairs — that one measures token use; Headroom reduces it.
Opus 4.8 [1m] context	Amplifies — bigger window = more upside from trimming.

Adopt — conditionally, as a measured trial

Install isolated, pin telemetry off (it is already opt-in on v0.27.0+) and disable the default-on update check, wrap one project first (so it only touches that project's .claude/settings.local.json, not your global config), and watch it in audit mode before letting it actually compress.

# 1. isolated install (lean extra, no torch/onnx)
pipx install --python python3.13 "headroom-ai[proxy]"

# 2. pin telemetry off (already opt-in since v0.27.0) + disable the default-on update check
export HEADROOM_TELEMETRY=off
export HEADROOM_UPDATE_CHECK=off

# 3. observe-only first — see what it WOULD compress, change nothing
export HEADROOM_DEFAULT_MODE=audit
cd ~/dev/<one-throwaway-project>
headroom wrap claude

# 4. check the numbers, then flip to optimize if it looks good
headroom perf

# 5. done testing? fully reverse it
headroom unwrap claude

Watch for the #1261 502 on a long Opus turn during the trial — that's the one failure mode that'd actually disrupt your workflow. Note: this report is text suggestions only; the lab never installs or runs anything it finds.

Also recommended by the lab

Out of scope for this Headroom-focused report, but these are the other tools the audit flagged as worth cherry-picking. Full ratings live in the catalog.

Tool	Rating	Why
anthropics/knowledge-work-plugins	8	First-party operator skills (sales/finance/legal/PM) — maps to your non-dev roles.
microsoft/markitdown	7	Office docs (docx/xlsx/pptx) → markdown. Carrier-data ingestion fit.
hardikpandya/stop-slop	7	In-context anti-slop prose rules — maps to your "sparse deadpan" voice.
mattpocock/skills	7	Engineering skills; gap-fill on architecture review.
phuryn/pm-skills	6	PM/GTM/strategy pack — covers IMO/advisory work your dev skills don't.
kenn-io/agentsview	6	Cross-agent token observability — only with its PostHog telemetry disabled.

→ Full catalog (76 repos) · → Run history