ai-tool-radar · lab recommendation review
chopratejas/headroom · context-compression proxy for LLM agents · assessed 2026-06-21
The lab's top-rated find (8/10, top pick across all 3 audits) holds up under a source-level check. It compresses the bloat that actually fills your context — big Bash logs, tool dumps — for real, benchmarked token savings, works with Claude Code with zero code, and never transmits your prompt content. The old telemetry knock is gone: as of v0.27.0 telemetry is opt-in (off by default). Smoke-tested 2026-06-21 — installs clean, ~25% saved on a JSON sample, precision guardrails confirmed.
Headroom is the single most consistent recommendation the audit has ever produced. It was rated 8/10 — TOP PICK on its first assessment (Jun 16) and has held that spot across all three audit runs.
The lab's reasoning: token/context efficiency is the highest-leverage axis for how you work — you ship through AI agents at high autonomy, so anything that cuts tokens-per-task beats a net-new capability. Its security pass was clean (trivy: 0 CVEs, 0 secrets; only a benign root-user note in test-fixture Dockerfiles).
The lab rated Headroom before it adopted its "dig into the source for phone-home behavior" rule (that discipline was learned a few days later, on a different tool). So the most important question for a tool that sits in the middle of your traffic — does it send anything out? — was never actually checked by the audit. This report checked it directly against the source. (Answer in §5.)
A local HTTP proxy (Python core with a bundled Rust engine) that sits between your agent and the model. Before your tool outputs, logs, files, and conversation history reach the LLM, it compresses them — then lets the model pull the originals back on demand if it needs them.
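The "compress, but keep the original retrievable" idea can be sketched in a few lines of shell. This is an illustration only and makes no claim about Headroom's actual algorithm: the head/tail truncation, the on-disk store, and the function names `stash_and_summarize` and `fetch_original` are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustration only: NOT Headroom's algorithm. Keep the head and tail of a
# long output, stash the full text on disk, and hand back an id for retrieval.
STORE="$(mktemp -d)"   # hypothetical local store for originals

stash_and_summarize() {
  local tmp id total
  tmp="$(mktemp)"
  cat > "$tmp"                                 # capture stdin
  id="$(cksum < "$tmp" | awk '{print $1}')"    # content-derived id
  mv "$tmp" "$STORE/$id"
  total="$(wc -l < "$STORE/$id" | tr -d ' ')"
  if [ "$total" -le 4 ]; then
    cat "$STORE/$id"                           # short output: pass through
  else
    head -n 2 "$STORE/$id"
    echo "[... $((total - 4)) lines elided; original id=$id ...]"
    tail -n 2 "$STORE/$id"
  fi
}

# Return the untouched original for a given id.
fetch_original() {
  cat "$STORE/$1"
}
```

Piping `seq 1 100` through `stash_and_summarize` yields five lines instead of one hundred, and `fetch_original` with the printed id returns all one hundred unchanged.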
The docs list ~8 ways to install/integrate (proxy, Python SDK, TypeScript SDK, LangChain, Vercel middleware, MCP server, Docker…). Ignore the menu. For Claude Code there is exactly one:
# install once, isolated (keeps heavy ML deps out of your global env)
pipx install --python python3.13 "headroom-ai[proxy]"
# telemetry is already opt-in (off) as of v0.27.0; pin it off explicitly anyway, then wrap Claude Code through the proxy
export HEADROOM_TELEMETRY=off
headroom wrap claude
headroom wrap claude starts the local proxy and points Claude Code at it via
ANTHROPIC_BASE_URL — no code, no MCP. headroom unwrap claude reverses it cleanly.
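One way to sanity-check a wrap is to confirm that the base URL in effect points at the local machine. The helper below is a minimal sketch: the function name `is_loopback_url` is hypothetical, and it only classifies a URL string. Where Headroom stores the value (shell environment versus a project settings file) is not something this sketch assumes.

```shell
#!/usr/bin/env bash
# Succeeds if the URL's host is a loopback address, fails otherwise.
is_loopback_url() {
  local rest="${1#*://}"     # drop the scheme
  local host="${rest%%/*}"   # drop any path
  case "$host" in
    127.0.0.1|127.0.0.1:*|localhost|localhost:*|\[::1\]|\[::1\]:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: report on whatever the current shell sees.
if is_loopback_url "${ANTHROPIC_BASE_URL:-}"; then
  echo "ANTHROPIC_BASE_URL points at a local proxy"
else
  echo "ANTHROPIC_BASE_URL is unset or not local in this shell"
fi
```

If the wrap writes the setting into the project's config rather than exporting it, the variable may be empty in your shell even while the proxy is active, so treat a negative result as a prompt to look further, not as proof.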
| Path | For | You? |
|---|---|---|
| headroom wrap claude | Running an agent CLI (Claude Code, Codex) | Yes |
| headroom proxy | Generic OpenAI-compatible client, any language | wrap does this |
| Python / TS SDK, withHeadroom() | Code you write that calls the API | Later |
| LangChain / Vercel middleware | Apps in those frameworks | n/a |
| MCP server | MCP clients | No — MCP |
| Docker image | Containerized deploy | No |
The SDK paths are a separate, later option: if you ever want to cut tokens
inside your own apps — WarRoom Discord bots, a Supabase pipeline that calls Claude — that's where
withHeadroom(new Anthropic()) would live. Different decision from "use it with Claude Code."
The wrap/proxy path is a genuine CLI: you never touch the MCP server.
pipx with a pinned interpreter is the documented isolated path; prebuilt wheels mean the Rust engine is bundled, with nothing separate to manage.
unwrap removes only Headroom's keys; Codex configs even get a byte-for-byte backup.

| Workload | Before → after (tokens) | Saved |
|---|---|---|
| Code search (100 results) | 17,765 → 1,408 | 92% |
| Incident / log debugging | 65,694 → 5,118 | 92% |
| GitHub issue triage | 54,174 → 14,761 | 73% |
| Codebase exploration | 78,502 → 41,254 | 47% |
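The "Saved" column follows directly from the before and after token counts as 100 × (1 − after / before), rounded to a whole percent. A small sketch for re-deriving it, or for checking your own trial numbers (the function name `saved_pct` is hypothetical):

```shell
#!/usr/bin/env bash
# Percent saved from a before/after token count, rounded to a whole number.
saved_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", 100 * (1 - after / before) }'
}

saved_pct 17765 1408    # code search          -> 92
saved_pct 65694 5118    # log debugging        -> 92
saved_pct 54174 14761   # issue triage         -> 73
saved_pct 78502 41254   # codebase exploration -> 47
```

All four rows reproduce the published percentages.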
Telemetry is now opt-in — off by default, fail-closed ("nothing is collected or sent unless you opt in"). Even when on, the beacon never sends your prompts or tool outputs — only aggregate stats: a session ID, a hashed hostname, versions, OS/arch, tokens saved, compression %, model names, latency. No prompt content, no file paths, no API keys. The earlier knock (default-on in v0.26.0) no longer applies to current releases.
So the agentsview-style "default-on telemetry a scan missed" concern is closed on v0.27.0. If you ever deliberately
opt in, export HEADROOM_TELEMETRY=off turns it back off (or --no-telemetry / --stateless).
| Egress path | When active | Carries your content? |
|---|---|---|
| Anonymous beacon | Opt-in (v0.27.0+) | No — aggregate stats + hashed host |
| Daily PyPI update check | Default-on | No — version ping (HEADROOM_UPDATE_CHECK=off) |
| Enterprise license reporter | Only if HEADROOM_LICENSE_KEY set | No — token counts + key |
| Cloud-compression middleware | Only if you wire it + set a key | Can — but off by default |
| Langfuse tracing | Only if you opt in with your keys | To your own account |
The proxy reads everything in plaintext — that's its job — but the Rust engine doesn't phone home, and no default path transmits content. (Minor smell: the anon DB key is hardcoded and string-split to dodge secret scanners. Benign payload, but worth knowing.)
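The environment-controlled rows of the table can be turned into a quick preflight before wrapping. The sketch below uses only the variable names given in this report; the function name `headroom_egress_preflight` is hypothetical, and the check is deliberately stricter than v0.27.0 requires, since it warns when telemetry is merely unset even though unset already means off.

```shell
#!/usr/bin/env bash
# Warn about any egress path that is not explicitly pinned off.
# Returns 0 when everything is pinned, 1 otherwise.
headroom_egress_preflight() {
  local status=0
  if [ "${HEADROOM_TELEMETRY:-}" != "off" ]; then
    echo "warn: HEADROOM_TELEMETRY is not pinned to 'off'" >&2
    status=1
  fi
  if [ "${HEADROOM_UPDATE_CHECK:-}" != "off" ]; then
    echo "warn: HEADROOM_UPDATE_CHECK is not 'off' (daily PyPI ping stays on)" >&2
    status=1
  fi
  if [ -n "${HEADROOM_LICENSE_KEY:-}" ]; then
    echo "warn: HEADROOM_LICENSE_KEY is set (license reporter active)" >&2
    status=1
  fi
  return "$status"
}
```

Running `headroom_egress_preflight && headroom wrap claude` would then refuse to wrap until the environment is pinned. The cloud-compression and Langfuse rows are not covered, because the report does not name the settings that enable them.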
| Risk | Severity | Detail / mitigation |
|---|---|---|
| Lossy on Bash output | Watch | File tools (Read/Glob/Grep/Write/Edit) are never compressed, so exact content is preserved. But Bash output IS compressed by default. Carrier data pulled via a shell command is in scope; use the preserve_fields whitelist or start in audit mode. |
| Long-Opus-turn timeout | Relevant to you | Open issue #1261: proxy 502 on long Opus turns. You run Opus 4.8 — this is your exact failure surface. Trial before trusting it on big runs. |
| 0.x churn | Watch | Beta, a release roughly every 4 days, ~380 open issues/PRs. It rewrites agent config files — lean on unwrap/backups. |
| Brittle ML deps | Minor | The [all] extra pulls torch/transformers/onnx with tight version pins. The [proxy] extra avoids most of it. |
Nothing you run does in-flight output compression — so Headroom fills a real gap rather than duplicating a tool.
| Existing tool | Relationship |
|---|---|
| claude-mem | Complementary — different layer. Memory persists work across sessions; Headroom compresses traffic within a turn. |
| repomix / agy | Complementary — one-shot pack of a codebase vs. continuous in-flight compression of live outputs. |
| agentsview (also a lab pick) | Pairs — that one measures token use; Headroom reduces it. |
| Opus 4.8 [1m] context | Amplifies — bigger window = more upside from trimming. |
Install isolated, pin telemetry and the update check off, wrap one project first (so it only touches that project's
.claude/settings.local.json, not your global config), and watch it in audit mode before
letting it actually compress.
# 1. isolated install (lean extra, no torch/onnx)
pipx install --python python3.13 "headroom-ai[proxy]"
# 2. pin telemetry off (already the default as of v0.27.0) and disable the default-on update check
export HEADROOM_TELEMETRY=off
export HEADROOM_UPDATE_CHECK=off
# 3. observe-only first — see what it WOULD compress, change nothing
export HEADROOM_DEFAULT_MODE=audit
cd ~/dev/<one-throwaway-project>
headroom wrap claude
# 4. check the numbers, then flip to optimize if it looks good
headroom perf
# 5. done testing? fully reverse it
headroom unwrap claude
Watch for the #1261 502 on a long Opus turn during the trial — that's the one failure mode that'd actually disrupt your workflow. Note: this report is text suggestions only; the lab never installs or runs anything it finds.
Out of scope for this Headroom-focused report, but these are the other tools the audit flagged as worth cherry-picking. Full ratings live in the catalog.
| Tool | Rating | Why |
|---|---|---|
| anthropics/knowledge-work-plugins | 8 | First-party operator skills (sales/finance/legal/PM) — maps to your non-dev roles. |
| microsoft/markitdown | 7 | Office docs (docx/xlsx/pptx) → markdown. Carrier-data ingestion fit. |
| hardikpandya/stop-slop | 7 | In-context anti-slop prose rules — maps to your "sparse deadpan" voice. |
| mattpocock/skills | 7 | Engineering skills; gap-fill on architecture review. |
| phuryn/pm-skills | 6 | PM/GTM/strategy pack — covers IMO/advisory work your dev skills don't. |
| kenn-io/agentsview | 6 | Cross-agent token observability — only with its PostHog telemetry disabled. |