At @augmentcode, we took a counter-intuitive bet on our AI architecture.

TL;DR · AI Summary
Augment Code used Mercury 2 as a dedicated subagent, achieving 82% faster context compaction and 90% lower summarization costs.
Key Takeaways
- Using Mercury 2 as a dedicated subagent made context compaction 82% faster.
- Summarization costs fell by 90%, with tool-search summaries under 1 second.
- LLM spend fell by 30% via Prism routing.

At @augmentcode, we took a counter-intuitive bet on our AI architecture. Instead of using the primary coding model to preserve KV cache (the industry standard), we used Mercury 2 by @_inception_ai as a dedicated subagent. The payoff for our users: 82% faster context compaction, 90% lower summarization costs, <1s tool-search summaries, and 30% lower LLM spend via Prism routing. Read the full story here: inceptionlabs.ai/blog/rise-of-r
Quote

Inception
@_inception_ai
10h
@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated
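
For readers who want a concrete picture of "decoupling summarization," here is a minimal sketch of context compaction with a separate summarizer. It is not Augment Code's implementation: `Message`, `estimate_tokens`, `compact`, and `fake_subagent` are hypothetical names, the 4-characters-per-token estimate is a rough assumption, and the summarizer is a local stub standing in for a call to a fast model such as a dedicated subagent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption: ~4 characters per token).
    return max(1, len(text) // 4)


def compact(
    history: List[Message],
    summarize: Callable[[str], str],
    budget: int,
    keep_recent: int = 2,
) -> List[Message]:
    """Return history unchanged if it fits the budget. Otherwise replace all
    but the last `keep_recent` messages with one summary message produced by
    the separate `summarize` callable."""
    total = sum(estimate_tokens(m.content) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    transcript = "\n".join(f"{m.role}: {m.content}" for m in old)
    summary = Message("system", "Summary of earlier turns: " + summarize(transcript))
    return [summary] + recent


def fake_subagent(transcript: str) -> str:
    # Hypothetical placeholder for a request to a dedicated summarization model.
    return f"{len(transcript.splitlines())} earlier messages condensed"


if __name__ == "__main__":
    history = [Message("user", "x" * 400) for _ in range(5)]
    for m in compact(history, fake_subagent, budget=300):
        print(m.role, "|", m.content[:60])
```

The point of the design is that `summarize` is injected: the primary coding model never handles the summarization request, so the summarizer can be swapped for a cheaper or faster model without touching the main loop.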