For real agentic workloads (North), short-context calibration wasn't enough. We calibrated AWQ on lo...

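- Short-context calibration is not sufficient for complex agentic workloads.
- Token masking excludes repetitive templates from calibration, which improves calibration quality.
- Quantization-aware distillation (QAD) is added to match the quality of the BF16 model.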
Cohere on X (https://t.co/n8riV16WKc):

For real agentic workloads (North), short-context calibration wasn't enough. We calibrated AWQ on long internal agentic traces (up to 64k tokens) and added token masking in llm-compressor to exclude repetitive chat templates/tool descriptions from calibration stats. Plus QAD (quant-aware distillation) to close the last gap — matching the quality of our BF16 MoE model with W4A8.
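
To make the token-masking idea concrete, here is a minimal sketch of how excluding tokens changes the per-channel activation statistics that AWQ-style calibration relies on. It is not the llm-compressor API: `masked_channel_means`, the `keep_mask` argument, and the toy numbers are all hypothetical, chosen only to illustrate the effect described in the post.

```python
from typing import List, Sequence


def masked_channel_means(
    activations: Sequence[Sequence[float]],  # shape: [num_tokens][num_channels]
    keep_mask: Sequence[bool],               # True = token contributes to the stats
) -> List[float]:
    """Mean absolute activation per channel, computed over kept tokens only.

    Hypothetical helper: AWQ-style methods use per-channel activation
    magnitudes to decide which weight channels to protect, so the tokens
    that feed this statistic shape the resulting scales.
    """
    if len(activations) != len(keep_mask):
        raise ValueError("activations and keep_mask must have the same length")

    kept = [row for row, keep in zip(activations, keep_mask) if keep]
    if not kept:
        raise ValueError("keep_mask excludes every token")

    num_channels = len(kept[0])
    return [
        sum(abs(row[c]) for row in kept) / len(kept)
        for c in range(num_channels)
    ]


# Toy trace: the first two tokens stand in for a repeated chat template or
# tool description, the last two for the actual agentic content.
acts = [
    [8.0, 0.5],   # template token
    [8.0, 0.5],   # template token
    [1.0, 2.0],   # content token
    [-3.0, 4.0],  # content token
]
template_masked = [False, False, True, True]
no_mask = [True, True, True, True]

stats_masked = masked_channel_means(acts, template_masked)
stats_unmasked = masked_channel_means(acts, no_mask)
print(stats_masked, stats_unmasked)
```

In this toy case the unmasked statistics make channel 0 look dominant only because the template tokens repeat, while the masked statistics reflect the content tokens. In a long trace where the same template or tool description appears many times, that skew would presumably grow with each repetition.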
