cohere (@cohere) · April 22, 2026

Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4...

Score: 9.0

AI Deep Summary

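Combines 4-bit weights with 8-bit activations to balance memory use and compute.
Compared with W4A16 on Hopper, TTFT is up to 58% faster and TPOT is up to 45% faster.
The optimization is integrated into the open-source project vLLM.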

#InferenceOptimization #vLLM #Cohere #MachineLearning

Cohere on X: "Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4-bit weights (low memory) with 8-bit activations (high compute), we hit the sweet spot for both decoding and prefill — up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper. https://t.co/M37wT5KS8Z" / X

Cohere

@cohere

Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4-bit weights (low memory) with 8-bit activations (high compute), we hit the sweet spot for both decoding and prefill — up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper.

![Image 4: Image](http://x.com/cohere/status/2047052557915476304/photo/1)

8:38 PM · Apr 22, 2026

5,241 Views
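
The idea in the post, pairing 4-bit weights with 8-bit activations, can be illustrated with a toy sketch. This is not Cohere's or vLLM's kernel. It assumes symmetric per-tensor integer quantization, since the post does not say which 4-bit and 8-bit formats or scaling granularity are used, and every function name here is hypothetical.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-qmax, qmax], qmax = 2**(bits-1) - 1.

    Assumption: one scale for the whole tensor (per-tensor, symmetric).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def w4a8_dot(weights, activations):
    """Dot product with 4-bit weights and 8-bit activations (toy version)."""
    qw, weight_scale = quantize_symmetric(weights, bits=4)
    qa, act_scale = quantize_symmetric(activations, bits=8)
    # Accumulate in integers, then rescale once at the end.
    acc = sum(w * a for w, a in zip(qw, qa))
    return acc * weight_scale * act_scale


def weight_bytes(num_params, bits):
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * bits // 8


weights = [0.7, -0.3, 0.1, 0.0]
activations = [1.0, 3.0, -0.5, 4.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f}")

# Weight memory for a hypothetical 1B-parameter model.
print(weight_bytes(1_000_000_000, 16), "bytes at 16-bit")
print(weight_bytes(1_000_000_000, 4), "bytes at 4-bit")
```

The sketch shows two effects the post alludes to: 4-bit weights take a quarter of the storage of 16-bit weights, and the multiply-accumulate runs on low-precision operands with a single rescale at the end. It says nothing about the TTFT or TPOT speedups quoted, which depend on the actual GPU kernels.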
