cohere(@cohere)

Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4...

Score: 9.0
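AI Deep Summary
  • Combines 4-bit weights with 8-bit activations to balance memory use and compute.
  • Up to 58% faster TTFT and 45% faster TPOT than W4A16 on Hopper.
  • The optimization is integrated into the open-source project vLLM.
#InferenceOptimization #vLLM #Cohere #MachineLearning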

Cohere on X: "Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4-bit weights (low memory) with 8-bit activations (high compute), we hit the sweet spot for both decoding and prefill — up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper. https://t.co/M37wT5KS8Z" / X


Excited to share our work on production-ready W4A8 inference, now integrated in vLLM! By combining 4-bit weights (low memory) with 8-bit activations (high compute), we hit the sweet spot for both decoding and prefill — up to 58% faster TTFT and 45% faster TPOT vs W4A16 on Hopper.

![Image 4: Image](http://x.com/cohere/status/2047052557915476304/photo/1)

8:38 PM · Apr 22, 2026

5,241 Views
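To make the W4A8 idea in the post concrete, here is a minimal pure-Python sketch of a dot product with 4-bit weights and 8-bit activations. It is not Cohere's kernel and not a vLLM API: the helper names (`quantize_symmetric`, `w4a8_dot`, `weight_bytes`) are hypothetical, and the choice of per-tensor symmetric integer quantization is an assumption, since the post does not say which 8-bit activation format or scaling scheme is used.

```python
def quantize_symmetric(values, bits):
    """Map floats to signed integers of the given width with one shared scale.

    Assumption for illustration: per-tensor symmetric quantization.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def w4a8_dot(weights, activations):
    """Dot product using 4-bit weights and 8-bit activations."""
    qw, w_scale = quantize_symmetric(weights, bits=4)
    qa, a_scale = quantize_symmetric(activations, bits=8)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer multiply-accumulate
    return acc * w_scale * a_scale            # rescale back to float


def weight_bytes(num_params, bits):
    """Raw weight storage in bytes, ignoring scales and other metadata."""
    return num_params * bits // 8


if __name__ == "__main__":
    weights = [0.9, -0.4, 0.15, 0.0]
    activations = [1.2, -0.7, 0.3, 2.5]
    exact = sum(w * a for w, a in zip(weights, activations))
    print("exact:", exact)
    print("w4a8 :", w4a8_dot(weights, activations))
    # Illustrative parameter count only, not taken from the post.
    print("4-bit weight bytes for 1e9 params:", weight_bytes(10**9, 4))
```

In this sketch the weights keep only 15 integer levels, which is where the memory saving comes from, while the multiply-accumulate runs entirely on narrow integers before a single rescale. A production kernel would typically use finer-grained scales (for example per group or per channel) to limit the accuracy loss.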
