Fast, faster, Qwen. 🚀 Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workl...

TL;DR · AI Summary
Qwen3.5 reached a record-breaking 580 tps, thanks to the TokenSpeed engine and optimizations from partners.
Key Takeaways
- Qwen3.5 achieves 580 tps on the TokenSpeed engine.
- The FA4 optimization was contributed by Lightseek, NVIDIA, Mooncake, and Tri Dao.
- The achievement pushes the boundaries of open-source LLM inference.

Fast, faster, Qwen. Excited to see Qwen3.5 achieving a record-breaking 580 tps for agentic workloads on the TokenSpeed engine! This milestone wouldn't be possible without our incredible partners. Big thanks to @lightseekorg, @NVIDIAAI, the Mooncake team, and @tri_dao for the pioneering FA4 optimization. Together, we are pushing the boundaries of open-source LLM inference. Dive into the full blog post below! pytorch.org/blog/up-to-580

#Qwen #Qwen3_5 #TokenSpeed #LLM #Inference #AI #PyTorch #OpenSource #AgenticAI #HighPerformance
Quote · PyTorch (@PyTorch) · 7h
The lightning-fast optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest community blog post, you can learn all