Fast, faster, Qwen. 🚀

Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workl...

Q: Significance

This achievement pushes the boundaries of open-source LLM inference.

Qwen (@Alibaba_Qwen) · May 27, 2026

Score: 7.5

TL;DR · AI Summary

Qwen3.5 reaches a record-breaking 580 tps, enabled by the TokenSpeed engine and optimizations from its partners.

Key Takeaways

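Qwen3.5 reaches 580 tps for agentic workloads on the TokenSpeed engine.
The FA4 optimization was contributed by Lightseek, NVIDIA, Mooncake, and Tri Dao.
This achievement pushes the boundaries of open-source LLM inference.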

Outline

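§ Introduction
Qwen3.5 reaches a record-breaking 580 tps.
· Performance milestone
Qwen3.5 reaches 580 tps for agentic workloads on the TokenSpeed engine.
· Partners
The FA4 optimization was contributed by Lightseek, NVIDIA, Mooncake, and Tri Dao.
· Significance
This achievement pushes the boundaries of open-source LLM inference.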

Mindmap

Qwen3.5 performance breakthrough

#Qwen #TokenSpeed #FA4 #HighPerformance #OpenSource

Qwen (@Alibaba_Qwen)

Fast, faster, Qwen. 🚀 Excited to see Qwen3.5 achieving a record-breaking 580 tps for agentic workloads on the TokenSpeed engine! This milestone wouldn't be possible without our incredible partners. Big thanks to @lightseekorg, @NVIDIAAI, the Mooncake team, and @tri_dao for the pioneering FA4 optimization. Together, we are pushing the boundaries of open-source LLM inference. 🤝 ✨ Dive into the full @PyTorch blog post below! 👇 pytorch.org/blog/up-to-580

#Qwen #Qwen3_5 #TokenSpeed #LLM #Inference #AI #PyTorch #OpenSource #AgenticAI #HighPerformance

Quote · PyTorch (@PyTorch)

The lightning-fast optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest community blog post, you can learn all

4:34 PM · May 27, 2026
