# SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell. 

Good to see f...

Canonical URL: https://www.traeai.com/articles/f0b3332a-8e74-4c6c-bcc5-eaaae94244a4
Original source: https://x.com/NVIDIAAI/status/2049964864240791877
Source name: NVIDIA AI(@NVIDIAAI)
Content type: tweet
Language: Chinese
Score: 7.0
Reading time: 1 minute
Published: 2026-04-30T21:31:24+00:00
Tags: NVIDIA, DeepSeek-V4, SGLang, Blackwell, LMSYS

## Summary

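NVIDIA AI reports that SGLang reaches 180 tok/s/GPU when decoding DeepSeek-V4 on Blackwell hardware at a context length of roughly 1M tokens. The result is attributed to Blackwell-specific optimizations from the LMSYS organization, which make more efficient use of the model's hybrid sparse attention.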

## Key Takeaways

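- SGLang delivers high performance on DeepSeek-V4 decoding, reaching 180 tok/s/GPU.
- The result builds on Blackwell hardware and LMSYS optimizations that improve the model's sparse-attention performance.
- LMSYS also released an RL training pipeline for V4 in Miles, supporting Day 0 optimizations.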

## Outline

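- Introduction: NVIDIA AI announces SGLang's performance breakthrough on new hardware.
  - Performance highlights: the specific throughput figure SGLang achieves and the context size.
  - Source of the optimizations: the Blackwell-specific optimization work contributed by LMSYS.
  - Additional updates: a brief overview of the companion tooling and training pipeline released by LMSYS.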

## Highlights

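- > SGLang reaches 180 tok/s/GPU on DeepSeek-V4 decode, with a context of about 1M tokens.
- > LMSYS's Blackwell-specific optimizations improve utilization of the model's hybrid sparse attention.
- > Alongside the V4 release, LMSYS provides an RL training pipeline in Miles, supporting Day 0 optimizations.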

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.