Aravind Srinivas (@AravSrinivas)

GB 200s Change How One Does the Prefill and Decode Disaggregation When Serving Large MoEs Like Qwen


TL;DR · AI Summary

GB 200s change how prefill and decode disaggregation is done when serving large MoE models such as Qwen. Perplexity reports that GB200 is a major step up in throughput over the Hopper platform for this workload.
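The summary names prefill/decode disaggregation without describing it. As a rough illustration only, the toy sketch below separates the two phases: a prefill step processes the whole prompt in one pass and builds a KV cache, the cache is handed off, and a decode step then generates tokens one at a time. All names here (`KVCache`, `prefill`, `transfer`, `decode`, `serve`) and the fake next-token rule are hypothetical; this is not Perplexity's stack. In real deployments the two phases usually run on separate GPU pools because prefill tends to be compute-bound while decode tends to be memory-bandwidth-bound.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class KVCache:
    # Hypothetical stand-in for a request's attention key/value state.
    request_id: int
    entries: List[int] = field(default_factory=list)


def prefill(request_id: int, prompt_tokens: List[int]) -> KVCache:
    """Prefill: process the whole prompt at once and build the KV cache
    (here, one toy entry per prompt token)."""
    return KVCache(request_id, list(prompt_tokens))


def transfer(cache: KVCache) -> KVCache:
    """Handoff: copy the cache to the decode side. In a real system this
    would be a GPU-to-GPU transfer; here it is just a copy."""
    return KVCache(cache.request_id, list(cache.entries))


def decode(cache: KVCache, max_new_tokens: int) -> List[int]:
    """Decode: generate one token per step, appending each to the cache."""
    generated = []
    for _ in range(max_new_tokens):
        # Toy "model": next token = sum of cached entries modulo 100.
        next_token = sum(cache.entries) % 100
        cache.entries.append(next_token)
        generated.append(next_token)
    return generated


def serve(requests: List[Tuple[int, List[int]]], max_new_tokens: int = 3) -> Dict[int, List[int]]:
    # Phase 1 (hypothetical prefill pool): one KV cache per request.
    caches = [prefill(rid, tokens) for rid, tokens in requests]
    # Handoff to the (hypothetical) decode pool.
    handed_off = [transfer(c) for c in caches]
    # Phase 2 (hypothetical decode pool): token-by-token generation.
    return {c.request_id: decode(c, max_new_tokens) for c in handed_off}


print(serve([(0, [1, 2, 3]), (1, [50, 60])]))  # {0: [6, 12, 24], 1: [10, 20, 40]}
```

Splitting the phases lets each pool be sized and batched for its own bottleneck. How GB200 NVL72 racks shift that trade-off relative to Hopper is the subject of Perplexity's write-up and is not modeled here.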

Key Takeaways

  • GB 200s are better suited than Hopper for high-throughput inference on large MoE models.
  • Perplexity has published research on deploying post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks.
  • GB 200s are not just a training platform but also a high-performance inference platform.
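These takeaways hinge on the models being mixture-of-experts (MoE). In an MoE layer, a router scores every expert for each token, only the top-k experts actually run, and their outputs are combined using renormalized gate weights, so only a fraction of the parameters is active per token. The following is a generic, dependency-free sketch of top-k routing; the expert count, `k`, and the toy experts are illustrative assumptions, not Qwen3's actual configuration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights
    so they sum to 1 (equivalent to a softmax over only the chosen logits)."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int) -> float:
    # Only the k selected experts are evaluated; the others are skipped.
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Illustrative setup (hypothetical): 4 toy experts, each a scalar multiplier.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]
logits = [0.0, 2.0, 2.0, -1.0]
print(top_k_route(logits, k=2))                 # [(1, 0.5), (2, 0.5)]
print(moe_forward(10.0, logits, experts, k=2))  # 25.0
```

Because each token touches only k experts, serving a large MoE model is less about raw per-token FLOPs and more about where experts live and how tokens reach them, which is the kind of deployment question the linked research addresses.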

Outline

  1. Introduces the impact of GB 200s on large MoE models.

  2. How GB 200s change the prefill and decode disaggregation process.

  3. Comparison of throughput between GB 200s and Hopper.

  4. Perplexity's publication on deploying Qwen3 235B models on GB 200s.

Mindmap

  • GB 200s and MoE models
    • Prefill and decode disaggregation
    • Performance comparison
    • Research publication

Highlights

  • GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen.

  • GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

  • We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks.

#NVIDIA #MoE #Qwen #Hopper #GB200

Aravind Srinivas

@AravSrinivas

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack quantifying the throughput benefits compared to serving on Hoppers.

Quote

Perplexity

@perplexity_ai · 10h

We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

2:27 PM · May 12, 2026

22.9K Views
