Fireworks AI (@FireworksAI_HQ)

AI Digest
  • Floating-point addition is non-associative, (a + b) + c ≠ a + (b + c), and this is the root cause of training-inference mismatch.
  • Even kernel fusions that are mathematically equivalent can still drift numerically, because they change the order of operations.
  • This has already caused real parity bugs when serving MoE models such as Kimi K2.5, and needs targeted fixes.
#MoE #NumericalStability #InferenceOptimization #FireworksAI


ICYMI from a few weeks back, we compiled our learnings around how to achieve Training-Inference Parity in MoE Models. The Fundamental Issue: FP Addition Is Not Associative. (a + b) + c ≠ a + (b + c)
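As a quick illustration (not from the original post), here is a minimal Python sketch of that non-associativity in float32, where each addition rounds to the nearest representable value:

```python
import numpy as np

# Each float32 addition rounds its result, so grouping matters.
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

left = (a + b) + c   # (1e8 - 1e8) + 1 = 0 + 1 = 1.0
right = a + (b + c)  # -1e8 + 1 rounds back to -1e8, so 1e8 + (-1e8) = 0.0

print(left, right)    # 1.0 0.0
print(left == right)  # False
```

Near 1e8, consecutive float32 values are 8 apart, so adding 1.0 to -1e8 is lost to rounding; regroup the same three numbers and the answer changes.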

Quote


Fireworks AI (@FireworksAI_HQ) · Apr 18


Training-Inference Parity in MoE Models: Where Numerics Drift

When Faster ≠ Identical: Numerical Pitfalls in Serving MoE Models

Kernel fusions that are mathematically equivalent can still drift numerically. Here are the parity bugs we hit across both Kimi K2.5...
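To make the fusion point concrete, here is a small sketch of reduction-order drift in general (an assumption-laden illustration, not Fireworks' actual kernels): summing the same float32 values with two mathematically equivalent schedules, a sequential left-to-right loop versus NumPy's pairwise reduction, typically yields bitwise-different results.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

# Schedule 1: sequential left-to-right accumulation.
seq = np.float32(0.0)
for v in x:
    seq += v

# Schedule 2: pairwise/tree reduction (np.sum uses pairwise summation),
# the kind of reordering a fused or parallel kernel performs.
tree = x.sum()

print(seq, tree)      # two nearby but (typically) unequal values
print(seq == tree)    # usually False: same math, different rounding
```

In exact arithmetic both schedules are identical; only the rounding order differs, which is exactly why "mathematically equivalent" fusions can break bitwise training-inference parity.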
