- Floating-point addition is not associative: (a + b) + c ≠ a + (b + c). This non-associativity is the root cause of training-inference mismatch (a minimal demo follows the post text below).
- Even kernel fusions that are mathematically equivalent can still drift numerically, because the fused code evaluates the same math in a different order.
- This has already caused real parity bugs when serving MoE models such as Kimi K2.5, and it calls for targeted fixes.

ICYMI from a few weeks back, we compiled our learnings around how to achieve Training-Inference Parity in MoE Models. The Fundamental Issue: FP Addition Is Not Associative. (a + b) + c ≠ a + (b + c)
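To see the non-associativity concretely, here is a minimal sketch in plain Python (the values are illustrative, not taken from the original post): two groupings of the same three numbers give different answers under IEEE-754 rounding.

```python
# Rounding absorbs the 1.0 when it is added to the huge magnitude first,
# so two mathematically equal groupings disagree.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 0.0 + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the result is 0.0

print(left, right, left == right)  # 1.0 0.0 False
```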
Quoting Fireworks AI (@FireworksAI_HQ), Apr 18:
Training-Inference Parity in MoE Models: Where Numerics Drift
When Faster ≠ Identical: Numerical Pitfalls in Serving MoE Models
Kernel fusions that are mathematically equivalent can still drift numerically. Here are the parity bugs we hit across both Kimi K2.5...
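As a sketch of the fusion-drift mechanism (the array here is a hypothetical stand-in, not Fireworks' actual workload), the snippet below compares NumPy's pairwise-tree np.sum against a strict left-to-right float32 accumulation. The two reductions compute the same mathematical sum in different orders, which is exactly how a fused kernel can diverge from its unfused reference.

```python
import numpy as np

# Hypothetical stand-in for an activation tensor; any large float32 array works.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

# NumPy's sum uses a pairwise (tree-shaped) reduction internally.
tree = np.sum(x, dtype=np.float32)

# Same math, different order: strict left-to-right accumulation in float32.
seq = np.float32(0.0)
for v in x:
    seq += v  # float32 + float32 stays float32

# Mathematically equal, but they typically differ in the last few bits --
# the same kind of low-order drift a kernel fusion can introduce.
print(tree, seq, float(tree) - float(seq))
```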