What other names does MuP go by?

MuP is also known as Maximal Update Parametrization.

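What's new with MuP?

traeai has indexed 1 piece of content related to MuP. The most recent is "Why Does the Official Muon Include an Extra max(1, ⋅) Compared to the MuP Version?", published by 科学空间 (Scientific Spaces).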

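Concept

What is MuP?

Also known as: Maximal Update Parametrization

A parametrization method that sets how initialization and learning rates scale with model width, so that hyperparameters tuned on a small model transfer to larger ones. It is also used to guide the design of optimizer scaling factors.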


If you read only 3 articles

Why Does the Official Muon Include an Extra max(1, ⋅) Compared to the MuP Version?

科学空间 (Scientific Spaces) · score: 9.2

📰 Latest on MuP

1 AI news item or analysis related to "MuP" has been indexed.

Why Does the Official Muon Include an Extra max(1, ⋅) Compared to the MuP Version?

科学空间 (Scientific Spaces) · June 5 · 1,705 characters (about a 7-minute read)

The official Muon optimizer applies a max(1, ⋅) truncation to its scaling factor, which stabilizes updates early in training, when inputs are roughly isotropic. In later stages, as features become anisotropic, the MuP scaling factor aligns better with steepest-descent theory. Practitioners should therefore prefer the MuP version, or use a dynamic decay schedule that transitions from the KellerJordan version to the MuP version.

Why it was selected: the max(1, ⋅) in KellerJordan's Muon comes from an RMS-approximation derivation for the case d_in > d_out with isotropic inputs.
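To make the difference concrete, here is a minimal Python sketch comparing the two scaling factors. It assumes a weight matrix of shape (d_out, d_in), as in `torch.nn.Linear`, and assumes the KellerJordan factor is sqrt(max(1, d_out/d_in)) while the MuP factor is sqrt(d_out/d_in); in both cases the factor multiplies the orthogonalized Muon update. The `blended_scale` function is a hypothetical linear ramp standing in for the "dynamic decay schedule" mentioned in the summary; the article's actual schedule may differ.

```python
import math


def keller_jordan_scale(d_out: int, d_in: int) -> float:
    """KellerJordan-style factor: sqrt(max(1, d_out / d_in)).

    Assumes the weight has shape (d_out, d_in). When d_in > d_out the ratio
    is below 1, and max(1, .) clamps the factor to exactly 1.
    """
    return math.sqrt(max(1.0, d_out / d_in))


def mup_scale(d_out: int, d_in: int) -> float:
    """MuP-style factor (assumed): sqrt(d_out / d_in), with no truncation."""
    return math.sqrt(d_out / d_in)


def blended_scale(d_out: int, d_in: int, step: int, total_steps: int) -> float:
    """Hypothetical schedule: move linearly from the KellerJordan factor to
    the MuP factor over `total_steps` steps, then stay at the MuP factor.
    The linear ramp is an illustrative assumption."""
    alpha = min(1.0, max(0.0, step / total_steps))  # 0 at start, 1 at the end
    kj = keller_jordan_scale(d_out, d_in)
    mp = mup_scale(d_out, d_in)
    return (1.0 - alpha) * kj + alpha * mp


if __name__ == "__main__":
    # Compare the factors for tall, square, and wide weight matrices.
    for d_out, d_in in [(4096, 1024), (1024, 1024), (1024, 4096)]:
        print(
            f"{d_out}x{d_in}: "
            f"KellerJordan={keller_jordan_scale(d_out, d_in):.3f}, "
            f"MuP={mup_scale(d_out, d_in):.3f}"
        )
```

Under these assumptions the two factors coincide whenever d_out ≥ d_in and differ only when d_in > d_out: for a 1024×4096 weight, the KellerJordan factor is 1.0 while the MuP factor is 0.5, which is exactly the regime the max(1, ⋅) derivation targets.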

Featured article · Tags: Muon Optimizer · MuP · Deep Learning Optimization · Feature Scaling · LLM Training · Language: Chinese

🔗 Related terms

AI terms that frequently appear alongside "MuP".

科学空间 · KellerJordan · Muon

