EMO: Pretraining Mixture of Experts for Emergent Modularity
Hugging Face Blog, 1,748 words (about 7 minutes)
90
EMO is a mixture-of-experts model whose modular structure emerges through end-to-end pretraining; it retains near-full-model performance with only 12.5% of its experts activated.
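Why it was selected: EMO has 14B total parameters and 1B active parameters, and reaches near-full-model performance with only 1/8 of its experts activated.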
Featured Article #Mixture of Experts #Modularity #Large Language Model #AI Research #Pretraining
