
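Concept

What is Mixture-of-Experts (MoE)?

Also known as: MoE

A sparsely activated neural network architecture that uses conditional computation to reduce inference cost while preserving model capacity.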
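To make "sparse activation" and "conditional computation" concrete, here is a minimal sketch of top-k routing in plain Python. It is an illustration under stated assumptions, not the method of any model listed on this page: the names `moe_forward`, `gate_weights`, and `top_k` are hypothetical, the experts are toy functions, and real systems use learned weights, batching, and load balancing.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs.

    x            : input vector (list of floats)
    gate_weights : one weight row per expert (hypothetical linear gate)
    experts      : list of callables, each mapping a vector to a vector
    Returns (output_vector, indices_of_experts_that_ran).
    """
    # One gate logit per expert: dot product of x with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; every other expert is skipped entirely.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate scores over the chosen experts only.
    weights = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # conditional computation: only chosen experts execute
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup (assumption): four "experts" that just scale the input.
toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
toy_gate = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]]
output, used = moe_forward([1.0, 0.0], toy_gate, toy_experts, top_k=2)
```

With `top_k=2` and four experts, only half of the expert functions run for this input, while all four still count toward the model's total capacity.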

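Why is it worth watching now?

Recent changes

2026-06-04 · Nemotron 3 Ultra uses a hybrid Transformer-Mamba MoE architecture that activates only 55B of its 550B total parameters, significantly reducing the compute overhead of agent tasks.

When Mixture-of-Experts (MoE) comes up repeatedly, it usually means it is shaping product roadmaps, developer workflows, or assessments of the AI industry. This page consolidates scattered material into a single, continuously updated point of observation.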

📰 Latest on Mixture-of-Experts (MoE)

4 AI news items and analyses related to "Mixture-of-Experts (MoE)" have been collected.

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Hugging Face Blog · 564 words (about 3 minutes)
90

JetBrains releases Mellum2, a 12B-parameter MoE model activating only 2.5B params per token, offering 2x+ faster inference than peers, optimized for text/code tasks and private/RAG deployments.

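Why it was selected: Mellum2 is a 12B-parameter MoE model that activates only 2.5B parameters per token and delivers 2x+ faster inference, making it a fit for high-throughput production environments.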

Featured Article · #MoE #JetBrains #Large Model #Code Generation #RAG · English

EMO: Pretraining Mixture of Experts for Emergent Modularity

Hugging Face Blog · 1748 words (about 7 minutes)
90

EMO is a mixture-of-experts model that achieves modular structure emergence through end-to-end pretraining, retaining near-full-model performance with only 12.5% of experts activated.

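Why it was selected: EMO has 14B total parameters and 1B active parameters, and reaches near-full-model performance with only 1/8 of its experts activated.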

Featured Article · #Mixture of Experts #Modularity #Large Language Model #AI Research #Pretraining · Chinese

Best Small Language Models on Hugging Face Right Now!

KDnuggets · 3855 words (about 16 minutes)
85

This article highlights the advancements in small language models, specifically those with under 7 billion parameters, which can now run on consumer GPUs or even laptops. It emphasizes that these models are now capable of performing tasks that were previously only achievable by much larger models, thanks to improvements in training data quality, distillation techniques, and architectural innovations like Mixture-of-Experts (MoE). The article provides a curated list of the best small language models available on Hugging Face, along with their capabilities and benchmark scores.

Why it was selected: Small language models under 7 billion parameters are now capable of performing complex tasks previously reserved for much larger models.

Featured Article · #Language Models #Hugging Face #AI #Machine Learning #Small Models · English

NVIDIA Nemotron 3 Ultra Now Available on Amazon SageMaker JumpStart

AWS Machine Learning Blog · 952 words (about 4 minutes)
82

NVIDIA Nemotron 3 Ultra is now available on Amazon SageMaker JumpStart with one-click deployment. This 550B-parameter MoE model is designed for long-running agents, delivering 5x faster inference, 30% lower cost, and 1M token context support.

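Why it was selected: Nemotron 3 Ultra uses a hybrid Transformer-Mamba MoE architecture that activates only 55B of its 550B total parameters, significantly reducing the compute overhead of agent tasks.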

Featured Article · #Nemotron 3 Ultra #SageMaker JumpStart #Agentic AI #MoE #AWS · English
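
The total and active parameter counts quoted in the items above can be put on a common scale with a little arithmetic. The sketch below is only a back-of-the-envelope comparison: the figures are copied from the summaries on this page, and the helper name `active_fraction` is an assumption introduced for illustration.

```python
# (total parameters, active parameters), in billions, as quoted on this page.
models = {
    "Mellum2": (12.0, 2.5),
    "EMO": (14.0, 1.0),
    "Nemotron 3 Ultra": (550.0, 55.0),
}


def active_fraction(total_b, active_b):
    """Share of a model's parameters that are active, as a value in [0, 1]."""
    return active_b / total_b


for name, (total_b, active_b) in models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b:g}B of {total_b:g}B parameters active ({share:.1%})")
```

The 12.5% figure quoted for EMO counts activated experts rather than parameters, so it is a different measure from the ratio computed here.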

AI terms that frequently appear alongside "Mixture-of-Experts (MoE)".

