traeai

Concept

MoE

Alias: Mixture of Experts

A mixture-of-experts model architecture that uses sparse activation to keep inference efficient at large parameter counts.
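
As a rough illustration of sparse activation, the sketch below routes a single token to its top-k experts and mixes only their outputs. The function names, the toy scalar "experts", and the choice to renormalize the selected gate weights are all assumptions made for this example; real MoE layers differ in many details.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_logits, experts, k=2):
    """Route one token to its top-k experts and mix their outputs."""
    probs = softmax(router_logits)
    # Indices of the k experts with the highest router probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected gate weights so they sum to 1 (an assumption).
    norm = sum(probs[i] for i in top)
    out = 0.0
    for i in top:
        out += (probs[i] / norm) * experts[i](x)
    return out, top

# Four toy scalar "experts" (hypothetical); only two of them run per token.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
y, used = moe_forward(3.0, [0.1, 2.0, -1.0, 1.5], experts, k=2)
```

Here `used` lists the experts that actually ran; the other two are never evaluated, which is where the compute saving comes from.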

14 highly relevant items tracked

TraeAI Observations

Related Materials

14 MoE-related items collected, sorted by score.

The Bottleneck to Fitting a 60-Billion-Parameter Model into a Phone Has Finally Been Broken by a Chinese AI Company

A Chinese AI company has broken the bottleneck of running a 60 billion parameter model on mobile devices using ternary quantization, saving 6x memory with minimal performance loss.

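Why selected: Ternary quantization cuts memory use by 6x, retains 97% of model capability, and supports running a 60-billion-parameter model on a phone with 8GB of memory.

Featured · Article · #AI Model #Ternary Quantization #Ascend Chip #Edge AI #Model Compression · Chinese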
https://t.co/nw0GoHamCI
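
To make the idea of ternary quantization concrete, here is a minimal sketch that maps each weight to -1, 0, or +1 with one shared scale. It assumes an absolute-mean scaling rule; the summary above does not say which scheme the article's method uses, so the function names and the rule itself are illustrative only.

```python
def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 plus one shared float scale."""
    # Assumed rule: scale by the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into {-1, 0, +1}.
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original weights.
    return [v * scale for v in q]

# Hypothetical weights, chosen only for illustration.
w = [0.9, -0.05, -1.2, 0.4, 0.02, -0.6]
q, s = ternary_quantize(w)
approx = dequantize(q, s)
```

Each quantized weight takes one of three values, so it can be stored in far fewer bits than a 16-bit float; the reconstruction in `approx` is lossy.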

DeepSeek's $10 Trillion Grand Strategy [Translation]

宝玉 (@dotey) · 5655 words (about 23 min)
Score: 92

DeepSeek builds a low-cost, high-efficiency model system through multiple foundational innovations to drive China's $10 trillion AI hardware ecosystem and achieve its own $1 trillion valuation.

Why selected: DeepSeek V4 Pro needs only 5.48GB of HBM at a 1-million-token context, far below the 60-89GB of competing models.

Featured · Tweet · #DeepSeek #AI Model #MoE #KV Cache Optimization #Hardware Ecosystem · Chinese
DeepSeek's 10 Trillion USD Grand Strategy

宝玉的分享 · 5756 words (about 24 min)
Score: 92

DeepSeek reduces KV cache requirements through innovations, driving China's AI hardware ecosystem toward a $10 trillion industry.

Why selected: DeepSeek V4 Pro needs only 5.48GB of HBM, a substantial memory saving compared with 60GB for GLM5 and 89GB for Qwen3-235B-A22B.

Featured · Article · #AI Model #Hardware Ecosystem #KV Cache #DeepSeek #China AI · Chinese
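
For context on why KV cache size matters at long context, the sketch below estimates cache memory for a conventional attention layout, where every layer stores one key and one value vector per token per KV head. The configuration values are hypothetical and are not taken from any model named above; these summaries do not describe how DeepSeek's cache is actually laid out.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate KV cache size for a conventional attention layout."""
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

# Hypothetical configuration, not taken from any model named above.
size = kv_cache_bytes(layers=48, kv_heads=4, head_dim=128, seq_len=1_000_000)
size_gb = size / 1e9  # decimal gigabytes
```

Under these assumptions the cache grows linearly with context length, so at a 1-million-token context any reduction in per-token cache size translates directly into HBM savings.
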
[AINews] Thinking Machines' Native Interaction Models - TML-Interaction-Small 276B-A12B - advances SOTA Realtime Voice and kills standard VAD

Thinking Machines released TML-Interaction-Small 276B-A12B, a 276-billion-parameter MoE model with only 12 billion active parameters, achieving sub-200ms end-to-end latency, surpassing GPT-4o and Gemini 3.1-Flash in real-time voice interaction, time-aligned microturns, and visual proactivity, effectively eliminating standard VAD.

Why selected: TML-Interaction-Small is a 276B-parameter MoE model with only 12B active parameters, achieving <200ms end-to-end latency.

Featured · Article · #AI #Realtime Voice #MoE #Multimodal #Model Architecture · Chinese
Mellum2 Goes Open Source: A Fast Model for AI Workflows


The JetBrains Blog · 606 words (about 3 min)
Score: 85

Mellum2 is an open-source 12B parameter AI model from JetBrains, using MoE architecture to activate only 2.5B parameters per token, reducing inference time by over 50% compared to similar-sized models, specifically designed for software engineering environments with applications in routing, RAG pipelines, and private AI deployment.

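Why selected: Mellum2 uses an MoE architecture in which a 12B-parameter model activates only 2.5B parameters per token, with inference 50% faster than comparable models, significantly reducing latency and cost in production.

Featured · Article · #AI #Model #Mellum2 #MoE #Software Engineering · Chinese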
At Just 1/9 the Task Cost of Claude Opus 4.6, StepFun Sets a New Bar for Flash Model Efficiency

Step 3.7 Flash by StepFun is a new-generation Flash model for production-grade AI agents, featuring native multimodal understanding, high throughput with low latency, and enhanced web search. It achieves 97% of Claude Opus 4.6's coding performance at only 1/9 the cost per task, making it suited to high-frequency, complex real-world workflows.

Why selected: Step 3.7 Flash uses a sparse MoE architecture with only 11B active parameters, reaches generation speeds of up to 400 tokens/s, and supports running 40 agents in parallel.

Featured · Article · #AI Agent #Multimodal #Flash Model #Yujue Star #Production Deployment · Chinese
Step 3.7 Flash from @StepFun_ai is live on OpenRouter.

A multimodal (image/video/text) MoE that act...

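Step 3.7 Flash is a multimodal MoE model that activates only 11B parameters while handling tasks at the 196B-parameter scale, suited to coding, agent workflows, and structured output.

Why selected: Step 3.7 Flash handles 196B-parameter-scale tasks by activating 11B parameters, significantly lowering compute cost.

Featured · Tweet · #MoE #Multimodal #AI Model #OpenRouter · Chinese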
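
The active-versus-total parameter figures quoted in the summaries above can be compared directly. The short sketch below computes the share of parameters active per token for three of the listed models. It reads the 196B figure for Step 3.7 Flash as the total parameter count, which the summaries imply but do not state outright.

```python
# (total parameters, active parameters) in billions, as quoted in the summaries.
models = {
    "TML-Interaction-Small": (276, 12),
    "Mellum2": (12, 2.5),
    "Step 3.7 Flash": (196, 11),  # 196B read as total size (assumption)
}

def active_fraction(total_b, active_b):
    """Share of parameters used for each token."""
    return active_b / total_b

fractions = {name: active_fraction(t, a) for name, (t, a) in models.items()}
for name, f in fractions.items():
    print(f"{name}: {f:.1%} of parameters active per token")
```

Per-token compute scales roughly with the active count rather than the total, which is why these models are described as cheap to run relative to their size.
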
MoE Travelogue: 8. Forced Sequence-Level Load Balancing

科学空间 · 4785 words (about 20 min)
Score: 85

This article proposes a new sequence-level load balancing method called Moving Quantile Balancing (MQB), which achieves fine-grained balance in MoE models without relying on auxiliary losses.

Why selected: MQB evolved from Quantile Balancing and is suited to sequence-level load balancing.

Featured · Article · #MoE #Load Balancing #Deep Learning #Routing Mechanism #MQB · Chinese
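
To show what "sequence-level" balance refers to, the sketch below counts how many token slots each expert receives within a single sequence and reports how far the busiest expert is from a perfectly even split. This only measures imbalance; it is not an implementation of MQB, whose details are not given in the summary above. The routing decisions and function names are hypothetical.

```python
from collections import Counter

def sequence_load(assignments, num_experts):
    """Count how many token slots each expert receives in one sequence."""
    counts = Counter(e for token_experts in assignments for e in token_experts)
    return [counts.get(e, 0) for e in range(num_experts)]

def imbalance(load):
    """Max load divided by the perfectly balanced load (1.0 is ideal)."""
    ideal = sum(load) / len(load)
    return max(load) / ideal

# Hypothetical top-2 routing decisions for a 4-token sequence with 4 experts.
assignments = [[0, 1], [0, 2], [0, 1], [0, 3]]
load = sequence_load(assignments, num_experts=4)
ratio = imbalance(load)
```

In this toy sequence one expert is chosen by every token, so its load is twice the ideal share; a balancing method aims to push that ratio toward 1.0 within each sequence rather than only on average across a batch.
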
We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Black...

Perplexity published new research on deploying the Qwen3 235B model on NVIDIA GB200 NVL72 Blackwell racks, showing that GB200 outperforms Hopper in high-throughput inference for large MoE models.

Why selected: The Qwen3 235B model achieves efficient high-throughput inference on NVIDIA GB200.

Featured · Tweet · #NVIDIA #GB200 #Qwen3 #MoE #High Performance Computing · Chinese
Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-r...

NVIDIA Ships Nemotron 3 Ultra: 550B MoE Open Model

NVIDIA AI (@NVIDIAAI) · 104 words (about 1 min)
Score: 75

NVIDIA released Nemotron 3 Ultra, a 550B MoE open model for long-running agents, offering 5x faster inference and 30% lower costs.

Why selected: Nemotron 3 Ultra uses a 550B-parameter MoE architecture and is an open-source model aimed at frontier intelligence.

Featured · Tweet · #NVIDIA #Nemotron #MoE #AI Agent #Open Source · English
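7/ 🧩This Is Not Pruning

ZEDA is more like giving MoE a sense of "compute budget awareness".

Future models will decide not only what to answer, but also whether each token is worth thinking hard about.

Paper: Post-Trained MoE C...

AI Will (@FinanceYF5) · 244 words (about 1 min)
Score: 75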

ZEDA is a novel MoE technique that uses self-distillation to dynamically skip experts, improving inference efficiency and giving models compute budget awareness.

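Why selected: ZEDA uses self-distillation so that an MoE model can skip half of its experts, improving inference efficiency.

Featured · Tweet · #MoE #Mixture-of-Experts #AI Efficiency #Self-Distillation #ZEDA · Chinese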
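The Top Paper on Hugging Face So Far This Week: MACE

It uses an MoE architecture for music-driven dance video generation.

Haha, it looks like AI dance videos on Douyin are about to get even more realistic.

https://t.co/qmSpyQGC0a

向阳乔木 (@vista8) · 124 words (about 1 min)
Score: 55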

The MACE paper proposes an MoE-based architecture for music-driven dance video generation, improving motion-rhythm synchronization, with potential applications in TikTok-style AI dance videos.

Why selected: MACE uses an MoE architecture to align music with dance motion at high precision, making the generated videos more realistic.

Featured · Tweet · #MACE #MoE #Music-to-Dance #Hugging Face #AI Video Generation · Chinese
Nemotron 3 Ultra is coming.


NVIDIA Developer · 395 words (about 2 min)
Score: 45

NVIDIA announces Nemotron 3 Ultra, an open-source model claiming 5x speed and 30% lower cost, but lacks technical details.

Why selected: Nemotron 3 Ultra uses a hybrid SSM and MoE architecture, with inference 5x faster than existing open-source models.

Featured · Video · #Nemotron #SSM #MoE #Open Source Model #NVIDIA · English
