7/ 🧩 This Is Not Pruning
AI Will (@FinanceYF5) · 244 characters (about 1 minute)
75
ZEDA is a novel MoE technique that uses self-distillation to dynamically skip experts, improving inference efficiency and giving models compute budget awareness.
Reason for selection: ZEDA uses self-distillation to let MoE models skip half of their experts, improving inference efficiency.
Featured · Tweet · #MoE #Mixture-of-Experts #AI Efficiency #Self-Distillation #ZEDA · Chinese
