Half of the Expert Computation in MoE Models Is Wasted on Unnecessary Tokens
About 50% of expert computation in mixture-of-experts (MoE) models is wasted on tokens that do not require expert processing; ZEDA can skip this computation to improve efficiency.
Why it was selected: About 50% of expert computation in MoE models is wasted, because some tokens do not need expert processing.

