Step 3.7 Flash: A 196B MoE Model Built for Inference Efficiency
Step 3.7 Flash is a 196B MoE model designed from the ground up for inference efficiency, using MFA and AFD techniques to reduce KV-cache usage to ~22% of DeepSeek, supporting agent, coding, and multimodal workflows, open-sourced under Apache 2.0 and available on Fireworks.
入选理由:Step 3.7 Flash 是 196B MoE 模型,从设计之初就聚焦推理效率,而非事后优化。





