Fireworks AI on X: "Many research labs only consider inference efficiency after the fact. Step 3.7 Flash is a 198B sparse MoE VLM designed by @StepFun_ai for inference from the start. 196B language backbone with a 1.8B vision encoder. Built for real-world agent workloads, running at up to 400 ..."
Fireworks AI (@FireworksAI_HQ) · 112 words (about 1 minute)
82
Fireworks AI introduces Step 3.7 Flash, a 198B sparse MoE VLM designed by StepFun for inference from the start, with a 196B language backbone and a 1.8B vision encoder. Built for real-world agent workloads, it runs at up to 400 tokens/s.
Why it was selected: inference efficiency was optimized from the design stage, not retrofitted afterward.
Featured · Tweet · #Step3.7 Flash · #sparse MoE · #VLM · #198B · #400 token/s · English
