Fireworks AI on X: We ran 720 browser agent tasks with @nottecore across frontier models
Fireworks AI's tests show that baseline models had a 20% retry rate on browser agent tasks, while Kimi K2.5, GLM-5, and MiniMax M2.5 achieved near-zero retries with stable latency. Retry rates directly affect the cost, latency, and reliability of production systems.
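Reason for selection: baseline models produced malformed output in roughly 1 of every 5 calls, forcing retries in multi-step workflows.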
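
To make the cost of a 1-in-5 malformed-output rate concrete, here is a rough back-of-the-envelope sketch. It is not taken from the Fireworks AI post. It assumes each call fails independently with the same probability, that a failed call is simply retried until it succeeds, and that a workflow has 10 steps; the step count and both function names are hypothetical and chosen only for illustration.

```python
def expected_calls(steps: int, retry_rate: float) -> float:
    """Expected total model calls for a workflow of `steps` steps.

    Assumes each call fails independently with probability `retry_rate`
    and is retried until it succeeds, so each step needs
    1 / (1 - retry_rate) calls on average (geometric distribution).
    """
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    return steps / (1.0 - retry_rate)


def prob_no_retry(steps: int, retry_rate: float) -> float:
    """Probability that every step succeeds on its first call."""
    if not 0.0 <= retry_rate <= 1.0:
        raise ValueError("retry_rate must be in [0, 1]")
    return (1.0 - retry_rate) ** steps


if __name__ == "__main__":
    steps = 10         # hypothetical workflow length, not from the source
    retry_rate = 0.20  # the 20% baseline retry rate quoted above

    # About 12.5 calls on average for 10 steps of useful work.
    print(f"expected calls: {expected_calls(steps, retry_rate):.1f}")
    # About 0.107: roughly 1 run in 10 finishes with no retry at all.
    print(f"clean-run probability: {prob_no_retry(steps, retry_rate):.3f}")
```

Under these assumptions, a 20% per-call retry rate adds about 25% more calls per workflow, and most 10-step runs hit at least one retry. Real agent traffic may not satisfy the independence assumption, so treat the numbers as an order-of-magnitude guide only.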


