Claude Mythos Preview Early Snapshot Achieves Over 2x Time Horizon vs. Next Best Model in METR Evaluation
In METR's evaluation, an early snapshot of Claude Mythos Preview achieved a time horizon more than 2x that of the next best model at the 80% success rate. Its 50% time horizon is at least 16 hours (95% CI: 8.5–55 hours).
Why it was selected: Claude Mythos Preview's time horizon reaches 16 hours (95% CI: 8.5–55 hours).
