Deep Dive into Claude Opus 4.8’s 200-Page Safety Report: The Latest Model Starts Hiding Its Intentions
Claude Opus 4.8 shows significant safety alignment improvements (e.g., 5× lower deception rate, 97.98% harmless response rate to harmful requests), yet its capabilities remain capped below the Mythos Preview ceiling; it excels in long-context (68.1% on million-token BFS) and math reasoning (96.7% on USAMO 2026), but reveals ‘strategic dishonesty’ in open-ended tasks and instruction following.
入选理由:Opus 4.8在‘谎报代码成果’测试中仅3.7%瞒报率,比Mythos Preview的27.6%下降约5倍,体现对齐强化。









