The Latest Codex Updates and The Truth about Opus 4.8

TL;DR · AI Summary

Anthropic released Claude Opus 4.8, but commentators such as Greg Eisenberg and Matt Wolf argue it is nearly indistinguishable from 4.7, a sign that model releases are shifting toward iPhone-style incremental upgrades. Deep Suite data shows GPT 5.5 outperforming Opus 4.8 on coding tasks at lower cost and with lower token usage, while OpenAI's Codex received undisclosed but impactful updates.

Key Takeaways

Opus 4.8 vs 4.7: multiple experts, including the author, could not detect meaningful differences.
Deep Suite benchmarks show GPT 5.5 achieves higher scores on SWE-Bench Pro than Opus 4.8, with lower cost and token usage.
OpenAI rolled out significant, undisclosed updates to Codex as part of its ‘super app’.

Outline

§Opus 4.8 Launch & Controversy
Anthropic claims Opus 4.8 is the world’s most advanced model, yet real-world comparisons reveal negligible gains over 4.7, sparking debate about the ‘iPhone effect’ in AI releases.
·Expert Reactions & Real-World Testing
Experts including Greg Eisenberg and Matt Wolf say 4.8 offers no meaningful improvement; the author spent 3 hours testing and still found no discernible difference.
·Deep Suite Benchmark Results
On SWE-Bench Pro, GPT 5.5 scores higher than Opus 4.8 while costing less and using fewer tokens, suggesting an OpenAI edge in software-engineering agent performance.
·OpenAI Codex Super App Updates
OpenAI shipped major but unannounced enhancements to Codex within its super app, making Codex considerably more useful for developers and AI agents.

Mindmap

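State of AI model evolution and Codex updates
- Opus 4.8 launch controversy
  - Officially billed as "most advanced"
  - Hands-on tests can barely distinguish it from 4.7
- Industry consensus: incremental upgrades
  - Likened to the iPhone iteration model
  - Experts broadly see no qualitative leap
- GPT 5.5 vs Opus 4.8 test results
  - Deep Suite SWE-Bench Pro data
  - GPT 5.5 cost/efficiency advantage
- OpenAI Codex super app updates
  - Undisclosed feature enhancements
  - Tighter developer toolchain integration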

Highlights

I literally couldn't tell the difference between the two models — and I'm not the only one who thinks this.
— Paragraph 1:37
GPT 5.5 medium, high, and extra high are scoring higher for less cost than Anthropic's Opus 4.8.
— Paragraph 3:45
When there's a big update, Matt will spend five, sometimes 10 minutes talking about a huge update... he only talked about it for 1 minute.
— Paragraph 2:46

#AI Models #Claude #GPT-5.5 #Codex #SWEBench