DeepSWE’s Score on Opus 4.8 Is Out: Stronger Than 4.7, Lower Cost, Higher Efficiency — But Still Far Behind GPT-5.5. I Haven’t Used It Deeply Yet. I’m Still Using 4.6 Just Because It’s Cheaper.

TL;DR · AI Summary
DeepSWE's evaluation shows Opus 4.8 beating 4.7 at lower cost and with higher efficiency, yet still lagging far behind GPT-5.5. The author has not used 4.8 in depth and continues to use 4.6 simply because it is cheaper. He has also grown skeptical of benchmarks, prefers real user feedback on X, and considers GPT-5.5 the strongest model for most people.
Key Takeaways
- Opus 4.8 surpasses 4.7 in DeepSWE's evaluation, with lower cost and higher efficiency, but remains far behind GPT-5.5.
- The author still uses Opus 4.6 solely because of its lower price, and has not yet used 4.8 in depth.
- The author has become skeptical of benchmarks, favoring real user feedback on X.
Outline
DeepSWE finds Opus 4.8 stronger than 4.7, with lower cost and higher efficiency, yet still substantially behind GPT-5.5.
Despite knowing 4.8 is better, the author sticks with 4.6 for purely economic reasons, illustrating cost-driven decision-making.
The author expresses disillusionment with standard benchmarks, suggesting real user feedback on social platforms holds more value than formal tests.
Mindmap
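- Opus model versions: comparison and selection in practice
  - Performance comparison
    - Opus 4.8 > 4.7 (DeepSWE evaluation)
    - Opus 4.8 << GPT-5.5 (significant gap)
  - Cost considerations
    - Opus 4.6 is the cheapest and remains in use
    - 4.8 offers better value but has not replaced 4.6
  - Shift in how models are evaluated
    - Benchmarks demystified: declining credibility
    - Real user feedback: rising priority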
Highlights
Opus 4.8 is stronger than 4.7, with lower cost and higher efficiency, but still falls far short of GPT-5.5.
I'm still using 4.6 for no other reason than that it's cheaper. Price stays decisive even when performance improves.
I've somewhat lost faith in benchmarks; I'd rather read real user comments on X than trust benchmark scores.
Viking on X: "DeepSWE's evaluation of Opus 4.8 is out: it surpasses 4.7, with lower cost and higher efficiency, yet still lags significantly behind GPT-5.5. I haven't used it extensively myself. In fact, I'm still using version 4.6, for no particular reason other than that it's cheaper. Moreover, I've somewhat lost faith in benchmarks; real user reviews on X are more informative to me. In any case, I believe GPT-5.5 is definitely the strongest model for most people." https://t.co/7qtafwUnmf