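DeepSWE's scores for Opus 4.8 are out: it beats 4.7 at lower cost and with higher efficiency, but it still trails GPT-5.5 by a wide margin. I haven't used it in depth yet. In fact, I'm still on 4.6, for no other reason than that it's cheaper.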

Viking (@vikingmute), May 31, 2026

DeepSWE’s Score on Opus 4.8 Is Out: Stronger Than 4.7, Lower Cost, Higher Efficiency — But Still Far Behind GPT-5.5. I Haven’t Used It Deeply Yet. I’m Still Using 4.6 Just Because It’s Cheaper.

Score: 5.0

TL;DR · AI Summary

DeepSWE's evaluation shows Opus 4.8 outperforming 4.7 in performance, cost, and efficiency, yet still lagging far behind GPT-5.5. The author has not yet tested 4.8 in depth and continues to use the cheaper 4.6. The author is also skeptical of benchmarks and prefers real user feedback on X.

Key Takeaways

Opus 4.8 surpasses 4.7 in performance, cost, and efficiency, but remains substantially behind GPT-5.5.
The author still uses Opus 4.6 solely due to its lower price, having not yet deeply tested 4.8.
The author has become skeptical of standardized benchmarks, favoring real-world user feedback on X.

Outline


§Core Evaluation Findings
DeepSWE finds Opus 4.8 superior to 4.7 in performance, cost, and efficiency, yet still substantially behind GPT-5.5.
§User Adoption Behavior
Despite knowing 4.8 is better, the author sticks with 4.6 for purely economic reasons, illustrating cost-driven decision-making.
§Reflection on Evaluation Methods
The author expresses disillusionment with standard benchmarks, suggesting real user feedback on social platforms holds more value than formal tests.

Mindmap


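Opus model version comparison and selection in practice
- Performance comparison
  - Opus 4.8 > 4.7 (DeepSWE evaluation)
  - Opus 4.8 << GPT-5.5 (significant gap)
- Cost considerations
  - Opus 4.6 is the cheapest and remains in use
  - 4.8 offers better value but has not replaced 4.6
- Shift in evaluation approach
  - Benchmarks demystified: credibility declining
  - Real user feedback: priority rising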

Highlights


Opus 4.8 outperforms 4.7 in performance, cost, and efficiency—but still falls far short of GPT-5.5.
— Paragraph 1
I’m still using 4.6—no other reason than it’s cheaper—highlighting price as a decisive factor even when performance improves.
— Paragraph 1
I’ve started to ‘de-mythologize’ benchmarks—I’d rather read real user comments on X than trust benchmark scores.
— Paragraph 1

#Large Language Model #Benchmark #Opus #GPT-5.5 #Cost-Efficiency


Viking on X: "DeepSWE’s evaluation of Opus 4.8 is out—it surpasses 4.7, with lower cost and higher efficiency, yet still lags significantly behind GPT-5.5. I haven’t used it extensively myself. In fact, I’m still using version 4.6—no particular reason other than it’s cheaper. Moreover, I’ve somewhat lost faith in benchmarks; real user reviews on X are more informative to me. In any case, I believe GPT-5.5 is definitely the strongest model for most people." https://t.co/7qtafwUnmf / X


1:11 PM · May 31, 2026 · 4,799 Views
