Viking(@vikingmute)

DeepSWE’s Score on Opus 4.8 Is Out: Stronger Than 4.7, Lower Cost, Higher Efficiency — But Still Far Behind GPT-5.5. I Haven’t Used It Deeply Yet. I’m Still Using 4.6 Just Because It’s Cheaper.

Score: 5.0

TL;DR · AI Summary

DeepSWE’s evaluation shows Opus 4.8 beating 4.7 on score, cost, and efficiency, yet still lagging far behind GPT-5.5. The author has not used 4.8 extensively and keeps using 4.6 because it is cheaper. The author has also grown skeptical of benchmarks, finds real user feedback on X more informative, and considers GPT-5.5 the strongest model for most people.

Key Takeaways

  • Opus 4.8 surpasses 4.7 in performance, cost, and efficiency, but remains substantially behind GPT-5.5.
  • The author still uses Opus 4.6 solely due to its lower price, having not yet deeply tested 4.8.
  • The author has become skeptical of standardized benchmarks, favoring real-world user feedback on X.

Outline


  1. DeepSWE finds Opus 4.8 superior to 4.7 in performance, cost, and efficiency, yet still substantially behind GPT-5.5.

  2. Despite knowing 4.8 is better, the author sticks with 4.6 for purely economic reasons, illustrating cost-driven decision-making.

  3. The author expresses disillusionment with standard benchmarks, suggesting real user feedback on social platforms holds more value than formal tests.

Mindmap

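  • Opus model versions: comparison and selection in practice
    • Performance comparison
      • Opus 4.8 > 4.7 (DeepSWE evaluation)
      • Opus 4.8 << GPT-5.5 (significant gap)
    • Cost considerations
      • Opus 4.6 is cheaper and remains in use
      • 4.8 offers better value but has not replaced 4.6
    • Shift in evaluation approach
      • Benchmarks demystified: credibility declining
      • Real user feedback: priority rising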

Highlights


  • Opus 4.8 outperforms 4.7 in performance, cost, and efficiency—but still falls far short of GPT-5.5.

  • I’m still using 4.6 for no other reason than that it’s cheaper. Price remains the deciding factor even when a newer version performs better.

  • I’ve started to ‘de-mythologize’ benchmarks—I’d rather read real user comments on X than trust benchmark scores.

#Large Language Model #Benchmark #Opus #GPT-5.5 #Cost-Efficiency

Viking (@vikingmute) on X: "DeepSWE’s evaluation of Opus 4.8 is out. It surpasses 4.7, with lower cost and higher efficiency, yet still lags significantly behind GPT-5.5. I haven’t used it extensively myself. In fact, I’m still using version 4.6, for no particular reason other than it’s cheaper. Moreover, I’ve somewhat lost faith in benchmarks; real user reviews on X are more informative to me. In any case, I believe GPT-5.5 is definitely the strongest model for most people." https://t.co/7qtafwUnmf


1:11 PM · May 31, 2026 · 4,799 Views · 32 replies
