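Claude Opus 4.8: Full Breakdown and Hands-On Test (Practical AI News)
TL;DR · AI Summary
Claude Opus 4.8 is Anthropic's quick correction to version 4.7. It focuses on better understanding of ambiguous instructions, to return to the "user-friendly" style of 4.6. It beats GPT-4.5 on the official benchmarks, but the real-world engineering benchmark DeepSWE currently has GPT-4.5 ahead, and 4.8 has not yet been tested there.
Key points
- Opus 4.8 improves ambiguity handling to correct the overly literal behavior of 4.7, aiming to restore the well-liked "vibes" of version 4.6.
- On DeepSWE, a real-world software engineering benchmark, GPT-4.5 currently leads (4.8 is not yet included), which suggests the official benchmarks may be selectively chosen.
- The dynamic workflows feature can spawn hundreds of sub-agents to complete a complex task, but it consumes a significant share of your Claude account usage, so use it with care.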
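To make the "spawn many sub-agents" idea in the last point concrete, here is a minimal sketch of the general fan-out pattern. It is not Anthropic's implementation or API: `run_subagent`, `orchestrate`, `SubtaskResult`, the concurrency cap, and the flat 1,000-token cost per subtask are all hypothetical stand-ins chosen for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    name: str
    output: str
    tokens_used: int  # hypothetical cost figure, for illustration only


async def run_subagent(name: str, semaphore: asyncio.Semaphore) -> SubtaskResult:
    """Hypothetical stand-in for one sub-agent; a real one would call a model."""
    async with semaphore:  # cap how many sub-agents run at the same time
        await asyncio.sleep(0)  # placeholder for the actual model work
        return SubtaskResult(name=name, output=f"done: {name}", tokens_used=1_000)


async def orchestrate(subtasks: list[str], max_parallel: int = 10) -> list[SubtaskResult]:
    """Fan a large task out to one sub-agent per subtask and collect the results."""
    semaphore = asyncio.Semaphore(max_parallel)
    # gather() returns results in the same order as the subtasks were given
    return await asyncio.gather(*(run_subagent(t, semaphore) for t in subtasks))


def total_tokens(results: list[SubtaskResult]) -> int:
    """Sum usage across all sub-agents; this is what eats into an account's quota."""
    return sum(r.tokens_used for r in results)


if __name__ == "__main__":
    results = asyncio.run(orchestrate([f"task-{i}" for i in range(100)]))
    print(len(results), "sub-agents finished, total tokens:", total_tokens(results))
```

The point of the sketch is the cost shape: usage grows with the number of sub-agents, which is why a single run can take a large share of a usage allowance.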
Mind map (outline text)
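- Claude Opus 4.8 deep dive
  - Release background
    - Fast iteration: 4.7 was controversial, 4.6 was widely praised
    - Goal: fix the overly literal behavior and return to a user-friendly style
  - Key technical features
    - Better ambiguity handling (vs. 4.7)
    - Dynamic workflows: hundreds of sub-agents working together
    - High resource consumption: takes a significant share of account usage
  - Performance evaluation
    - Official benchmarks: ahead of GPT-4.5
    - DeepSWE real-world benchmark: GPT-4.5 leads (4.8 not tested)
    - User vibes: positive community feedback
  - Competitor comparison
    - Gemini 3.5 Flash: poorly rated by intermediate and advanced users
    - GPT-4.5: considered better by some users in agentic applications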
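Highlights
Key quotes worth saving and sharing.
Version 4.7 drew user complaints for being "too precise" and taking everything literally. Version 4.8 makes an explicit "course correction" by improving its handling of ambiguity, which was the core strength of 4.6.
DeepSWE is a real-world software engineering benchmark: tasks are written from scratch, so there is no risk of training-data leakage, and prompts are short, which is closer to actual usage. GPT-4.5 leads on this test, and Opus 4.8 has not yet been included.
The dynamic workflows feature can automatically spawn hundreds of sub-agents to complete a large task, but it will "eat up most of your usage" in your Claude account, bringing significant cost and resource overhead.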
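The cost concern can be put into simple arithmetic. The sketch below is a rough way to reason about it. The function names and the abstract "usage units" are assumptions, and the figures in the usage note are made up, because the source gives no actual quota numbers.

```python
def usage_share_percent(run_usage: float, weekly_allowance: float) -> float:
    """Share of the weekly allowance consumed by one run, in percent.

    Both arguments are in the same hypothetical "usage units".
    """
    if weekly_allowance <= 0:
        raise ValueError("weekly_allowance must be positive")
    return 100.0 * run_usage / weekly_allowance


def runs_until_exhausted(share_percent: float) -> int:
    """How many whole runs of this size fit into one week's allowance."""
    if share_percent <= 0:
        raise ValueError("share_percent must be positive")
    return int(100.0 // share_percent)
```

For example, with made-up numbers, a run that costs 15 units out of a 100-unit weekly allowance is a 15% share, and only six such runs fit into one week.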
Video notes
YouTube Transcript
language: English (automatically generated) (en)
[0:00] Claude was already the fan favorite of
[0:02] many people and now they released
[0:04] another model Opus 4.8. Today I just
[0:07] want to talk about this big release
[0:09] along with some of the things that it
[0:11] came with like dynamic workflows where
[0:13] it spawns hundreds of sub-agents and gets
[0:16] a big task done and, while at it, eats up
[0:18] most of your usage in your Claude
[0:20] account. We'll look at that. We'll look
[0:21] at the model and we'll look at consumer
[0:23] preferences and we'll touch on some
[0:25] other stories that appeared this week.
[0:26] But honestly, this is the big hitter
[0:28] this week. This is what we're going to
[0:29] focus on. Let's begin. So yeah,
[0:31] basically, a new model from Anthropic: Opus
[0:35] 4.8. You can see it here available in
[0:37] the web app. It's available in Claude
[0:39] Cowork and Claude Code. It's available
[0:41] in the API. This was a quick release
[0:43] since 4.7, which happened recently. And
[0:46] the reason for that was 4.7 was probably the model with the most mixed reviews
[0:49] where people were like, I'm not sure
[0:51] this is better than 4.6. If you weren't
[0:54] following that story, 4.6 was the
[0:56] model that made all of this agentic
[1:00] work, that made people buy Mac
[1:02] minis, that made OpenClaw work. Between
[1:04] Opus 4.5 and 4.6, those models were what
[1:07] made that possible, why this whole
[1:09] agentic revolution for customers started
[1:11] moving. And 4.7 was better on benchmarks,
[1:14] but it was a bit too precise. It took
[1:15] everything literally. So what they did on
[1:17] 4.8 now was a bit of a course correction.
[1:20] They said, "Okay, it interprets
[1:21] ambiguity a bit better than 4.7, which
[1:24] was a big feature of 4.6." All right, I
[1:26] don't want to spend a lot of time
[1:28] on benchmarks, but I do want to show them because, you know, they are above and beyond what 4.7 did, and they are also above and beyond what GPT 5.5 does in many instances. But then try to make sense of all of these numbers. Just because these are higher doesn't mean it's better. User preference is usually what matters. People on the internet
[1:46] talk about the vibes, and so far the vibes on 4.8 are immaculate. People are liking it. But here's the thing with these benchmarks the companies show: they always show the ones that are in their favor. Look, they selected 1, 2, 3, 4, 5, 6. But not even these benchmarks tell the full story. And there was a benchmark that got a lot of attention this week that I want to show you.
[2:01] DeepSWE. And basically what they did here is they tried to create a real-world software engineering benchmark where tasks are written from scratch, so the model has no chance of training on those exact tasks. There's high diversity. There are shorter prompts, just like in the real world. People working with these models don't always write these long prompts; often they kind of just wing it and let the model figure it out. And this happens to not have Claude Opus 4.8 on here yet. This story got some attention earlier this week before the model came out. This represents the real-world feel of these models way more accurately than what these companies put out.
So for example, GPT 5.5: a lot of people claim that once you get used to how to prompt it and how to use it, it is actually more accurate in these agentic applications than Opus. And it wins out here. Now I assume that 4.8 will land somewhere around here.
But rather than focusing on the competition between these two that we've talked about for the past few minutes, I want to kind of highlight Gemini 3.5 Flash, which overall is not very well received by people doing more intermediate to advanced things with AI. Sure, they use it in their new Google search and in Gemini products. But an evaluation like this seems more accurate than Google's representation, which is like, "Hey, we're at the top of this benchmark and it's just the best model and the fastest model right now." Sure, they'll have the Pro version of this coming and we'll have a look at this again, but I just want to show this because I thought this feels more accurate than all these blog posts that we're getting where they kind of toot their own horn.
Now let's talk about what came out with this. Not just that the model is different, and we'll do some comparison prompts here in a second, but also they introduced dynamic workflows.
This is a thing exclusively for Enterprise, Team, and Max plans where it spawns like 100 sub-agents and those go out and do something very complex. I've already heard rumors of this, but what I want to
[3:54] try in this video is let this thing run
[3:56] and see how much of my weekly usage this
[3:58] eats up because that's the big question,
[4:00] right? Even though I'm on a $200 plan,
[4:02] my suspicion is that this is going to
[4:04] eat into a two-digit percentage of my
[4:06] weekly usage. And by the way, we get
[4:08] effort selection here, which before we
[4:10] didn't. We just got adaptive thinking,
[4:11] right? But you get five levels now. You
[4:13] can go all the way to max. It's going to
[4:15] eat up more usage. But this is really
[4:17] neat. This is something their
[4:18] competitors had. Now they have it, too.
[4:20] Let's just crank this thing. Let's go.
[4:21] Opus 4.8, max. And I do like
[4:25] trying this prompt. I'm not a big fan of
[4:27] a lot of these test prompts where, you
[4:29] know, we're creating 3D games. I do
[4:31] them, but I'm just not the biggest fan.
[4:32] Whereas this one, create a visually
[4:34] stunning design website for a studio
[4:36] that will impress web front end
[4:38] developers. It's kind of very subjective
[4:40] and open-ended. And each one of these
[4:42] models has a different kind of taste
[4:44] profile on what it considers to be
[4:46] impressive to a web front end. So, I
[4:48] really want to see what we get here in a
[4:49] second and also how long this is going
[4:51] to take. Okay, here's the website. This
[4:53] took over 10 minutes to build. Let's
[4:55] have a look. We make interfaces.
[4:57] Oblique. Ooh. I mean, come on. Have you
[5:00] This is just me personally. Have you
[5:02] even seen an element like this or
[5:03] anything like this with AI one shotting
[5:05] it? This just makes me want to move over
[5:07] it. This is gorgeous. And you know what?
[5:08] I feel like this didn't work nearly as well
[5:10] with 4.7. With 4.7, it took things
[5:13] [...]
[7:52] fault in this application. I tried
[7:53] everything. I re-uploaded data with new
[7:55] numbers. It got added perfectly.
[7:57] Everything populated. The design is
[7:58] right. The mobile version is optimized.
[8:01] That was a step that it did there in the
[8:03] middle, too. That's really exciting
[8:05] because with a lot of this AI stuff, you
[8:06] know, it does things for you, but a lot
[8:08] of times it's like 80%, 90% of the way
[8:10] there. With this feature, sure, it'll run
[8:12] for an hour, but it's actually done. So,
[8:14] I think this might be interesting for a
[8:17] whole lot of people watching this video, and I
[8:18] would actually make this recommendation
[8:19] if you just want to build something from
[8:21] scratch. I mean, heck, if you do have
[8:23] token limits or you're not using your
[8:24] Claude account on that day, you can kind
[8:26] of just let this workflow feature go. As
[8:29] you saw in my account, I don't use Claude
[8:31] desktop regularly. I mostly use OpenClaw
[8:33] for my own work or Claude Cowork for a
[8:35] lot of workflows that we teach. That's a
[8:37] nice sweet spot. But with this feature
[8:39] and me having the max account for
[8:41] testing purposes altogether, I'll be using
[8:43] this. I'll let this just run. I'll open
[8:45] up an extra app and I'll let it build
[8:47] things even though it's not a massive project that might require this. But, you know, I would rather have it 100% done and go overboard on this option rather than 90% done and then me going back and forth with it and fixing little bugs. I thought this was really exciting, and I don't know, over the next week I guess I'll try it on some more specific and more complex tasks. And yeah, overall, between Opus 4.8 being really solid and this new selection in Claude Code, I think this is an amazing release. Honestly, go try it out now. It's available in all Claude plans. And yeah, that's pretty much what I've got to say about Claude Opus 4.8. And then finally, among all the other stories that were on my radar
[10:28] News you can use, breaking down the stories you actually care about.