Claude Fable 5 thinks document parsing is beneath it


Jerry Liu (@jerryjliu0) · June 10, 2026


Content quality: 8.5

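TL;DR · AI summary

Claude Fable 5 excels at reasoning tasks, but on document parsing it is roughly on par with Gemini 3 Flash at 10-15x the token cost.

Key points

Claude Fable 5 performs strongly on reasoning benchmarks such as SWE-Bench Pro.
On document parsing, Claude Fable 5 is roughly on par with Gemini 3 Flash at 10-15x the token cost.
The model's own stated preferences suggest little interest in document parsing tasks, which may hurt its performance.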

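Outline

§ Introduction
Claude Fable 5 excels at reasoning tasks but is only average at document parsing.
· Reasoning performance
Claude Fable 5 performs strongly on reasoning benchmarks such as SWE-Bench Pro and FrontierCode.
· Document parsing performance
On document parsing, Claude Fable 5 is roughly on par with Gemini 3 Flash at 10-15x the token cost.
› Model self-awareness
The model's own stated preferences suggest little interest in document parsing tasks, which may hurt its performance.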

Mind map


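Claude Fable 5 performance analysis
- Reasoning performance
  - Strong on SWE-Bench Pro
  - Strong on FrontierCode
- Document parsing performance
  - Roughly on par with Gemini 3 Flash
  - 10-15x the token cost
- Model self-awareness
  - Little interest in document parsing tasks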

Highlights

Claude Fable 5 excels at reasoning tasks, but on document parsing it is roughly on par with Gemini 3 Flash at 10-15x the token cost.
The model's own stated preferences suggest little interest in document parsing tasks, which may hurt its performance.
On ParseBench, Claude Fable 5 scores 90.02% on content faithfulness, ahead of Gemini 3 Flash (86.19%) and GPT-5.5 (86.81%).

#Claude Fable 5 #Gemini 3 Flash #Document parsing #AI models



Jerry Liu

@jerryjliu0

Claude Fable 5 thinks document parsing is beneath it.

It is absolutely crushing all the reasoning-intensive, long-horizon benchmarks: SWE-Bench Pro, FrontierCode, GDPval, Runescape, etc. But for document understanding tasks, it is roughly equivalent to Gemini 3 Flash at 10-15x the token cost.

We benchmarked the model on ParseBench and compared it against all other frontier models. It is definitely up there with the other frontier models, but it falls far short of specialized OCR providers.

What we found interesting is that Fable 5 is self-aware about this. When we asked the model which tasks it enjoys the least, it said that it dislikes tasks "where the request is fully specified and the answer is fully known", implying that part of its weak performance comes from laziness and an unwillingness to actually solve the task at hand.

For a full list of results across different frontier models, check out ParseBench!

parsebench.ai

LlamaIndex 🦙

@llama_index

7h

Day 0 Anthropic Fable 5 in ParseBench: We tested the model's advancements when it comes to document understanding. The model clearly peaks when it comes to adherence to the original text: 📃 Content faithfulness: 90.02% vs 86.19% (Gemini 3 Flash) and 86.81% (GPT-5.5) 🔢 Semantic

1:26 AM · Jun 10, 2026
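
To put the quoted numbers side by side, here is a minimal Python sketch. The faithfulness scores and the 10-15x token-cost range come from the posts above; the function names (`faithfulness_gaps`, `projected_cost`) and the baseline spend of 1.0 are hypothetical, chosen only for illustration, and nothing here reflects how ParseBench itself computes its metrics.

```python
# Content faithfulness scores (percent) as quoted in the LlamaIndex post.
FAITHFULNESS = {
    "Claude Fable 5": 90.02,
    "Gemini 3 Flash": 86.19,
    "GPT-5.5": 86.81,
}

# Token-cost multiplier range for Fable 5 relative to Gemini 3 Flash, as quoted.
COST_MULTIPLIER_RANGE = (10, 15)


def faithfulness_gaps(scores, reference):
    """Return the percentage-point lead of `reference` over each other model."""
    ref = scores[reference]
    return {
        name: round(ref - value, 2)
        for name, value in scores.items()
        if name != reference
    }


def projected_cost(baseline_cost, multiplier_range=COST_MULTIPLIER_RANGE):
    """Scale a hypothetical baseline spend by the quoted multiplier range."""
    low, high = multiplier_range
    return baseline_cost * low, baseline_cost * high


gaps = faithfulness_gaps(FAITHFULNESS, "Claude Fable 5")
for name, gap in gaps.items():
    print(f"Claude Fable 5 leads {name} by {gap:.2f} points")

# Hypothetical workload that costs 1.0 unit on Gemini 3 Flash.
low_cost, high_cost = projected_cost(1.0)
print(f"Same workload on Fable 5: {low_cost:.1f} to {high_cost:.1f} units")
```

On content faithfulness alone, the lead is a few percentage points, which is the trade-off the thread is weighing against the 10-15x token cost.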
