LlamaIndex 🦙(@llama_index)

Anthropic says Opus 4.7 hits 80.6% on Document Reasoning — up from 57.1%. But "reasoning about docu...

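AI key takeaways
  • Opus 4.7's score on the Document Reasoning benchmark rose from 57.1% to 80.6%, but that is not the same as real-world parsing ability
  • On ParseBench, Opus 4.7 improved markedly on chart recognition (+42.3 percentage points), while layout understanding regressed
  • LlamaParse Agentic reaches 84.9% overall at about 1.2¢/page, making it a better fit for enterprise-scale document parsing
#LLM #DocumentParsing #Anthropic #LlamaIndex #AIEvaluation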



Anthropic says Opus 4.7 hits 80.6% on Document Reasoning, up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents."

We ran it on ParseBench.

→ Charts: 13.5% → 55.8% (+42.3), huge
→ Formatting: 64.2% → 69.4% (+5.2)
→ Content: 89.7% → 90.3% (+0.6)
→ Tables: 86.5% → 87.2% (+0.7)
→ Layout: 16.5% → 14.0% (-2.5), regressed

Real chart gains, but at ~1.5¢/page. Enterprise scale? Not yet.

LlamaParse Agentic: 84.9% overall. ~1.2¢/page.

The frontier for general document understanding is long. No single model solves it.

→ github.com/run-llama/Pars

![Image 2: Image](https://x.com/llama_index/status/2044886527352647859/photo/1)
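
To make the arithmetic in the post easy to check, here is a minimal Python sketch that recomputes the per-category deltas from the quoted before/after scores and compares the two quoted per-page prices. The page volume (`HYPOTHETICAL_PAGES`) and the helper names are assumptions for illustration only. They do not come from the post or from ParseBench, and the prices are the approximate ("~") figures quoted above.

```python
# Per-category ParseBench scores quoted in the post: (before, after), in percent.
scores = {
    "Charts": (13.5, 55.8),
    "Formatting": (64.2, 69.4),
    "Content": (89.7, 90.3),
    "Tables": (86.5, 87.2),
    "Layout": (16.5, 14.0),
}


def deltas(score_table):
    """Return the change in percentage points for each category."""
    return {
        name: round(after - before, 1)
        for name, (before, after) in score_table.items()
    }


def cost_usd(pages, cents_per_page):
    """Convert a per-page price in cents into a total cost in US dollars."""
    return pages * cents_per_page / 100


# Hypothetical workload size, chosen only to illustrate the price gap.
HYPOTHETICAL_PAGES = 1_000_000

if __name__ == "__main__":
    for name, delta in deltas(scores).items():
        print(f"{name}: {delta:+.1f} points")

    opus_cost = cost_usd(HYPOTHETICAL_PAGES, 1.5)    # ~1.5 cents/page quoted for Opus 4.7
    agentic_cost = cost_usd(HYPOTHETICAL_PAGES, 1.2) # ~1.2 cents/page quoted for LlamaParse Agentic
    print(f"Opus 4.7: ${opus_cost:,.0f}")
    print(f"LlamaParse Agentic: ${agentic_cost:,.0f}")
```

At this assumed volume, the 0.3¢/page difference works out to roughly $3,000 per million pages. The totals are only as precise as the approximate prices they are built on.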
