
- Opus 4.7's chart-understanding accuracy jumped from 13.5% to 55.8%, a significant improvement
- It beats all compared approaches on content faithfulness, including Gemini 3 Flash
- As an OCR solution it is expensive (~7c/page), far above the authors' low-cost mode (~0.4c/page)
We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench, our comprehensive OCR benchmark for enterprise documents, where we evaluate tables, text, charts, and visual grounding. The results:

- Opus 4.7 is a general improvement over Opus 4.6. It has gotten much better at charts compared to the previous iteration.
- Opus 4.7 is quite good at tables, though not quite as good as Gemini 3 Flash.
- Opus 4.7 wins on content faithfulness across all techniques (including ours).
- Using Opus 4.7 as an OCR solution is expensive at ~7c per page. For comparison, our agentic mode is 1.25c and cost-effective is ~0.4c by default.

Take a look at these results and more on ParseBench: parsebench.ai
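The per-page cost gap compounds quickly at enterprise volume. A minimal sketch of the comparison, using the per-page rates quoted above; the 100,000-page monthly workload is a hypothetical figure for illustration only:

```python
# Per-page rates from the post (USD); the volume below is an illustrative assumption.
PER_PAGE_USD = {
    "Opus 4.7 as OCR": 0.07,       # ~7c per page
    "agentic mode": 0.0125,        # 1.25c per page
    "cost-effective mode": 0.004,  # ~0.4c per page
}

def projected_cost(pages: int) -> dict[str, float]:
    """Projected spend for a given page volume at each rate."""
    return {name: round(rate * pages, 2) for name, rate in PER_PAGE_USD.items()}

for name, cost in projected_cost(100_000).items():
    print(f"{name}: ${cost:,.2f}")
```

At that assumed volume the spread is roughly $7,000 versus $400 per month, which is why per-page cost matters as much as accuracy for OCR pipelines.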

Quote

LlamaIndex 🦙
@llama_index
Apr 16
Anthropic says Opus 4.7 hits 80.6% on Document Reasoning, up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents." We ran it on ParseBench:

- Charts: 13.5% → 55.8% (+42.3), a huge jump
- Formatting: 64.2% → 69.4% (+5.2)
- Content: 89.7% → 90.3%
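The per-category deltas in the quote follow directly from the before/after scores; a quick sanity check:

```python
# Opus 4.6 -> Opus 4.7 ParseBench scores quoted above (percentages).
before_after = {
    "Charts": (13.5, 55.8),
    "Formatting": (64.2, 69.4),
    "Content": (89.7, 90.3),
}

# Print each category's improvement, rounded to one decimal place.
for category, (old, new) in before_after.items():
    print(f"{category}: {old}% -> {new}% (+{new - old:.1f})")
```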
