
- Opus 4.7's chart-understanding accuracy jumped from 13.5% to 55.8%, a significant improvement
- It beats all compared approaches on content faithfulness, including Gemini 3 Flash
- As an OCR solution it is expensive (~7c/page), far above the authors' low-cost mode (~0.4c/page)
We comprehensively benchmarked Opus 4.7 on document understanding. We evaluated it through ParseBench, our comprehensive OCR benchmark for enterprise documents, where we evaluate tables, text, charts, and visual grounding. The results:

- Opus 4.7 is a general improvement over Opus 4.6. It has gotten much better at charts compared to the previous iteration.
- Opus 4.7 is quite good at tables, though not quite as good as Gemini 3 Flash.
- Opus 4.7 wins on content faithfulness across all techniques (including ours).
- Using Opus 4.7 as an OCR solution is expensive at ~7c per page. For comparison, our agentic mode is 1.25c and cost-effective is ~0.4c by default.

Take a look at these results and more on ParseBench: parsebench.ai
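The per-page cost gap compounds quickly at enterprise volume. A minimal sketch of the comparison, using the per-page rates quoted above; the 100,000-page monthly workload is a hypothetical figure for illustration only:

```python
# Per-page rates from the post (USD); the volume below is an illustrative assumption.
PER_PAGE_USD = {
    "Opus 4.7 as OCR": 0.07,       # ~7c per page
    "agentic mode": 0.0125,        # 1.25c per page
    "cost-effective mode": 0.004,  # ~0.4c per page
}

def projected_cost(pages: int) -> dict[str, float]:
    """Projected spend for a given page volume at each rate."""
    return {name: round(rate * pages, 2) for name, rate in PER_PAGE_USD.items()}

for name, cost in projected_cost(100_000).items():
    print(f"{name}: ${cost:,.2f}")
```

At that assumed volume the spread is roughly $7,000 versus $400 per month, which is why per-page cost matters as much as accuracy for OCR pipelines.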

Quote

LlamaIndex 🦙
@llama_index
Apr 16
Anthropic says Opus 4.7 hits 80.6% on Document Reasoning, up from 57.1%. But "reasoning about documents" ≠ "parsing documents for agents." We ran it on ParseBench:

- Charts: 13.5% → 55.8% (+42.3), a huge jump
- Formatting: 64.2% → 69.4% (+5.2)
- Content: 89.7% → 90.3%
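The per-category deltas in the quote follow directly from the before/after scores; a quick sanity check:

```python
# Opus 4.6 -> Opus 4.7 ParseBench scores quoted above (percentages).
before_after = {
    "Charts": (13.5, 55.8),
    "Formatting": (64.2, 69.4),
    "Content": (89.7, 90.3),
}

# Print each category's improvement, rounded to one decimal place.
for category, (old, new) in before_after.items():
    print(f"{category}: {old}% -> {new}% (+{new - old:.1f})")
```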
