Jerry Liu(@jerryjliu0)



A downside with using VLMs to parse PDFs is guaranteeing that the output text is *correct* and output in the correct reading order.

1️⃣ Text correctness: making sure that digits, words, and sentences are not hallucinated or dropped.
2️⃣ Reading order: making sure that complex multi-layout pages are linearized into the right 1-D text order.

We call this Content Faithfulness in ParseBench, our comprehensive document OCR benchmark for agents. We have 167k rules that measure digit/word/sentence-level correctness along with reading-order correctness. It seems relatively table-stakes, but no parser gets this 100% right, which means the agent's downstream decision-making is compromised.

Come learn more about how this metric works in the video below, along with our full blog writeup, whitepaper, and website!

Blog: llamaindex.ai/blog/parsebenc
Paper: arxiv.org/abs/2604.08538
Website: parsebench.ai/?utm_medium=so
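To make the "text correctness" idea concrete, here is a minimal, hypothetical sketch of the kind of rule-based check such a metric could run: it scores what fraction of the digit groups in a reference text survive, in order, in a parser's output. This is an illustration of the general technique, not ParseBench's actual rules; the function name and example strings are invented.

```python
import re
from difflib import SequenceMatcher

def digit_faithfulness(reference: str, parsed: str) -> float:
    """Fraction of reference digit groups that appear, in order, in the
    parsed output. A hallucinated or dropped digit lowers the score.

    Hypothetical illustration of a rule-based content-faithfulness check;
    not the benchmark's real implementation.
    """
    ref_digits = re.findall(r"\d+", reference)
    out_digits = re.findall(r"\d+", parsed)
    # Longest in-order agreement between the two digit sequences.
    matcher = SequenceMatcher(None, ref_digits, out_digits)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_digits) if ref_digits else 1.0

reference = "Invoice 4821 totals $1,250 due 2024-06-01."
parsed = "Invoice 4821 totals $1,350 due 2024-06-01."  # one hallucinated digit group
print(digit_faithfulness(reference, parsed))  # 5 of 6 digit groups preserved
```

Sentence- and word-level rules can be built the same way by tokenizing at a coarser granularity; the point the post makes is that even checks this simple catch failures no current parser fully avoids.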


Quote


LlamaIndex 🦙

@llama_index

Apr 17

Let's talk content faithfulness. Four days ago, we launched ParseBench, the first document OCR benchmark for AI agents. Its most fundamental metric asks: did the parser capture all the text, in order, without making things up? We grade three failure modes with 167K+ rule-based
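The reading-order side of the metric can be sketched just as simply. Assuming a ground-truth ordering of layout blocks is available, one hedged, hypothetical rule is to count how many block pairs keep their relative order in the parser's linearized output; the function and example blocks below are invented for illustration, not taken from ParseBench.

```python
def reading_order_score(reference_blocks: list[str], parsed_blocks: list[str]) -> float:
    """Fraction of reference block pairs whose relative order is preserved
    in the parsed output. Pairs with a missing block are skipped.

    Hypothetical sketch of a reading-order check, not ParseBench's real rule.
    """
    position = {text: i for i, text in enumerate(parsed_blocks)}
    pairs = preserved = 0
    for i in range(len(reference_blocks)):
        for j in range(i + 1, len(reference_blocks)):
            a, b = reference_blocks[i], reference_blocks[j]
            if a in position and b in position:
                pairs += 1
                if position[a] < position[b]:
                    preserved += 1
    return preserved / pairs if pairs else 1.0

# A two-column page read column-by-column vs. row-by-row:
reference = ["Title", "Left column", "Right column", "Footer"]
parsed = ["Title", "Right column", "Left column", "Footer"]
print(reading_order_score(reference, parsed))  # one of six pairs is inverted
```

A single swapped column pair already dents the score, which matches the post's point: linearizing complex multi-layout pages into the right 1-D order is where parsers quietly fail.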
