---
title: "Let's talk document formatting.\n\nBold. Italics. Superscripts. Strikethroughs. The visual cues humans..."
source_name: "LlamaIndex 🦙(@llama_index)"
original_url: "https://x.com/llama_index/status/2049139409316946011"
canonical_url: "https://www.traeai.com/articles/6f726fa8-a986-48fa-b5ce-be4b2d62c3cf"
content_type: "tweet"
language: "中文"
score: 7.8
tags: ["OCR","AI Agent","Document Understanding","LlamaIndex","Benchmark"]
published_at: "2026-04-28T14:51:21+00:00"
created_at: "2026-05-02T11:16:22.585412+00:00"
---

# Let's talk document formatting.

Bold. Italics. Superscripts. Strikethroughs. The visual cues humans...

Canonical URL: https://www.traeai.com/articles/6f726fa8-a986-48fa-b5ce-be4b2d62c3cf
Original source: https://x.com/llama_index/status/2049139409316946011

## Summary

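LlamaIndex has released ParseBench, the first document OCR benchmark for AI agents. It brings semantic formatting (bold, strikethrough, superscript, and so on) into the evaluation for the first time, on the premise that visual formatting carries meaning.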

## Key Takeaways

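- Existing OCR benchmarks completely ignore bold, strikethrough, superscript, and the other semantic formatting cues that human readers rely on.
- A struck-through "$199" next to "$149" is not decoration; it is the core meaning of a price comparison.
- ParseBench introduces the Semantic Formatting Score and is the first OCR benchmark designed around how AI agents understand documents.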

## Content

[LlamaIndex 🦙](https://x.com/llama_index)

[@llama_index](https://x.com/llama_index)

Let's talk document formatting.

Bold. Italics. Superscripts. Strikethroughs. The visual cues humans rely on every time we read a doc, and ones existing OCR benchmarks completely ignore.

😱 "$199" struck through next to "$149" isn't decoration. It's the meaning.

😱 A superscript tells your agent "3" is a citation, not part of the number.

Flatten that and your agent is reading a different doc than you are.

Two weeks ago we released ParseBench, the first document OCR benchmark for AI agents. One of five metrics: the Semantic Formatting Score.

Read more 👇 [llamaindex.ai/blog/parsebenc](https://t.co/2sq5ncGiel)
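To make the "flatten that" point concrete, here is a minimal Python sketch of what happens when inline formatting is stripped from parsed text. It is an illustration only: the `flatten` helper, the sample strings, and the choice of Markdown-style `~~…~~` and `<sup>` markers are assumptions made for this example, not ParseBench's method or how the Semantic Formatting Score is computed.

```python
import re


def flatten(markdown: str) -> str:
    """Hypothetical helper: drop inline formatting markers, keep the characters."""
    text = re.sub(r"~~(.*?)~~", r"\1", markdown)   # strikethrough
    text = re.sub(r"</?sup>", "", text)            # superscript tags
    text = re.sub(r"\*\*(.*?)\*\*", r"\1", text)   # bold
    return text


# Assumed sample inputs, as a formatting-aware parser might emit them.
price = "~~$199~~ $149"
stat = "Revenue grew 12<sup>3</sup> percent"

flat_price = flatten(price)
flat_stat = flatten(stat)

print(flat_price)  # $199 $149 (two prices, no hint which one is current)
print(flat_stat)   # Revenue grew 123 percent (citation marker fused into the number)
```

With the markers intact, a downstream agent can tell that "$199" is the old price and that "3" is a citation. After flattening, both strings are still perfectly valid text, which is why the loss is easy to miss.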

![Image 6](https://pbs.twimg.com/amplify_video_thumb/2049138514848280576/img/PrCYyrLI0Q_LCZzZ.jpg)

Last edited 2:51 PM · Apr 28, 2026

[6,387 Views](https://x.com/llama_index/status/2049139409316946011/analytics)
