Ship Real Agents: Hands-On Evals for Agentic Applications — Laurie Voss, Arize

AI Engineer

AI Engineer · Video · May 14, 2026

Score: 8.5

TL;DR · AI Summary

This talk covers how to evaluate AI agent systems: setting up tracing, analyzing the captured data, writing different types of evaluators, and running meta-evaluation.

Key Takeaways

Tracing is needed to capture the raw data that evaluations run on
Three types of evaluation can be used: code evaluation, LLM evaluation, and custom evaluation
Meta-evaluation verifies the accuracy of the evaluators' own judgments
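As a rough illustration of what "capturing raw data through tracing" can mean, here is a minimal in-memory tracer in Python. It is a sketch under stated assumptions, not Arize's API or OpenTelemetry: the Span and Tracer classes, the lookup_weather tool, and run_agent are all hypothetical names. Each step of an agent run is recorded as a span linked to its parent, so an evaluator can later inspect the intermediate tool calls as well as the final answer.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Span:
    """One recorded step of an agent run (hypothetical minimal schema)."""
    name: str
    trace_id: str
    parent: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Collects finished spans in memory so evaluators can read them later."""

    def __init__(self) -> None:
        self.spans: list[Span] = []
        self._stack: list[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str, **attributes: Any):
        parent = self._stack[-1] if self._stack else None
        # Child spans inherit the trace id of the run they belong to.
        trace_id = parent.trace_id if parent is not None else uuid.uuid4().hex[:8]
        s = Span(name=name, trace_id=trace_id,
                 parent=parent.span_id if parent is not None else None,
                 attributes=dict(attributes), start=time.perf_counter())
        self._stack.append(s)
        try:
            yield s
        finally:
            # Record the span when it closes, even if the step raised.
            s.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(s)


tracer = Tracer()


def lookup_weather(city: str) -> str:
    # Hypothetical tool; a real agent would call an external API here.
    with tracer.span("tool:lookup_weather", city=city) as s:
        result = f"Sunny in {city}"
        s.attributes["output"] = result
        return result


def run_agent(question: str) -> str:
    # Root span wraps the whole run. A real agent would let the model pick
    # the tool and its arguments; this stub hard-codes them.
    with tracer.span("agent", input=question) as root:
        observation = lookup_weather("Paris")
        answer = f"The forecast says: {observation}"
        root.attributes["output"] = answer
        return answer


run_agent("What's the weather in Paris?")
for s in tracer.spans:
    print(s.name, s.parent, s.attributes.get("output"))
```

Spans are appended when they close, so the tool span is listed before the agent span that contains it; the parent link, not list order, carries the structure.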

Outline

§Introduction
Introduce the speaker, Laurie Voss, and Voss's experience in AI evaluation.
·Importance of Evaluation
Explain why evaluation is more complex for AI agent systems than simple LLM calls.
·Tracing Setup
Explain how to capture raw data needed for evaluations through tracing.
·Evaluation Types
Introduce three evaluation methods: code evaluation, LLM evaluation, and custom evaluation.
·Meta-Evaluation
Describe how to verify the accuracy of evaluators' judgments.
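The first two evaluation types can be sketched as plain Python functions. This is a hedged illustration, not the method shown in the talk: EvalResult, tool_call_evaluator, llm_judge_evaluator, JUDGE_TEMPLATE, and fake_judge are hypothetical names, and the judge is any callable that sends a prompt to a model and returns text, stubbed here so the example runs offline. A code evaluator is a deterministic check; an LLM evaluator asks a model to grade and then parses its reply.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    label: str        # e.g. "pass" / "fail" or "correct" / "incorrect"
    explanation: str


def tool_call_evaluator(tool_calls: list[str], expected_tool: str) -> EvalResult:
    """Code evaluator: a deterministic check, no model involved."""
    if expected_tool in tool_calls:
        return EvalResult("pass", f"called {expected_tool}")
    return EvalResult("fail", f"never called {expected_tool}; saw {tool_calls}")


# Doubled braces survive str.format() as literal JSON braces.
JUDGE_TEMPLATE = (
    "You are grading an AI agent.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply with JSON: {{"label": "correct" or "incorrect", "explanation": "..."}}'
)


def llm_judge_evaluator(question: str, answer: str,
                        judge: Callable[[str], str]) -> EvalResult:
    """LLM evaluator: ask a judge model to grade, then parse its reply.

    `judge` is a placeholder for whatever model client you use; it is not
    a specific vendor API.
    """
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    raw = judge(prompt)
    try:
        parsed = json.loads(raw)
        label = parsed["label"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Keep unparseable judge output visible instead of guessing a label.
        return EvalResult("unparseable", raw)
    return EvalResult(label, parsed.get("explanation", ""))


def fake_judge(prompt: str) -> str:
    # Offline stub standing in for a real LLM call.
    verdict = "correct" if "Paris" in prompt else "incorrect"
    return json.dumps({"label": verdict, "explanation": "stubbed verdict"})


print(tool_call_evaluator(["lookup_weather"], "lookup_weather"))
print(llm_judge_evaluator("What's the weather in Paris?", "Sunny in Paris", fake_judge))
```

Returning a label plus an explanation, rather than a bare boolean, makes it easier to audit disagreements later. Custom evaluators are not illustrated here, since the summary does not say what form they take.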

Mindmap

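AI Agent System Evaluation
- Importance of evaluation
  - More complex than simple LLM calls
- Tracing setup
  - Capture raw data
- Evaluation types
  - Code evaluation
  - LLM evaluation
  - Custom evaluation
- Meta-evaluation
  - Verify evaluators' judgments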
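One common way to verify an evaluator's judgments is to run it on examples that humans have already labeled and measure how often the two agree. The sketch below assumes that approach; the function name meta_evaluate, the choice of "fail" as the label of interest, and the sample labels are all hypothetical, not taken from the talk.

```python
from collections import Counter


def meta_evaluate(evaluator_labels: list[str], human_labels: list[str],
                  positive: str = "fail") -> dict[str, float]:
    """Score an evaluator against human judgments on the same examples.

    `positive` is the label the evaluator is mainly meant to catch; failures
    are assumed here because those are usually what matters.
    """
    if len(evaluator_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("need at least one labeled example")
    # Count each (evaluator, human) label pair: a small confusion table.
    pairs = Counter(zip(evaluator_labels, human_labels))
    agree = sum(n for (e, h), n in pairs.items() if e == h)
    tp = pairs[(positive, positive)]
    predicted_pos = sum(n for (e, _), n in pairs.items() if e == positive)
    actual_pos = sum(n for (_, h), n in pairs.items() if h == positive)
    return {
        "agreement": agree / len(human_labels),
        # Of the cases the evaluator flagged, how many did humans also flag?
        "precision": tp / predicted_pos if predicted_pos else 0.0,
        # Of the cases humans flagged, how many did the evaluator catch?
        "recall": tp / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels for five traced runs.
evaluator = ["pass", "fail", "fail", "pass", "pass"]
human = ["pass", "fail", "pass", "pass", "fail"]
print(meta_evaluate(evaluator, human))
# {'agreement': 0.6, 'precision': 0.5, 'recall': 0.5}
```

Raw agreement alone can look good when one label dominates, which is why the sketch also reports precision and recall for the label the evaluator is supposed to catch.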

Highlights

Tracing is needed to capture the raw data that evaluations run on
— Paragraph 1
Three types of evaluations can be used: code evaluation, LLM evaluation, and custom evaluation
— Paragraph 2
Meta-evaluation is used to verify the accuracy of evaluators' judgments
— Paragraph 3

#AI Evaluation #Agent Systems #LLM