Concept

LLM-as-a-judge

Q: What is new with LLM-as-a-judge?

traeai has indexed 1 item related to LLM-as-a-judge. The most recent is "Presentation: Powering the Future: Building Your GenAI Infrastructure Stack", published by InfoQ.

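A methodology that uses large language models to automatically evaluate the quality of AI Agent output, addressing the difficulty of scaling human evaluation.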
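The idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the approach described in the InfoQ presentation: the prompt wording, the 1 to 5 scale, and the names `judge`, `mean_score`, and `stub_model` are all hypothetical, and `stub_model` stands in for a real LLM call so the sketch runs offline.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rubric prompt; real rubrics are usually task-specific.
JUDGE_PROMPT = """You are grading an AI agent's answer.
Task: {task}
Answer: {answer}
Rate the answer from 1 (poor) to 5 (excellent) for correctness and helpfulness.
Reply in the form "Score: <n>" followed by a one-sentence reason."""


def judge(task: str, answer: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model to score one agent output; None if the reply is unparseable."""
    reply = call_model(JUDGE_PROMPT.format(task=task, answer=answer))
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def mean_score(samples: List[Tuple[str, str]],
               call_model: Callable[[str], str]) -> Optional[float]:
    """Average the judge's scores over (task, answer) pairs, skipping unparseable replies."""
    scores = [judge(task, answer, call_model) for task, answer in samples]
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else None


def stub_model(prompt: str) -> str:
    """Assumption: placeholder for a real LLM client; always returns the same grade."""
    return "Score: 4\nThe answer is correct but omits edge cases."


if __name__ == "__main__":
    samples = [("Summarize the invoice", "Total due is $120."),
               ("Classify the expense", "Travel")]
    print(mean_score(samples, stub_model))
```

Passing the model as a plain callable keeps the scoring logic independent of any particular provider, and returning `None` for unparseable replies avoids silently counting a malformed judgment as a score.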

1 highly relevant item tracked

TraeAI Observations

If you read only three

Presentation: Powering the Future: Building Your GenAI Infrastructure Stack

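InfoQ · score 8.2

Intuit uses its GenOS platform and "fixed, flexible, free" framework to support 8,000+ developers running 3,500+ production experiments and to deploy AI Agents at scale. Key practices include an LLM-as-a-judge evaluation strategy and Agent-oriented API design principles.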

Presentation: Powering the Future: Building Your GenAI Infrastructure Stack

InfoQ · May 20 · 7,951 words (about 32 minutes)

Intuit scaled GenAI development across 8,000+ developers with 3,500+ production experiments using the GenOS platform and 'fixed, flexible, free' framework, featuring LLM-as-a-judge evaluation and Agent-friendly API design.

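Why it was selected: Intuit designs the GenOS platform around a three-layer "fixed, flexible, free" framework. The fixed layer provides standardized infrastructure, the flexible layer supports business-specific customization, and the free layer encourages innovative experimentation.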

Featured · Article · #AI Agent #GenAI Infrastructure #Intuit #LLM Evaluation #Platform Engineering · English
