Junyang Lin(@JustinLin610)

we need agent evals that are really consistent with real world usages. otherwise people are optimizi...

Score: 5.5
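AI-distilled key points
  • Existing agent evaluations are not consistent with real-world usage
  • Model optimization is being steered in the wrong direction
  • Mis-targeted objectives are a more serious problem than benchmaxxing (leaderboard gaming)
#AIAgents #ModelEvaluation #LLMs #AIAlignment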
Junyang Lin

@JustinLin610

we need agent evals that are really consistent with real world usages. otherwise people are optimizing foundation models for the wrong direction. the problem of targeting is even bigger than benchmaxxing.

12:36 AM · Apr 12, 2026 · 23.9K Views

