Junyang Lin (@JustinLin610)
Score: 5.5

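AI key takeaways
- Existing agent evals lack consistency with real-world usage
- Model optimization is being steered in the wrong direction
- Mis-targeting is a more serious problem than benchmaxxing
#AIAgents #ModelEvaluation #LLM #AIAlignment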
Original post: Junyang Lin on X

we need agent evals that are really consistent with real world usages. otherwise people are optimizing foundation models for the wrong direction. the problem of targeting is even bigger than benchmaxxing.