Jan Leike on X: The Evolution of AI Alignment Research Over a Decade

Jan Leike (@janleike) · May 8, 2026

TL;DR · AI Summary

Jan Leike reflects on how AI alignment research has changed over the past decade. It began as a niche field of roughly a dozen part-time researchers with no clear methods. It is now shaped by RLHF on large language models, progress on scalable oversight, a constitution for Claude, and increasingly automated alignment research.

Key Takeaways

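In the early days (more than 10 years ago), only about a dozen people worked on alignment, mostly as a side gig, and few were willing to run experiments with deep learning.
RLHF made alignment practical for large language models, enabling real progress on evaluating, investigating, steering, and fixing behavioral issues.
Current alignment research is increasingly automated, Claude now has a constitution, and there has been good progress on scalable oversight.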

Outline

Introduction: Early State of Alignment Research
The author notes that more than ten years ago, there was little understanding of how AGI would be built or how to make it safe, and very few people worked on the problem.
Early Challenges: Sparse Talent and Unclear Approaches
At that time, only around a dozen people worked on alignment, mostly as a side gig. There was no consensus on how to approach the problem, and very few were willing to run deep learning experiments.
Key Turning Point: RLHF Makes Alignment Practical
Applying Reinforcement Learning from Human Feedback (RLHF) to large language models made alignment work far more practical.
Recent Progress: Automation and Systematic Improvements
The field has made substantial progress on evaluating, investigating, steering, and fixing behavioral issues in models. Claude now has a constitution, scalable oversight has advanced, and a growing share of alignment research is being automated.

Mindmap

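Evolution of AI Alignment Research
- Early stage (more than 10 years ago)
  - Very few people (~12)
  - No clear methodology
- Mid-period breakthrough (late 2010s to early 2020s)
  - RLHF matures
  - More opportunities for experimentation
- Current trends (2020s to present)
  - Automated alignment research
  - Safety mechanisms built into models (e.g., Claude's constitution)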

Highlights

When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was going to be built or how to make it safe.
The field had maybe a dozen people who were working on it as a side gig. Everyone was pretty confused about how to approach the problem.
RLHF on LLMs made it a lot more practical. We've made a ton of progress on evaluating, investigating, steering and fixing behavioral issues.

#AI Alignment #AGI #RLHF #Machine Learning

Jan Leike

@janleike

When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was going to be built or how to make it safe. The field had maybe a dozen people who were working on it as a side gig. Everyone was pretty confused about how to approach the problem, and the number of people willing to run experiments with deep learning was tiny. So much has changed since then! The world woke up not just to AGI but also increasingly to the importance of alignment. RLHF on LLMs made it a lot more practical. We've made a ton of progress on evaluating, investigating, steering and fixing behavioral issues. Claude now has a constitution and we made some good progress on scalable oversight. More and more of our alignment research is getting automated.

5:48 PM · May 8, 2026

10.3K Views