People

Jan Leike

Q: What has Jan Leike been up to recently?

traeai has indexed 4 items related to Jan Leike. The most recent is "I'm really excited about this as a new tool in our interpretability tool kit", posted by Jan Leike (@janleike).

Alias: janleike

AI safety researcher, formerly a researcher at DeepMind, now focused on LLM interpretability and alignment.

4 highly relevant items tracked

TraeAI Observations

If you only read 3

I'm really excited about this as a new tool in our interpretability tool kit

Jan Leike (@janleike) · score 8.5

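NLAs (natural language adapters) are an unsupervised method that converts the internal states of large language models into human-readable text, significantly improving model interpretability and safety auditing.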

When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was g...

Jan Leike (@janleike) · score 7.5

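Jan Leike looks back on more than ten years of AI alignment research: from a fringe field with only a dozen or so participants and unclear methods, it has become practical thanks to advances such as RLHF and scalable oversight, and has led to models like Claude having a constitution and to automated alignment research.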

Some personal news: I am starting a new research project at Anthropic. Very excited about this! Man...

Jan Leike (@janleike) · score 5.5

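Jan Leike is starting a new research project at Anthropic and stresses that making AGI go well depends on other key factors besides alignment. The announcement drew attention from the community but gave no specifics.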

Jan Leike on X: "I'm really excited about this as a new tool in our interpretability tool kit"

Jan Leike (@janleike) · May 9 · 152 words (about 1 minute)

NLAs are an unsupervised method that converts LLM internal states into human-readable text, significantly improving model transparency and safety auditing.

Why it was selected: NLAs are an unsupervised technique that turns LLM internal activation vectors into natural-language descriptions.

Featured · Tweet · #LLM #Interpretability #AI Safety #Anthropic · English

When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was g...

Jan Leike on X: The Evolution of AI Alignment Research Over a Decade

Jan Leike (@janleike) · May 9 · 292 words (about 2 minutes)

Jan Leike reflects on the transformation of AI alignment research over the past decade, from a niche field with only about 12 researchers and unclear methods to one now driven by RLHF, scalable oversight, and automated techniques like constitutional AI in models such as Claude.

Why it was selected: Ten years ago, only about 12 people worked on AI alignment, as a side project, and the methods were disorganized.

Featured · Tweet · #AI Alignment #AGI #RLHF #Machine Learning · English

Some personal news: I am starting a new research project at Anthropic. Very excited about this!

Jan Leike (@janleike) · May 9 · 113 words (about 1 minute)

Jan Leike is joining Anthropic to start a new research project, emphasizing that alignment is just one of many factors needed for AGI success.

Why it was selected: Jan Leike joins Anthropic to start a new research project focused on AGI development.

Featured · Tweet · #AGI #Anthropic #research · English

Jan Leike on X: Grateful for talented people in AI alignment

Jan Leike (@janleike) · May 9 · 120 words (about 1 minute)

Jan Leike thanks talented collaborators in AI alignment, calling it a privilege to work with those deeply motivated to make the future better.

Why it was selected: Jan Leike thanks the top talent he has worked with over the years in AI alignment.

Featured · Tweet · #AI Alignment #OpenAI #Ethics · English

Cross-material Q&A · Jan Leike

Answers are based on: 4 items related to Jan Leike