People

Jan Leike

traeai has collected 4 items related to Jan Leike. The latest is "I'm really excited about this as a new tool in our interpretability tool kit", posted by Jan Leike (@janleike).

Alias: janleike

AI safety researcher, formerly a researcher at DeepMind, currently focused on LLM interpretability and alignment.

4 highly relevant items tracked

TraeAI Observations

I'm really excited about this as a new tool in our interpretability tool kit

Jan Leike (@janleike) · Score: 8.5

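NLAs (natural language adapters) are an unsupervised method that converts a large language model's internal states into human-readable text, significantly improving model interpretability and the ability to audit models for safety.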

When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was g...

Jan Leike (@janleike) · Score: 7.5

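Jan Leike looks back on more than a decade of AI alignment research: it began as a fringe field with only a dozen or so participants and ill-defined methods, and has since become practical thanks to technical advances such as RLHF and scalable oversight, leading to developments like Claude's constitution and automated alignment research.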

Some personal news: I am starting a new research project at Anthropic. Very excited about this! Man...

Jan Leike (@janleike) · Score: 5.5

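Jan Leike is starting a new research project at Anthropic, stressing that getting AGI right depends not only on alignment but also on other key factors. The announcement drew community attention but gave no specific details.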

Jan Leike (@janleike) · May 9 · 152 words (about 1 min read)

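NLAs are an unsupervised method that converts a large language model's internal states into human-readable text, greatly improving model transparency and safety auditing.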

Why selected: NLAs are an unsupervised technique that turns an LLM's internal activation vectors into natural-language descriptions.

Featured tweet · #LLM #Interpretability #AISafety #Anthropic · English

Jan Leike (@janleike) · May 9 · 292 words (about 2 min read)

Why selected: 10 years ago, only about 12 people worked on AI alignment, as a side project, and the methods were a mess.

Featured tweet · #AIAlignment #AGI #RLHF #MachineLearning · English

Jan Leike (@janleike) · May 9 · 113 words (about 1 min read)

Jan Leike is starting a new research project at Anthropic, stressing that getting AGI right depends not only on alignment but also on other key factors.

Why selected: Jan Leike is starting a new research project at Anthropic, focused on AGI development.

Featured tweet · #AGI #Anthropic #Research · English

Jan Leike (@janleike) · May 9 · 120 words (about 1 min read)

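Jan Leike thanks the exceptional people he has worked with on AI alignment over the years, calling it a privilege to work alongside such highly motivated people.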

Why selected: Jan Leike thanks the exceptional people he has collaborated with on AI alignment over the years.

Featured tweet · #AIAlignment #OpenAI #Ethics · English
