traeai

People

Jan Leike

Alias: janleike

AI safety researcher, formerly a researcher at DeepMind, now focused on LLM interpretability and alignment.

Tracking 4 highly relevant items

TraeAI Insights

Related Materials

4 items related to Jan Leike have been collected, sorted by score.

I'm really excited about this as a new tool in our interpretability tool kit

NLAs are an unsupervised method that converts LLM internal states into human-readable text, significantly improving model transparency and safety auditing.

Why selected: NLAs are an unsupervised technique that converts LLM internal activation vectors into natural-language descriptions.

Featured · Tweet · #LLM #Interpretability #AI Safety #Anthropic · English
When I started to work on the alignment problem more than 10 years ago, we had no idea how AGI was g...

Jan Leike on X: The Evolution of AI Alignment Research Over a Decade

Jan Leike (@janleike) · 292 words (about 2 min read)
75

Jan Leike reflects on how AI alignment research has changed over the past decade, from a niche field with only about 12 researchers and unclear methods to one now driven by RLHF, scalable oversight, and automated techniques such as constitutional AI, used in models like Claude.

Why selected: 10 years ago, only about 12 people worked on AI alignment, and only as a side project; the methods were unclear.

Featured · Tweet · #AI Alignment #AGI #RLHF #Machine Learning · English

Jan Leike on X: Grateful for talented people in AI alignment

Jan Leike (@janleike) · 120 words (about 1 min read)
45

Jan Leike thanks talented collaborators in AI alignment, calling it a privilege to work with those deeply motivated to make the future better.

Why selected: Jan Leike thanked the top talent he has worked with in AI alignment over the years.

Featured · Tweet · #AI Alignment #OpenAI #Ethics · English
