跨国串门儿计划 (Podcast)

#552. Why AI Progress Suddenly Feels Real: A Deep Dive into GPT 5.5, Reinforcement Learning, and the Last Mile of Models

Score: 9.2
Duration: 1:04:29

TL;DR · AI Summary

The capability improvements in GPT 5.5 and other models are not sudden jumps but the result of model reliability crossing a key threshold. Reinforcement learning, posttraining optimization, and evolving evaluation systems are what make AI practical.

Key Takeaways

  • GPT 5.5 improves reasoning and tool use, making it more practical
  • Reinforcement learning shifts from competitions to real-world tasks, improving r
  • Posttraining is key to transforming 'knowledge-aware models' into 'human-useful models'

Outline

  1. AI progress stems from model reliability crossing a critical threshold, not sudden capability leaps.

  2. GPT 5.5 shows significant enhancements in agentic coding, computer use, and knowledge work.

  3. Reinforcement learning moves from math competitions to real-world applications, enhancing performance.

  4. Posttraining transforms 'knowledge-aware models' into 'human-useful models'.

  5. Model as a Judge becomes essential due to increasing evaluation complexity.

Mindmap

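  • AI progress becoming real
    • Model reliability
      • Key threshold
      • Agent error-rate control
    • Training pipeline
      • Posttraining
      • Reinforcement learning
    • Evaluation system
      • Model as a Judge
      • Rising evaluation difficulty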

Highlights

  • GPT 5.5’s improvement isn’t a leap—it’s model reliability finally crossing a key threshold.

    Paragraph 3

  • Reinforcement learning evolves from math contests to real-world tasks, making models more reliable and useful.

    Minute 35

  • Posttraining is the key step to turning 'knowledge-aware models' into 'human-useful models'.

    Minute 40


Chapters

  1. Intro & Episode Overview

  2. MAD Podcast Intro: Yann Dubois and Background on GPT 5.5

  3. What Happened Recently: Reliability Crossing a Key Threshold

  4. What Is Model Reliability: The Longer an Agent Runs, the Lower Its Error Rate Must Be

  5. Behind the GPT 5.5 Launch: Company-wide Collaboration and Emotional Rollercoaster

  6. Strengths of GPT 5.5: Agentic Coding, Computer Use, and Knowledge Work

  7. Efficiency Optimization: From Token Count to Latency to User-perceived Performance

  8. What Does OpenAI's PostTraining Frontiers Team Do?

  9. From Word2Vec to Low-resource Language NLP: How Yann Entered the AI Field

  10. Why He Turned Down Quantitative Funds: Technical Work and Positive Impact

  11. GPT5 Launch Demo: The Tense Moment of Building a French Learning App Live

  12. Reasoning in 2026 vs. the o1/o3 Era

Transcript

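Intro & Episode Overview

MAD Podcast Intro: Yann Dubois and Background on GPT 5.5

What Happened Recently: Reliability Crossing a Key Threshold

What Is Model Reliability: The Longer an Agent Runs, the Lower Its Error Rate Must Be

Behind the GPT 5.5 Launch: Company-wide Collaboration and Emotional Rollercoaster

Strengths of GPT 5.5: Agentic Coding, Computer Use, and Knowledge Work

Efficiency Optimization: From Token Count to Latency to User-perceived Performance

What Does OpenAI's PostTraining Frontiers Team Do?

From Word2Vec to Low-resource Language NLP: How Yann Entered the AI Field

Why He Turned Down Quantitative Funds: Technical Work and Positive Impact

GPT5 Launch Demo: The Tense Moment of Building a French Learning App Live

Reasoning in 2026 vs. the o1/o3 Era

From Verifiable Reward to Real User Value

5.5 Thinking vs. 5.5 Pro: Is More Test-time Compute Worth It?

Efficiency and Thinking Time: Moving the Performance-Latency Curve Left

How Models Reason Better: Fewer Detours, Like an Expert, and Earlier Error Detection

Has Pretraining Hit a Wall: Why Bigger Models Still Work

Data Frontiers: Synthetic Data, Multimodal Data, and Embodied AI

World Models: Simulation Helps, But Don't Over-optimize Unrealistic Goals

What Is Mid Training: Giving Higher Weight to High-Quality Data

Essence of Posttraining: Turning “Knowledge-Aware Models” into “Human-Useful Models”

Difference Between SFT and RL: From Imitating Humans to Optimizing Rewards

Will RL Create New Capabilities: Reasoning, Answer Checking, Longer Thinking

Why RL Is Hard to Scale: Expensive Sampling, Long Rollouts, and Attribution Issues

GRPO and the Victory of Simple Methods: Techniques That Scale with Compute Last Longest

Are AI Systems Built or Grown?: From Craft to Science in Research

Why Everyone Starts with Posttraining: Faster Iteration

Vertical vs. Horizontal Capabilities: Why Models Are Sometimes Uneven

From Math and Code to Economic Domains: Prioritization and Data Collection

Boundaries of Generalization: Being Smart in Competitions Doesn't Equal Being Smart in the Real World

Hallucination Problem: Why SFT Might Actually Reward Hallucinations

Negative Transfer: Conflict Between Explicit Instruction Following and Implicit Intent Understanding

Can Legal, Medical, and Financial Domains Catch Up with Coding?: The Key Lies in Domain Experts and Verifiable Rewards

Why Evals Are Getting Harder: Open-ended Tasks, Diverse Answers, and Expert Shortage

Model as a Judge: Why Letting Models Evaluate Models Matters More and More

Blur Between Evaluation and Training: Every Eval Could Become a Training Data Generator

Will Future AI Progress Be Continuous or Discontinuous?

Continual Learning: Why Models Should Get Smarter with Usage

Why Continual Learning Isn't Fully Solved Yet

Will Harness Be Eaten by Models?: Different Fates for Generic Frameworks and Vertical Scenarios

Are There Still Opportunities at the Application Layer?: The Real Moat Lies in the Last Mile

Closing: Matt Thanks Yann, Show Ends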

#AI #GPT #Reinforcement Learning #Model Training #OpenAI

Show notes

📝 Episode Summary

In this episode, we clone The MAD Podcast with Matt Turck, episode "OpenAI's Yann Dubois: Why AI Progress Suddenly Feels Real".

Our guest Yann Dubois is a co-lead of OpenAI’s PostTraining Frontiers team, involved in building cutting-edge models such as GPT 5.5, o3, and GPT5 Thinking. In this conversation, Yann explains from an internal researcher’s perspective why AI capabilities have suddenly felt “truly usable” over the past few months—not because of sudden leaps in capability, but because model reliability has finally crossed a critical threshold.

The show delves deep into the progress of GPT 5.5, the evolution of reasoning models, how reinforcement learning has moved from math and programming competitions to real-world tasks, and the roles played by pretraining, mid-training, and posttraining. Yann also discusses why evaluating models is becoming increasingly difficult, why "model as a judge" matters, why continual learning remains an unsolved problem, and why startups still have huge room for innovation in the “last mile.”

This is an ideal episode for AI practitioners, entrepreneurs, investors, and tech product managers: it not only explains how large model capabilities are trained, but also answers a more practical question—what opportunities remain at the application layer and vertical domains as models grow stronger.

👤 Guest

Yann Dubois, co-lead of OpenAI’s PostTraining Frontiers team. He was involved in creating advanced models like GPT 5.5, o3, and GPT5 Thinking. Before joining OpenAI, he worked on the Alpaca project at Stanford, which significantly influenced modern posttraining and open-source instruction tuning research. His research spans natural language processing, low-resource languages, multimodal representation learning, reinforcement learning, and frontier large model training.

⏱️ Timestamps

00:00 Intro & Episode Overview

Why AI Progress Suddenly Feels Stronger

02:15 MAD Podcast Intro: Yann Dubois and Background on GPT 5.5

03:25 What Happened Recently: Reliability Crossing a Key Threshold

05:56 What Is Model Reliability: The Longer an Agent Runs, the Lower Its Error Rate Must Be

07:10 Behind the GPT 5.5 Launch: Company-wide Collaboration and Emotional Rollercoaster

08:45 Strengths of GPT 5.5: Agentic Coding, Computer Use, and Knowledge Work

10:47 Efficiency Optimization: From Token Count to Latency to User-perceived Performance

PostTraining Frontiers and Yann’s Research Path

11:52 What Does OpenAI’s PostTraining Frontiers Team Do?

13:13 From Word2Vec to Low-resource Language NLP: How Yann Entered the AI Field

14:41 Why He Turned Down Quantitative Funds: Technical Work and Positive Impact

15:21 GPT5 Launch Demo: The Tense Moment of Building a French Learning App Live

Reasoning from Competition Problems to Real-world Applications

15:49 Reasoning in 2026 vs. the o1/o3 Era

17:12 From Verifiable Reward to Real User Value

18:07 5.5 Thinking vs. 5.5 Pro: Is More Test-time Compute Worth It?

19:37 Efficiency and Thinking Time: Moving the Performance-Latency Curve Left

20:45 How Models Reason Better: Fewer Detours, Like an Expert, and Earlier Error Detection

Training Pipeline: Pretraining, Mid Training, and Posttraining

21:49 Has Pretraining Hit a Wall: Why Bigger Models Still Work

24:43 Data Frontiers: Synthetic Data, Multimodal Data, and Embodied AI

26:45 World Models: Simulation Helps, But Don’t Over-optimize Unrealistic Goals

28:02 What Is Mid Training: Giving Higher Weight to High-Quality Data

29:28 Essence of Posttraining: Turning “Knowledge-Aware Models” into “Human-Useful Models”

How Reinforcement Learning Shapes Cutting-edge Models

30:39 Difference Between SFT and RL: From Imitating Humans to Optimizing Rewards

33:28 Will RL Create New Capabilities: Reasoning, Answer Checking, Longer Thinking

35:00 Why RL Is Hard to Scale: Expensive Sampling, Long Rollouts, and Attribution Issues

37:32 GRPO and the Victory of Simple Methods: Techniques That Scale with Compute Last Longest

38:13 Are AI Systems Built or Grown?: From Craft to Science in Research

40:26 Why Everyone Starts with Posttraining: Faster Iteration

41:57 Vertical vs. Horizontal Capabilities: Why Models Are Sometimes Uneven

43:21 From Math and Code to Economic Domains: Prioritization and Data Collection

44:43 Boundaries of Generalization: Being Smart in Competitions Doesn’t Equal Being Smart in the Real World

47:31 Hallucination Problem: Why SFT Might Actually Reward Hallucinations

49:00 Negative Transfer: Conflict Between Explicit Instruction Following and Implicit Intent Understanding

50:36 Can Legal, Medical, and Financial Domains Catch Up with Coding?: The Key Lies in Domain Experts and Verifiable Rewards

Evaluation, Model as a Judge, and the Capability Flywheel

52:23 Why Evals Are Getting Harder: Open-ended Tasks, Diverse Answers, and Expert Shortage

54:35 Model as a Judge: Why Letting Models Evaluate Models Matters More and More

55:20 Blur Between Evaluation and Training: Every Eval Could Become a Training Data Generator

Future 12–24 Months: Continuous Progress and Local Breakpoints

56:07 Will Future AI Progress Be Continuous or Discontinuous?

57:26 Continual Learning: Why Models Should Get Smarter with Usage

59:16 Why Continual Learning Isn’t Fully Solved Yet

59:59 Will Harness Be Eaten by Models?: Different Fates for Generic Frameworks and Vertical Scenarios

01:01:58 Are There Still Opportunities at the Application Layer?: The Real Moat Lies in the Last Mile

01:03:36 Closing: Matt Thanks Yann, Show Ends

🌟 Highlights

💡 AI Progress Isn’t Sudden, It’s About Reliability Crossing a Threshold

Yann believes that model capabilities improve mostly continuously, but user perception is not linear. Once a model makes an error less often than once every few minutes, AI tools shift from “interesting but unreliable” to “actually capable of doing work.” This is why recent experiences in coding and agentic work feel like a sudden leap.

“You need to reach a certain level of reliability before these AI tools can really be useful.”
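The threshold effect can be illustrated with simple arithmetic. The sketch below is not from the episode: it assumes, purely for illustration, that an agent fails each step independently with a fixed probability, and the error rates and step count are hypothetical.

```python
def task_success_probability(per_step_error: float, steps: int) -> float:
    """Probability that `steps` consecutive steps all succeed.

    Assumes each step fails independently with probability
    `per_step_error` (an illustrative simplification, not a claim
    made in the episode).
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return (1.0 - per_step_error) ** steps


if __name__ == "__main__":
    # Hypothetical per-step error rates for a 100-step agent task.
    for error_rate in (0.05, 0.01, 0.001):
        p = task_success_probability(error_rate, steps=100)
        print(f"per-step error {error_rate:.3f} -> task success {p:.1%}")
```

Under these assumptions, a 100-step task succeeds well under 1% of the time at a 5% per-step error rate, roughly 37% of the time at 1%, and roughly 90% of the time at 0.1%. A steady decline in per-step error can therefore feel like an abrupt change in usefulness on long tasks.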

🧠 The Key Shift in Reasoning: From Competition Problems to Real-world Use

Early reasoning models were optimized for math and programming competitions because those tasks have clear answers and verifiable rewards. OpenAI now applies the same reinforcement learning techniques to messier, more open-ended real-world tasks such as software engineering, knowledge work, enterprise processes, and complex data handling.

“So we’ve moved from competition scenarios to truly useful user scenarios, which is what we’re feeling now.”

⚙️ GPT 5.5 Efficiency: Not Just Smarter, But Faster

Yann particularly emphasizes the efficiency gains in GPT 5.5. Efficiency is not merely about reducing tokens or lowering latency, but optimizing within the coordinate system that users truly care about: achieving higher-quality answers with less waiting time. AI research is responsible for enabling models to achieve the same performance with fewer tokens, while engineering and inference teams focus on serving those tokens faster.

"The key point is that the X-axis is latency and the Y-axis is performance."
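One way to picture the two levers described above is a back-of-the-envelope latency estimate. This is a rough sketch under a strong assumption (latency is dominated by generating output tokens at a constant speed); the function name and all numbers are hypothetical and do not describe GPT 5.5.

```python
def response_latency_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough latency estimate: tokens to generate divided by serving speed.

    Ignores prompt processing, queuing, and tool calls (illustrative only).
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical numbers for one task solved at the same quality level.
baseline = response_latency_seconds(8000, tokens_per_second=50.0)
fewer_tokens = response_latency_seconds(4000, tokens_per_second=50.0)     # research lever
faster_serving = response_latency_seconds(8000, tokens_per_second=100.0)  # inference lever
both = response_latency_seconds(4000, tokens_per_second=100.0)

print(baseline, fewer_tokens, faster_serving, both)
```

In this toy model, halving the tokens needed and doubling the serving speed each halve the wait, and the two combine multiplicatively. That is one reading of "moving the performance-latency curve left": the same point on the Y-axis reached at a smaller X.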

📚 The Essence of Posttraining: Transforming a Model from “Library” to “Expert”

Yann uses a clear analogy to explain posttraining: pretraining is like having the model read an entire library, acquiring vast knowledge of the world; but what users really need isn’t a library—it’s an expert who has read these books, understands problems, and can offer help. The goal of posttraining is to transform knowledge into interactive, executable, and useful capabilities.

"At its core, it's about turning something that knows various things in the world into something useful to people."

🧪 Why Reinforcement Learning Is Hard: You Often Only Know the Outcome at the End

In agent tasks, a model may go through a long sequence of operations before it discovers whether the result was correct. This creates an attribution problem: which step caused the success or failure? It is one of the main reasons RL struggles to scale to complex real-world tasks. However, Yann believes that RL becomes significantly more effective once the base model already understands the world well.

"You only know which part is good or bad at the end."
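The attribution problem can be made concrete with a simplified sketch of outcome-only credit assignment. It loosely follows the group-relative normalization idea behind GRPO (named in the episode's chapter list) as publicly described, in reduced form; it is not OpenAI's training code, and the function names, the use of population standard deviation, and the reward values are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize end-of-rollout rewards within a group of rollouts for one prompt."""
    if not rewards:
        raise ValueError("rewards must not be empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_step_credit(advantage: float, num_steps: int) -> list[float]:
    """With only a final outcome, every step in the rollout gets the same credit."""
    return [advantage] * num_steps


# Four hypothetical rollouts for the same task: 1.0 = success, 0.0 = failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
credit_for_first_rollout = per_step_credit(advantages[0], num_steps=5)
print(advantages)
print(credit_for_first_rollout)
```

Because the reward arrives only at the end, a good step inside a failed rollout is penalized just like a bad one, and the signal has to average out over many samples. Longer rollouts make each sample more expensive, which is consistent with the scaling difficulty described above.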

👻 Hallucinations May Come from SFT, But RL Has the Potential to Reduce Them

Yann mentions John Schulman’s perspective: if a model does not know a fact, but the SFT ground-truth answer requires it to state that fact, training can teach the model to fabricate. In RL, a model that does not know something is very unlikely to sample the correct answer by chance, so a well-designed RL process is more likely to suppress answering when the model is unsure.

"SFT forces models to hallucinate."
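A toy expected-reward calculation shows why a reward-based objective can discourage guessing. The reward scheme below (a positive reward for a correct answer, a penalty for a wrong one, zero for abstaining) is a hypothetical example; the episode does not describe a specific reward design.

```python
def expected_reward_if_answering(
    p_correct: float,
    reward_correct: float = 1.0,
    penalty_wrong: float = 1.0,
) -> float:
    """Expected reward of answering when the model is right with probability p_correct."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be between 0 and 1")
    return p_correct * reward_correct - (1.0 - p_correct) * penalty_wrong


def should_answer(p_correct: float, reward_abstain: float = 0.0) -> bool:
    """Answer only if doing so beats the (hypothetical) reward for abstaining."""
    return expected_reward_if_answering(p_correct) > reward_abstain


# A model that rarely knows the fact is better off abstaining;
# a model that usually knows it is better off answering.
print(should_answer(0.1))
print(should_answer(0.9))
```

With these hypothetical values, guessing at 10% accuracy has a negative expected reward, so abstaining wins. Pure imitation of a ground-truth answer has no such term: the target is the stated fact whether or not the model actually knows it.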

📏 Evaluations Are a Critical Bottleneck in Model Progress

As model tasks become increasingly open-ended, evaluation becomes harder. Previously, you only needed to check if there were bugs in code; now, you might have to judge how well a full website is done, and there are many ways to define “good.” Yann believes that identifying issues, building evaluations, and quantifying improvements are at least as important as training models, if not more so.

"Identifying problems and ensuring we can quantify improvements are at least equally important—and possibly even more so."

🔁 Continual Learning Remains a Major Unsolved Problem

Yann is very excited about continual learning. He believes that today’s models might be more useful than new hires on their first day at a company, but they don’t accumulate internal knowledge, understand work habits, or grow stronger over time like humans do. The ideal AI should become more useful to users the longer it works in an environment.

"The longer a model works in a certain environment, the more useful it becomes."

🚀 Startup Opportunities Still Lie in the Last Mile

For application-layer companies and startups, Yann gives a very clear assessment: raw model intelligence is not necessarily the final moat. The real moat often lies in the last mile: permissions, data connectivity, workflows, domain knowledge, and an understanding of user scenarios. OpenAI will focus more on general capabilities, which leaves plenty of room in vertical domains.

"I think most of the time, the real moat lies in the last mile."

🌐 Additional Podcast Information

This podcast was produced by reproducing the speakers' original voices, so some parts might sound a bit odd.

AI translation was used, so some phrasing may be awkward.

If you’d like to listen to other foreign-language podcasts in Chinese, feel free to contact WeChat: iEvenight
