OpenAI WebRTC Audio Session, now with document context

TL;DR · AI Summary
OpenAI has released the GPT-Realtime-2 model, and this WebRTC playground now lets you pair it with pasted document context for voice conversations.
Key Takeaways
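- OpenAI has released GPT-Realtime-2, promoted as its first voice model with GPT-5-class reasoning.
- Developers can use the WebRTC API to build in-browser voice interactions that draw on document context.
- GPT-Realtime-2 has a knowledge cut-off of September 30, 2024.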
12th June 2026 - Link Blog
[OpenAI WebRTC Audio Session, now with document context](https://tools.simonwillison.net/openai-webrtc). I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.
Last month OpenAI introduced a brand new model to that API called GPT‑Realtime‑2, which they promoted as "our first voice model with GPT‑5‑class reasoning" - with a Sep 30, 2024 knowledge cut-off.
I've been waiting for that model to show up in the ChatGPT iPhone app but it still hasn't, so I revisited my old playground.
You can now pick the better model, and you can also paste in a big chunk of document context so you can have an audio conversation in your browser about whatever information you think would be useful to explore in a conversational way.
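
One plausible way to wire pasted document context into a realtime session is to fold it into the session instructions. The sketch below is an illustration, not the tool's actual source: `buildSessionUpdate`, the `<document>` wrapper, and the `maxChars` cap are all hypothetical, and the `session.update` event shape reflects my understanding of the Realtime API's client events, so check it against the current documentation before relying on it.

```typescript
// Hypothetical helper, not taken from the playground's code.
// Assumed event shape: a "session.update" client event carrying instructions.
interface SessionUpdateEvent {
  type: "session.update";
  session: { instructions: string };
}

function buildSessionUpdate(
  baseInstructions: string,
  documentContext: string,
  maxChars: number = 100_000, // arbitrary illustrative cap, not an API limit
): SessionUpdateEvent {
  // Trim surrounding whitespace, then clip very large pastes.
  const clipped = documentContext.trim().slice(0, maxChars);

  // With no document, send the base instructions unchanged.
  const instructions =
    clipped.length === 0
      ? baseInstructions
      : `${baseInstructions}\n\n<document>\n${clipped}\n</document>`;

  return { type: "session.update", session: { instructions } };
}
```

In a browser, the resulting object would typically be serialized with `JSON.stringify` and sent over the `RTCDataChannel` opened alongside the audio track, once that channel reports it is open.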
