Simon Willison's Weblog

OpenAI WebRTC Audio Session, now with document context

TL;DR · AI Summary

OpenAI has released the GPT-Realtime-2 model, and the OpenAI WebRTC Audio Session tool now lets you paste in document context and talk about it by voice.

Key Takeaways

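  • OpenAI introduced GPT-Realtime-2, promoted as its first voice model with GPT-5-class reasoning.
  • The tool uses the OpenAI WebRTC API to hold a voice conversation in the browser about a pasted document.
  • GPT-Realtime-2 has a knowledge cut-off of September 30, 2024.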

Outline

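  1. The author describes the background of the OpenAI WebRTC Audio Session tool and its latest update.

  2. Release of the GPT-Realtime-2 model

    OpenAI introduced GPT-Realtime-2, promoted as having GPT-5-class reasoning.

  3. Users can now paste document context into a WebRTC session so the voice conversation can draw on it.

  4. The tool can be used to explore a document's content through an interactive spoken conversation.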

#OpenAI #WebRTC #GPT-Realtime-2 #VoiceInteraction
12th June 2026 - Link Blog

[OpenAI WebRTC Audio Session, now with document context](https://tools.simonwillison.net/openai-webrtc). I built the first version of this tool in December 2024 to try out the then-new OpenAI WebRTC API for interacting with their realtime audio models.

Last month OpenAI introduced a brand new model to that API called GPT‑Realtime‑2, which they promoted as "our first voice model with GPT‑5‑class reasoning" - with a Sep 30, 2024 knowledge cut-off.

I've been waiting for that model to show up in the ChatGPT iPhone app but it still hasn't, so I revisited my old playground.

You can now pick the better model, and you can also paste in a big chunk of document context so you can have an audio conversation in your browser about whatever information you think would be useful to explore in a conversational way.

Image 1: Screenshot of a web interface titled "OpenAI WebRTC Audio Session" with a gray status dot. Form fields: "OpenAI API Token" showing a masked password of dots, "Voice" dropdown set to "Coral", "Model" dropdown set to "gpt-realtime-2". A collapsible section labeled "▼ Document context (optional — paste text to talk about)" with bold instruction "Paste a document here before starting the session and the model will be able to discuss it with you" above a textarea containing a pasted Markdown document about whether DuckDB can run untrusted SQL as safely as Datasette runs SQLite. Below are a blue "Start Session" button and a gray disabled "Mute Mic" button, then a green success message "Session established successfully!" At the bottom, a dark panel headed "Last transcript" reads: "DuckDB can be made about as safe as SQLite for running untrusted SELECT queries, but only if you lock it down properly. Using read only true by itself is not enough, because SQL can still" (text cut off).
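To make the document context idea concrete, here is a minimal sketch of one way pasted text could be folded into a realtime session's instructions. This is an assumption about a plausible approach, not the tool's actual source: the helper name `buildSessionUpdate`, the base prompt, the `<document>` wrapper, and the character cap are all hypothetical. The only API detail relied on is that the Realtime API accepts a `session.update` client event carrying `instructions`; other required fields may vary by API version.

```typescript
// Hypothetical sketch: how pasted document text might become session instructions.
// Names and prompt wording are illustrative assumptions, not the tool's real code.

interface SessionUpdateEvent {
  type: "session.update";
  session: { instructions: string };
}

// Arbitrary illustrative cap on pasted text; not a documented API limit.
const MAX_CONTEXT_CHARS = 100_000;

function buildSessionUpdate(
  documentText: string,
  maxChars: number = MAX_CONTEXT_CHARS
): SessionUpdateEvent {
  const base = "You are a helpful voice assistant.";
  const trimmed = documentText.trim();

  // No document pasted: send only the base instructions.
  if (trimmed.length === 0) {
    return { type: "session.update", session: { instructions: base } };
  }

  // Clip very long documents so the event stays a manageable size.
  const clipped = trimmed.slice(0, maxChars);

  const instructions = [
    base,
    "The user has supplied the following document. Use it when answering:",
    "<document>",
    clipped,
    "</document>",
  ].join("\n");

  return { type: "session.update", session: { instructions } };
}
```

In a browser, an event like this would typically be serialized with `JSON.stringify` and sent over the WebRTC data channel once the session is established. Putting the document in the instructions means it is in place before the first spoken turn.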