Models

Whisper

Q: What's new with Whisper?

traeai has indexed 6 items related to Whisper. The most recent is "Most in-car media systems still expect you to search with keywords. But when you’re driving, you do...", posted by Qdrant (@qdrant_engine).

Alias: openai whisper

A speech recognition model developed by OpenAI, used for tasks such as speech transcription in the browser.

6 highly relevant items tracked

TraeAI Observations

If you only read 3

Most in-car media systems still expect you to search with keywords. But when you’re driving, you do...

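Qdrant (@qdrant_engine) · score 8.7

Today's in-car media systems still rely on keyword search, whereas drivers tend to express what they want through mood, vibe, and intent. Sarvesh Talele used Qdrant Edge to build a fully local, AI-driven media discovery system that supports three kinds of semantic query (voice, text, and mood) with no cloud dependency, delivering a privacy-first, real-time experience.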

Multimodal Browser AI with Transformers.js for Images and Speech

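Machine Learning Mastery · score 8.5

Transformers.js enables multimodal AI in the browser, covering image classification, image captioning, and speech transcription, with no server or API key required.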

Spec-driven development: The AI engineering workflow at Notion | Ryan Nystrom

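Lenny's Newsletter · score 8.5

Notion AI's development process follows spec-driven development, using Codex to automatically generate specs and implement features, which improves engineering efficiency.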

Most in-car media systems still expect you to search with keywords. But when you’re driving, you don’t think in keywords — you think in moods, vibes, and intent.

Qdrant (@qdrant_engine) · June 1 · 235 words (about 1 min)

Current in-car media systems still rely on keyword-based search, but drivers naturally express needs through emotions, vibes, and intent rather than search terms. Sarvesh Talele’s project, built with Qdrant Edge, delivers a fully local, AI-powered media discovery system that supports voice, text, and mood-based semantic queries. It needs no cloud connection, which keeps the experience privacy-first and real-time.

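Why it was selected: The system uses Whisper for local speech transcription and Qdrant Edge for on-device vector retrieval, with no dependency on cloud services at any stage.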

Featured · Tweet · #Qdrant #Vector Search #Edge AI #In-Car System #Privacy · English
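The mood-based semantic query described above can be illustrated with a minimal sketch of vector similarity search. This is not the Qdrant Edge API and not the project's code: the `cosine` and `search` helpers, the three-dimensional "mood" vectors, and the track names are all hypothetical, chosen only to show how a query embedding is ranked against stored embeddings. In a real pipeline the query vector would come from an embedding model applied to the Whisper transcript.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: List[float], index: Dict[str, List[float]], top_k: int = 1) -> List[str]:
    """Return the names of the top_k items most similar to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical toy embeddings: (calm, energetic, melancholy).
index = {
    "late-night jazz": [0.9, 0.1, 0.3],
    "workout mix": [0.1, 0.95, 0.0],
    "rainy-day piano": [0.6, 0.0, 0.8],
}

# Hypothetical embedding of a spoken request such as "something relaxing".
query = [0.8, 0.1, 0.2]

results = search(query, index, top_k=2)
```

A brute-force scan like this is only practical for small collections; a vector database replaces the `sorted` call with an approximate nearest-neighbor index.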

Multimodal Browser AI with Transformers.js for Images and Speech

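Machine Learning Mastery · June 14 · 8,222 words (about 33 min)

Transformers.js enables multimodal AI in the browser, covering image classification, image captioning, and speech transcription, with no server or API key required.

Why it was selected: Transformers.js supports image classification, image captioning, and speech transcription, and runs entirely in the browser.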

Featured · Article · #Transformers.js #Browser AI #Multimodal #Frontend #Machine Learning · English

Spec-driven Development: The AI Engineering Workflow at Notion | Ryan Nystrom

Lenny's Newsletter · May 11 · 487 words (about 2 min)

Notion AI uses spec-driven development, generating specs and implementing features with Codex.

Why it was selected: Uses Whisper and Codex to carry out spec-driven development.

Featured · Article · #AI #Engineering Practices #Notion #Codex #Development Process · Chinese

We released Gemma 4 12B yesterday. Here is a visual guide that explains the full architecture.

→ Ho...

Gemma 4 12B Released: Visual Guide to Native Multimodal Architecture

Philipp Schmid (@_philschmid) · June 5 · 169 words (about 1 min)

Gemma 4 12B achieves native multimodal processing for text, images, and audio by removing separate vision and audio encoders. This architecture replaces traditional encoder-patching approaches with joint representation learning, reducing inference latency and improving edge deployment efficiency.

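Why it was selected: Gemma 4 12B removes the separate vision and audio encoders and adopts a unified, natively multimodal architecture.

Featured · Tweet · #Gemma 4 #Multimodal LLM #Native Multimodality #Edge AI · English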

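The third model, GPT-Realtime-Whisper, is a streaming speech-to-text model

The original Whisper was designed around processing "a complete piece of audio": you finish recording, hand it over, and it returns the transcript. The new streaming version transcribes as you speak, with latency that is extremely...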

The Third Model GPT-Realtime-Whisper is a Streaming Speech-to-Text Model

小互 (@imxiaohu) · May 8 · 311 words (about 2 min)

GPT-Realtime-Whisper is a streaming speech-to-text model designed for real-time scenarios. It supports low-latency processing, unlike the original Whisper, which processes a complete audio recording at once.

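Why it was selected: The new model supports streaming, producing output without waiting for the whole audio clip to finish.

Featured · Tweet · #AI #Speech Recognition #Streaming Processing #Whisper #Real-time Communication · Chinese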
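The difference between the batch and streaming styles described above can be sketched with a stub transcriber. Nothing here is the GPT-Realtime-Whisper or Whisper API: `fake_transcribe`, `transcribe_batch`, and `transcribe_streaming` are hypothetical names, and the stub simply returns one word per two audio samples so that the control flow is visible. The streaming function naively re-transcribes the growing buffer after each chunk, which is only one possible way to produce partial results.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]  # hypothetical: a short window of audio samples
Transcriber = Callable[[List[float]], str]


def transcribe_batch(chunks: Iterable[Chunk], transcribe: Transcriber) -> str:
    """Batch style: collect all audio first, then transcribe once."""
    audio = [sample for chunk in chunks for sample in chunk]
    return transcribe(audio)


def transcribe_streaming(chunks: Iterable[Chunk], transcribe: Transcriber) -> Iterator[str]:
    """Streaming style: yield a partial transcript after every chunk."""
    received: List[float] = []
    for chunk in chunks:
        received.extend(chunk)
        yield transcribe(received)


def fake_transcribe(audio: List[float]) -> str:
    """Stub model: pretends every two samples correspond to one word."""
    words = ["turn", "on", "the", "radio"]
    return " ".join(words[: len(audio) // 2])


chunks = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
partials = list(transcribe_streaming(chunks, fake_transcribe))
final = transcribe_batch(chunks, fake_transcribe)
```

In the batch call the caller sees nothing until all three chunks have arrived, while the streaming generator exposes a usable partial transcript after the first chunk.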

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Hugging Face Blog · May 6 · 1,283 words (about 6 min)

Hugging Face introduces private datasets to prevent models from over-optimizing on public ASR test sets, while keeping the public Average WER unchanged to preserve real-world performance measurement.

Why it was selected: Introduces private datasets to keep models from over-optimizing on public test sets (benchmaxxing).

Featured · Article · #ASR #Benchmark #Hugging Face #Benchmaxxing #WER · English
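For readers unfamiliar with the WER metric mentioned above, here is a minimal sketch of word error rate as it is commonly defined: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative, and the sketch skips the text normalization (casing, punctuation) that a real leaderboard would apply before scoring; it is not the leaderboard's own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


one_deletion = word_error_rate("the cat sat on the mat", "the cat sat on mat")
one_substitution = word_error_rate("play some jazz", "play sum jazz")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.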

Cross-material Q&A · Whisper

Answers are based on: 6 Whisper-related items