
Whisper

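Alias: openai whisper

A speech recognition model developed by OpenAI, used for speech transcription in the browser.

6 highly relevant items tracked

TraeAI Observations

Related materials

6 items related to Whisper have been collected, sorted by score.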

Most in-car media systems still expect you to search with keywords.

But when you’re driving, you do...

Current in-car media systems still rely on keyword-based search, but drivers naturally express needs through emotions, vibes, and intent rather than search terms. Sarvesh Talele's project, built with Qdrant Edge, delivers a fully local, AI-powered media discovery system that supports voice, text, and mood-based semantic queries. It needs no cloud, which keeps the experience privacy-first and real-time.

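Why selected: the system uses Whisper for local speech transcription and Qdrant Edge for on-device vector retrieval, with no dependency on cloud services at any stage.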

Featured, Tweet. Tags: #Qdrant, #Vector Search, #Edge AI, #In-Car System, #Privacy. Language: English

Multimodal Browser AI with Transformers.js for Images and Speech

Machine Learning Mastery, 8222 words (about 33 minutes)
Score: 85

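Transformers.js enables multimodal AI in the browser, covering image classification, image captioning, and speech transcription, with no server or API key required.

Why selected: Transformers.js supports image classification, image captioning, and speech transcription, and runs entirely in the browser.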

Featured, Article. Tags: #Transformers.js, #Browser AI, #Multimodal, #Frontend, #Machine Learning. Language: English
Spec-driven Development: The AI Engineering Workflow at Notion | Ryan Nystrom

Lenny's Newsletter, 487 words (about 2 minutes)
Score: 85

Notion AI uses spec-driven development, generating specs and implementing features with Codex.

Why selected: uses Whisper and Codex to implement spec-driven development.

Featured, Article. Tags: #AI, #Engineering Practices, #Notion, #Codex, #Development Process. Language: Chinese
We released Gemma 4 12B yesterday. Here is a visual guide that explains the full architecture.

→ Ho...

Gemma 4 12B Released: Visual Guide to Native Multimodal Architecture

Philipp Schmid (@_philschmid), 169 words (about 1 minute)
Score: 75

Gemma 4 12B achieves native multimodal processing for text, images, and audio by removing separate vision and audio encoders. This architecture replaces traditional encoder-patching approaches with joint representation learning, reducing inference latency and improving edge deployment efficiency.

Why selected: Gemma 4 12B removes the separate vision and audio encoders and adopts a unified, natively multimodal architecture.

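Featured, Tweet. Tags: #Gemma 4, #Multimodal LLM, #Native Multimodality, #Edge AI. Language: English
The third model, GPT-Realtime-Whisper, is a streaming speech-to-text model

The original Whisper was designed around processing "a complete piece of audio": you finish recording, hand it over, and it returns the transcript. The new streaming version transcribes as you speak, with latency that is extremely...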

The Third Model GPT-Realtime-Whisper is a Streaming Speech-to-Text Model

小互 (@imxiaohu), 311 characters (about 2 minutes)
Score: 55

GPT-Realtime-Whisper is a streaming speech-to-text model designed for real-time scenarios. It transcribes with low latency as audio arrives, unlike the original Whisper, which processes a complete audio recording at once.

Why selected: the new model supports streaming and produces output without waiting for the full audio to finish.

Featured, Tweet. Tags: #AI, #Speech Recognition, #Streaming Processing, #Whisper, #Real-time Communication. Language: Chinese
Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Hugging Face Blog, 1283 words (about 6 minutes)
Score: 52

Hugging Face introduces private datasets to prevent models from over-optimizing on public ASR test sets, while keeping the public Average WER unchanged to preserve real-world performance measurement.

Why selected: introduces private datasets to prevent models from over-optimizing for the public test sets (benchmaxxing).

Featured, Article. Tags: #ASR, #Benchmark, #Hugging Face, #Benchmaxxing, #WER. Language: English
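
For readers unfamiliar with the WER metric mentioned in the item above, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The function names `word_error_rate` and `mean_wer` are hypothetical, the plain whitespace tokenization is an assumption, and the unweighted mean across datasets is only one possible way to average. None of this is taken from the leaderboard's own code, which may normalize text and aggregate differently.

```python
from typing import Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(datasets: Dict[str, List[Tuple[str, str]]]) -> float:
    """Hypothetical aggregate: unweighted mean of per-dataset average WER."""
    per_dataset = []
    for pairs in datasets.values():
        scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
        per_dataset.append(sum(scores) / len(scores))
    return sum(per_dataset) / len(per_dataset)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words, giving 1/6. A score computed this way on a public test set is what a model could over-fit to, which is the concern that held-out private datasets are meant to address.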
