# microsoft/VibeVoice

Canonical URL: https://www.traeai.com/articles/5c39fc4b-6caf-41b0-9f50-3033bfd9fed6
Original source: https://simonwillison.net/2026/Apr/27/vibevoice/#atom-everything
Source name: Simon Willison's Weblog
Content type: article
Language: English
Score: 8.7
Reading time: 3 minutes
Published: 2026-04-27T23:46:56+00:00
Tags: speech recognition, open source, AI models, Microsoft

## Summary

Microsoft has open-sourced VibeVoice, a speech-to-text model with built-in speaker diarization that can be run on a Mac with a simple command.

## Key Takeaways

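- VibeVoice is an open-source speech-to-text model from Microsoft, released under the MIT license, with built-in speaker diarization.
- On an M5 Max MacBook Pro, processing 1 hour of audio takes about 8 minutes 45 seconds, with peak memory usage reaching 30 GB.
- The tool outputs structured JSON containing timestamps, speaker IDs, and text, which makes it suitable for further analysis.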
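
As an illustration of the "further analysis" mentioned above, here is a minimal Python sketch that totals talk time per speaker from diarized segments. The source only says that the JSON contains timestamps, speaker IDs, and text, so the field names (`start`, `end`, `speaker`, `text`) and the sample data are assumptions for illustration, not the tool's documented schema. The sketch also works out the speed implied by the reported benchmark of 1 hour of audio in about 8 minutes 45 seconds.

```python
import json
from collections import defaultdict

# Hypothetical sample output. The keys "start", "end", "speaker" and "text"
# are assumed names; check the real JSON produced by the tool before reuse.
SAMPLE = """[
  {"start": 0.0, "end": 4.5, "speaker": 0, "text": "Welcome to the show."},
  {"start": 4.5, "end": 9.0, "speaker": 1, "text": "Thanks for having me."},
  {"start": 9.0, "end": 12.0, "speaker": 0, "text": "Let's get started."}
]"""


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) for each speaker ID."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def realtime_factor(audio_seconds, processing_seconds):
    """Return how many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


segments = json.loads(SAMPLE)
totals = talk_time_by_speaker(segments)

# Figures from the takeaways: 1 hour of audio in about 8 minutes 45 seconds.
rtf = realtime_factor(60 * 60, 8 * 60 + 45)

for speaker, seconds in sorted(totals.items()):
    print(f"speaker {speaker}: {seconds:.1f}s")
print(f"real-time factor: {rtf:.2f}x")
```

With the reported timing, 3600 seconds of audio over 525 seconds of processing works out to roughly 6.9 times faster than real time on that machine.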

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.