Can AI truly edit audio, not just generate it? 🎧 Tencent Hy, in collaboration with SJTU, SII, NTU,...

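TL;DR · AI summary
Tencent Hunyuan, together with several universities, has released MMAE, described as the first comprehensive benchmark for evaluating AI audio editing ability. It shows that current models fall short on precise audio editing tasks.
Key points
- MMAE contains 2,000 high-quality audio samples from real-world scenarios, covering speech, music, and mixed types.
- Current AI models score an Exact Match Rate (EMR) below 5% on audio editing tasks, leaving substantial room for improvement.
- MMAE spans 6 levels of task complexity, from basic modifications to multi-round editing, and covers both local and global operation types.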
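Outline
- § Introduction
MMAE evaluates whether AI can understand and precisely modify existing audio, rather than only generate audio.
MMAE contains 2,000 high-quality samples and 17,741 evaluation items, covering multiple audio types and task complexities.
MMAE provides arXiv, GitHub, HuggingFace, and YouTube demo resources for research and use.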
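Mind map
- MMAE audio editing benchmark
  - Background and goals
    - Evaluate AI audio editing ability
    - Go beyond audio generation and emphasize precise modification
  - Composition and features
    - 2,000 high-quality audio samples
    - 17,741 evaluation items
    - Covers speech, music, and mixed types
  - Current AI performance
    - Exact Match Rate (EMR) below 5%
    - Substantial room for improvement remains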
Tencent Hy on X
Tencent Hy
@TencentHunyuan
Can AI truly edit audio, not just generate it? 🎧

Tencent Hy, in collaboration with SJTU, SII, NTU, TJU, ZODA, PKU, FDU, and other collaborators, introduces MMAE. MMAE (A Massive Multitask Audio Editing Benchmark) is the first comprehensive evaluation benchmark for speech and [...]

Instead of simply requiring the AI to "generate" audio, it demands that the AI understand an existing audio clip and precisely modify it according to natural language instructions, altering what needs to be changed while leaving the rest untouched. Current models show an Exact Match Rate (EMR) below 5%, revealing a major gap in reliable audio editing.

MMAE includes:
✅ 2,000 high-fidelity samples from real-world scenarios
✅ 17,741 fine-grained rubric evaluation items
✅ 7 modality settings across sound, music, speech and their mixtures
✅ 6 task complexity levels, from basic modifications to multi-hop reasoning and multi-round editing
✅ 8 operation types across local and global granularities

How to use:
arXiv: arxiv.org/abs/2606.07229
GitHub: github.com/ddlBoJack/MMAE
HuggingFace: huggingface.co/datasets/BoJac…
Demo: youtu.be/6At5nTWhlXI
5:54 AM · Jun 8, 2026
17.1K Views