Can AI truly edit audio, not just generate it? 🎧

Tencent Hy, in collaboration with SJTU, SII, NTU,...

Hunyuan (@TXhunyuan) · June 8, 2026

Score: 8.5

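TL;DR · AI Summary

Tencent Hunyuan, together with several universities, has released MMAE, described as the first comprehensive benchmark for evaluating AI audio editing ability. It shows that current models fall short on precise audio editing tasks.

Key points

MMAE contains 2,000 high-quality audio samples from real-world scenarios, covering sound, music, speech, and their mixtures.
Current AI models score an Exact Match Rate (EMR) below 5% on audio editing tasks, leaving substantial room for improvement.
MMAE spans 6 levels of task complexity, from basic modifications to multi-round editing, and covers both local and global operation types.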
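The post does not define how EMR is computed, so the following is only a minimal sketch of one plausible reading: a sample counts as an exact match only if every one of its rubric items passes. The `SampleResult` type, the function names, and the toy data are hypothetical and are not taken from the MMAE code. Under this assumed definition, a model can pass most individual rubric items and still score a low EMR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleResult:
    """Hypothetical container: one edited sample and its rubric verdicts."""
    sample_id: str
    rubric_passed: List[bool]  # one entry per rubric item for this sample


def exact_match_rate(results: List[SampleResult]) -> float:
    """Fraction of samples whose rubric items ALL pass (assumed EMR definition)."""
    if not results:
        return 0.0
    exact = sum(1 for r in results if r.rubric_passed and all(r.rubric_passed))
    return exact / len(results)


def rubric_pass_rate(results: List[SampleResult]) -> float:
    """Fraction of individual rubric items that pass, pooled over all samples."""
    total = sum(len(r.rubric_passed) for r in results)
    if total == 0:
        return 0.0
    passed = sum(sum(r.rubric_passed) for r in results)
    return passed / total


# Toy data, invented for illustration only.
demo = [
    SampleResult("a", [True, True, True]),   # every item passes: exact match
    SampleResult("b", [True, False, True]),  # one miss: not an exact match
    SampleResult("c", [False, False]),       # nothing passes
]

# Average rubric items per sample, from the figures quoted in the post.
avg_items_per_sample = 17_741 / 2_000

if __name__ == "__main__":
    print(f"EMR: {exact_match_rate(demo):.3f}")
    print(f"Rubric pass rate: {rubric_pass_rate(demo):.3f}")
    print(f"Average rubric items per sample: {avg_items_per_sample:.2f}")
```

In the toy data, five of eight rubric items pass, yet only one of three samples is an exact match. With roughly nine rubric items per sample on average (17,741 / 2,000), an all-items-must-pass metric of this kind would be strict, which is at least consistent with the low EMR the post reports.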

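Outline

§ Introduction
Introduces MMAE, released by Tencent Hunyuan together with several universities and described as the first comprehensive benchmark for evaluating AI audio editing ability.
· Background and goals of MMAE
MMAE evaluates whether an AI can understand an existing audio clip and modify it precisely, rather than merely generate audio.
· Composition and features of MMAE
MMAE contains 2,000 high-quality samples and 17,741 evaluation items, covering multiple audio types and levels of task complexity.
· Performance of current AI models
Current AI models score an Exact Match Rate (EMR) below 5% on audio editing tasks, leaving substantial room for improvement.
· Applications and resources
MMAE is available through arXiv, GitHub, HuggingFace, and a YouTube demo for research and use.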

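Mind map

MMAE audio editing benchmark
- Background and goals
  - Evaluate AI audio editing ability
  - Go beyond audio generation, with an emphasis on precise modification
- Composition and features
  - 2,000 high-quality audio samples
  - 17,741 evaluation items
  - Covers sound, music, speech, and their mixtures
- Current AI performance
  - Exact Match Rate (EMR) below 5%
  - Substantial room for improvement remains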

Highlights

MMAE--A Massive Multitask Audio Editing Benchmark, is the first comprehensive evaluation benchmark for speech and
Source: article text
Current models show an Exact Match Rate (EMR) below 5%, revealing a major gap in reliable audio editing.
Source: article text
MMAE includes: ✅ 2,000 high-fidelity samples from real-world scenarios ✅ 17,741 fine-grained rubric evaluation items
Source: article text

#AI #AudioEditing #Tencent #Benchmark

Tencent Hy

@TencentHunyuan

Can AI truly edit audio, not just generate it? 🎧 Tencent Hy, in collaboration with SJTU, SII, NTU, TJU, ZODA, PKU, FDU, and other collaborators, introduces MMAE. MMAE--A Massive Multitask Audio Editing Benchmark, is the first comprehensive evaluation benchmark for speech and

Instead of simply requiring the AI to "generate" audio, it demands that the AI understand an existing audio clip and precisely modify it according to natural language instructions, altering what needs to be changed while leaving the rest untouched. Current models show an Exact Match Rate (EMR) below 5%, revealing a major gap in reliable audio editing.

MMAE includes:

✅ 2,000 high-fidelity samples from real-world scenarios
✅ 17,741 fine-grained rubric evaluation items
✅ 7 modality settings across sound, music, speech and their mixtures
✅ 6 task complexity levels, from basic modifications to multi-hop reasoning and multi-round editing
✅ 8 operation types across local and global granularities

How to use:

arXiv: arxiv.org/abs/2606.07229
GitHub: github.com/ddlBoJack/MMAE
HuggingFace: huggingface.co/datasets/BoJac…
Demo: youtu.be/6At5nTWhlXI

5:54 AM · Jun 8, 2026
