StepAudio 2.5 Realtime Voice Launch: Paralinguistic Perception and Personalized Interaction

AI HOT Picks · May 23, 2026

Score: 7.5

TL;DR · AI Summary

StepFun launches StepAudio 2.5 real-time voice model with paralinguistic perception and personalized interaction capabilities.

Key Takeaways

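StepAudio 2.5 Realtime is a real-time voice model that picks up tone, pace, pauses, sighs, and mid-sentence half-laughs.
Personas can be customized via API (personality, backstory, quirks, language style); 10,000+ native personas yield millions of feature combinations.
The model is tuned with ZH/EN RLHF to maintain character consistency under roleplay stress tests.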

Outline

§StepAudio 2.5 Launch
StepFun introduces StepAudio 2.5 real-time voice model with advanced paralinguistic perception.
·Paralinguistic Perception
Model captures tone, rhythm, pauses, and laughter to enhance conversational realism.
·Personalized Interaction
Customizable personalities via API enable diverse character expressions through backstories and styles.
›Persona Templates
Over 10,000 native personas support millions of possible combinations.
›Preset Personas
Five preset personas available for immediate use to lower entry barrier.
·Multilingual & Fine-tuning
Model trained with ZH/EN RLHF to ensure character consistency under roleplay stress tests.
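
To make the bring-your-own-persona idea concrete, here is a minimal sketch of what a persona definition could look like as structured data. The four descriptive fields (personality, backstory, quirks, language style) come from the announcement; the class name, the `name` field, the defaults, and the JSON shape are assumptions for illustration and do not reflect StepFun's actual API schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Persona:
    """Hypothetical persona definition; NOT StepFun's actual schema."""
    name: str                      # assumed identifier, not from the announcement
    personality: str               # e.g. temperament and attitude
    backstory: str                 # short biography the character stays true to
    quirks: list[str] = field(default_factory=list)  # habits, verbal tics
    language_style: str = "neutral"                  # register and phrasing

    def to_payload(self) -> str:
        """Serialize to JSON, the kind of body a persona API might accept (assumed)."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Example persona, invented purely for illustration.
persona = Persona(
    name="Mika",
    personality="warm, curious, a little sarcastic",
    backstory="A retired radio host who now runs a late-night tea shop.",
    quirks=["hums between sentences", "answers questions with questions"],
    language_style="casual English with occasional Mandarin phrases",
)
payload = persona.to_payload()
print(payload)
```

Keeping the persona as plain data rather than free-form prompt text makes it easier to mix and match traits, which is presumably how a fixed set of personas can expand into many feature combinations. Consult the model card for the real request format.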

Mindmap

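StepAudio 2.5 real-time voice model
- Paralinguistic perception
  - Tone recognition
  - Rhythm analysis
  - Micro-emotion capture
- Personalized interaction
  - Custom personas via API
  - Persona templates
  - Language style matching
- Technical features
  - ZH/EN bilingual fine-tuning
  - RLHF training
  - Real-time response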

Highlights

StepAudio 2.5 captures paralinguistic features like tone, rhythm, pauses, and even half-laughter in real time.
Customize personalities via API with background stories and language styles for unique interactions.
More than 10,000 native personas allow for millions of feature combinations.

Tags: Voice Synthesis, AI Voice, Paralinguistics, Personalized Interaction, StepFun

StepFun

@StepFun_ai

StepAudio 2.5 Realtime is live! Real-time voice that picks up what you actually mean: tone, pace, pauses, sighs, even the half-laugh mid-sentence.

⚡ Top-tier paralinguistic perception: reads tone, pace, micro-emotions
⚡ Bring-your-own persona via API: personality, backstory, quirks, language style
⚡ 10,000+ native personas → millions of feature combinations
⚡ 5 preset personas to try out of the box
⚡ ZH/EN RLHF-tuned to hold character even under roleplay stress tests

Try it → https://stepfun.com/studio/audio?tab=voice-chat…
Model card: https://stepaudiollm.github.io/step-audio-2.5-realtime/…

9:45 PM · May 23, 2026
