StepAudio 2.5 Realtime Voice Launch: Paralinguistic Perception and Personalized Interaction

TL;DR · AI Summary
StepFun launches StepAudio 2.5 real-time voice model with paralinguistic perception and personalized interaction capabilities.
Key Takeaways
- StepAudio 2.5 Realtime is a real-time voice model that picks up paralinguistic cues such as tone, rhythm, and pauses
- Custom personas can be defined via API, and 10,000+ native personas yield millions of feature combinations
- Model is tuned with ZH/EN RLHF to maintain character consistency under roleplay stress tests
Outline
StepFun introduces StepAudio 2.5 real-time voice model with advanced paralinguistic perception.
Model captures tone, rhythm, pauses, and laughter to enhance conversational realism.
Customizable personalities via API enable diverse character expressions through backstories and styles.
Over 10,000 native personas support millions of possible combinations.
Five preset personas available for immediate use to lower entry barrier.
Model trained with ZH/EN RLHF to ensure character consistency under roleplay stress tests.
Mindmap
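- StepAudio 2.5 real-time voice model
  - Paralinguistic perception
    - Tone recognition
    - Rhythm analysis
    - Micro-emotion capture
  - Personalized interaction
    - Custom personas via API
    - Character setting templates
    - Language style matching
  - Technical features
    - ZH/EN bilingual fine-tuning
    - RLHF training
    - Real-time response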
Highlights
StepAudio 2.5 captures paralinguistic features like tone, rhythm, pauses, and even half-laughter in real time.
Customize personalities via API with background stories and language styles for unique interactions.
More than 10,000 native personas allow for millions of feature combinations.
Real-time voice that picks up what you actually mean — tone, pace, pauses, sighs, even the half-laugh mid-sentence.
StepFun on X:

StepAudio 2.5 Realtime is live! Real-time voice that picks up what you actually mean: tone, pace, pauses, sighs, even the half-laugh mid-sentence.
- Top-tier paralinguistic perception: reads tone, pace, micro-emotions
- Bring-your-own persona via API: personality, backstory, quirks, language style
- 10,000+ native personas → millions of feature combinations
- 5 preset personas to try out of the box
- ZH/EN RLHF-tuned to hold character even under roleplay stress tests
Try it → https://stepfun.com/studio/audio?tab=voice-chat…
Model card: https://stepaudiollm.github.io/step-audio-2.5-realtime/…
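The announcement names four traits a bring-your-own persona can carry: personality, backstory, quirks, and language style. A minimal sketch of how such a persona might be organized on the client side before being sent to a voice API is shown below. The `Persona` class, its field names, and both helper functions are hypothetical illustrations; the announcement does not describe the actual StepAudio 2.5 Realtime request format, so consult the model card for the real parameters.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Persona:
    """Hypothetical persona record; fields mirror the traits listed in the announcement."""
    name: str
    personality: str
    backstory: str
    quirks: list[str] = field(default_factory=list)
    language_style: str = "neutral"

    def to_instructions(self) -> str:
        """Render the persona as plain-text instructions (an assumed format)."""
        lines = [
            f"You are {self.name}.",
            f"Personality: {self.personality}",
            f"Backstory: {self.backstory}",
            f"Language style: {self.language_style}",
        ]
        if self.quirks:
            lines.append("Quirks: " + "; ".join(self.quirks))
        lines.append("Stay in character for the whole conversation.")
        return "\n".join(lines)


def persona_to_json(persona: Persona) -> str:
    """Serialize the persona so it can be stored or attached to a request body."""
    return json.dumps(asdict(persona), ensure_ascii=False)


# Example persona (invented purely for illustration).
mira = Persona(
    name="Mira",
    personality="warm and curious",
    backstory="former late-night radio host",
    quirks=["hums while thinking"],
    language_style="casual",
)
print(mira.to_instructions())
print(persona_to_json(mira))
```

Keeping the persona as structured data rather than a single free-form prompt makes it easier to vary one trait at a time, which is one plausible way a large number of feature combinations could arise.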