StepAudio 2.5 Realtime Voice Launch: Paralinguistic Perception and Personalized Interaction

Score: 7.5
TL;DR · AI Summary

StepFun launches StepAudio 2.5 real-time voice model with paralinguistic perception and personalized interaction capabilities.

Key Takeaways

  • StepAudio 2.5 Realtime is a real-time voice model that picks up tone, pace, pauses, sighs, and mid-sentence half-laughs
  • Custom personas can be supplied via API; 10,000+ native personas yield millions of feature combinations
  • The model is RLHF-tuned in Chinese and English (ZH/EN) to hold character under roleplay stress tests

Outline

  1. StepAudio 2.5 Launch

    StepFun introduces StepAudio 2.5 real-time voice model with advanced paralinguistic perception.

  2. Model captures tone, rhythm, pauses, and laughter to enhance conversational realism.

  3. Customizable personalities via API enable diverse character expressions through backstories and styles.

  4. Over 10,000 native personas support millions of possible combinations.

  5. Five preset personas available for immediate use to lower entry barrier.

  6. Model trained with ZH/EN RLHF to ensure character consistency under roleplay stress tests.

Mindmap

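  • StepAudio 2.5 real-time voice model
    • Paralinguistic perception
      • Tone recognition
      • Rhythm analysis
      • Micro-emotion capture
    • Personalized interaction
      • Custom personas via API
      • Persona templates
      • Language style matching
    • Technical features
      • Chinese/English bilingual fine-tuning
      • RLHF training
      • Real-time response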

Highlights

  • StepAudio 2.5 captures paralinguistic features like tone, rhythm, pauses, and even half-laughter in real time.
  • Customize personalities via API with background stories and language styles for unique interactions.
  • More than 10,000 native personas yield millions of feature combinations.

#Voice Synthesis #AI Voice #Paralinguistics #Personalized Interaction #StepFun

StepFun

@StepFun_ai

StepAudio 2.5 Realtime is live! Real-time voice that picks up what you actually mean: tone, pace, pauses, sighs, even the half-laugh mid-sentence.

⚡ Top-tier paralinguistic perception: reads tone, pace, micro-emotions
⚡ Bring-your-own persona via API: personality, backstory, quirks, language style
⚡ 10,000+ native personas → millions of feature combinations
⚡ 5 preset personas to try out of the box
⚡ ZH/EN RLHF-tuned to hold character even under roleplay stress tests.

Try it → https://stepfun.com/studio/audio?tab=voice-chat…
Model card: https://stepaudiollm.github.io/step-audio-2.5-realtime/…

9:45 PM · May 23, 2026
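
The announcement names four traits a bring-your-own persona can carry: personality, backstory, quirks, and language style. As a rough illustration of how such a persona might be described in client code, here is a minimal Python sketch. Everything beyond those four trait names is an assumption: the `Persona` class, the `build_session_config` helper, and the JSON layout are hypothetical and do not reflect the actual StepAudio API, whose request format the post does not show.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Persona:
    """Hypothetical persona record; the four fields mirror the traits the post lists."""
    personality: str
    backstory: str
    quirks: List[str] = field(default_factory=list)
    language_style: str = "neutral"


def build_session_config(persona: Persona, preset: Optional[str] = None) -> str:
    """Serialize a persona into a JSON string.

    The "persona" and "preset" keys are assumptions made for illustration,
    not a documented StepAudio request format.
    """
    payload = {"persona": asdict(persona)}
    if preset is not None:
        payload["preset"] = preset
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)


# Example persona; all values are invented for the sketch.
barista = Persona(
    personality="warm, a little sarcastic",
    backstory="Runs a late-night coffee cart",
    quirks=["hums between sentences"],
    language_style="casual, mixes ZH and EN",
)
config_json = build_session_config(barista)
```

Keeping the persona as a plain data record separates the character description from whatever transport the real service uses, so only `build_session_config` would need to change once the actual request schema is known.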
