StepFun Releases StepAudio 2.5 Realtime: An End-to-End Voice Model with Roleplay-Specific RLHF and Paralinguistic Comprehension
StepFun, a Shanghai-based AI lab, released StepAudio 2.5 Realtime in May 2026. It is an end-to-end real-time speech large language model trained with roleplay-specific RLHF and capable of paralinguistic comprehension. The model supports Chinese and English through a WebSocket API and ranked first on all five benchmark dimensions, a state-of-the-art result.
