
StepFun Releases StepAudio 2.5 Realtime: Subjective Experience Score Beats GPT-Realtime-1.5 by About 18%, Priced at Roughly 3.8 Yuan per Hour

According to monitoring by Watchful Beating, StepFun has released StepAudio 2.5 Realtime, an end-to-end real-time large voice model focused on "human-like" conversation. It supports persona customization across all dimensions and perceives paralinguistic cues (intonation, pauses, sighs, and other non-verbal signals). The model is now fully available through the company's open platform API.

In the company's official evaluation across five dimensions (tested in April 2026), the model ranked first in every one. In the subjective evaluation that most closely reflects real-world experience (ratings of live conversations in the mobile app), it scored 80.41, against 68.01 for GPT-Realtime-1.5 and 67.16 for Gemini Live. On the voice Q&A benchmark it scored 79.80, 1.5 times the 53.20 scored by GPT-Realtime-1.5. It also scored 82.18 on paralinguistic understanding, 86.36 on general conversation, and 84.80 on in-car scenarios.
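The headline "about 18%" figure is a relative lead, not a point gap. A minimal sketch of the arithmetic, using only the scores reported above (the function name `relative_lead` is illustrative, not part of any official tooling):

```python
# Subjective-experience scores as reported in the article.
subjective = {
    "StepAudio 2.5 Realtime": 80.41,
    "GPT-Realtime-1.5": 68.01,
    "Gemini Live": 67.16,
}

def relative_lead(ours: float, theirs: float) -> float:
    """Percentage by which `ours` exceeds `theirs`, relative to `theirs`."""
    return (ours - theirs) / theirs * 100

step = subjective["StepAudio 2.5 Realtime"]
point_gap = step - subjective["GPT-Realtime-1.5"]              # absolute gap in points
lead_vs_gpt = relative_lead(step, subjective["GPT-Realtime-1.5"])
lead_vs_gemini = relative_lead(step, subjective["Gemini Live"])

# Voice Q&A benchmark: ratio of the two reported scores.
qa_ratio = 79.80 / 53.20

print(f"point gap vs GPT-Realtime-1.5: {point_gap:.2f}")
print(f"relative lead vs GPT-Realtime-1.5: {lead_vs_gpt:.1f}%")
print(f"relative lead vs Gemini Live: {lead_vs_gemini:.1f}%")
print(f"voice Q&A ratio: {qa_ratio:.2f}x")
```

The gap over GPT-Realtime-1.5 is 12.40 points in absolute terms, which works out to a relative lead of about 18.2%.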

The technical approach rests on three key designs. First, starting from more than 10,000 original personas, the team algorithmically decomposed them into a persona feature matrix with millions of entries and combined it with training on a large corpus of real dialogue, so the model stays stable even on niche and long-tail topics. Second, it applied dedicated RLHF (reinforcement learning from human feedback) alignment for role-playing scenarios to address the long-standing problem of AI "persona collapse" during conversations. Third, understanding and generation are deeply integrated, inheriting the expressiveness of the in-house StepAudio 2.5 TTS so that the model handles both the overall tone of a scene and fine-grained expression within individual sentences.

The API is compatible with the OpenAI Realtime API protocol (WebSocket-based), allowing developers to migrate at low cost. Pricing is set at 10 yuan per million input tokens (2 yuan per million for cache hits) and 70 yuan per million output tokens. The official estimate for a continuous voice call is approximately 3.8 yuan per hour.
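The per-token prices can be turned into a rough cost estimate for a session. The sketch below uses only the three published rates; the token counts in the example are hypothetical placeholders, since the article does not say how many tokens an hour of conversation consumes or how the official 3.8 yuan figure was derived.

```python
# Published rates, in yuan per million tokens.
PRICE_INPUT = 10.0         # uncached input
PRICE_INPUT_CACHED = 2.0   # input served from cache
PRICE_OUTPUT = 70.0        # output

def estimate_cost_yuan(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate session cost in yuan from token counts.

    `input_tokens` counts uncached input only; `cached_tokens` counts
    input billed at the cache-hit rate.
    """
    per_million = 1_000_000
    return (
        input_tokens / per_million * PRICE_INPUT
        + cached_tokens / per_million * PRICE_INPUT_CACHED
        + output_tokens / per_million * PRICE_OUTPUT
    )

# Hypothetical token mix for one session (illustrative values, not official data).
cost = estimate_cost_yuan(input_tokens=100_000, cached_tokens=200_000, output_tokens=30_000)
print(f"estimated cost: {cost:.2f} yuan")
```

With this illustrative mix, output tokens account for most of the bill (2.1 of roughly 3.5 yuan) despite being the smallest count, because the output rate is seven times the uncached input rate.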
