
Cartesia Releases New TTS and STT Models, Sonic-3.5 and Ink-2

According to monitoring by Dynamic Beating, AI voice model startup Cartesia has released Sonic-3.5 and Ink-2, two models that together form a unified real-time speech AI stack. Sonic-3.5 handles text-to-speech (TTS), while Ink-2 handles speech-to-text (STT).

Sonic-3.5 targets real-time, low-latency speech generation and cuts the time to first audio output to 90 milliseconds. It natively supports 42 languages and can pronounce English homographs and alphanumeric strings without preprocessing.
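For readers who want to check a first-audio latency figure like this against their own setup, the measurement can be sketched as below. This is a minimal illustration, not Cartesia's benchmark method: `time_to_first_chunk` and `fake_tts_stream` are hypothetical names, and the fake stream merely stands in for whatever streaming TTS client is actually in use.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Return (seconds elapsed, first non-empty chunk) for a streaming response."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks, if any
            return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended before any audio arrived")


def fake_tts_stream(delay_s: float = 0.09) -> Iterator[bytes]:
    """Hypothetical stand-in for a TTS stream: waits, then yields audio bytes."""
    time.sleep(delay_s)      # simulated model + network delay
    yield b"\x00" * 320      # first audio chunk
    yield b"\x00" * 320      # later chunks are not timed here


if __name__ == "__main__":
    latency, first = time_to_first_chunk(fake_tts_stream())
    print(f"time to first audio: {latency * 1000:.0f} ms ({len(first)} bytes)")
```

In a real measurement the timer would start when the request is sent, so the result includes network round-trip time as well as model latency.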

Ink-2 lowers its word error rate (WER) to 3.6% and adds native turn detection and noise handling. It decides whether a user has finished speaking from sentence context and semantic understanding rather than relying solely on silence duration, the traditional approach. Ink-2 currently supports English only, with multilingual support planned for future releases.
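Word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. At 3.6%, that is roughly 36 errors per 1,000 reference words. The sketch below shows the standard calculation; the function name and the simple lowercase-and-split normalization are assumptions for illustration, and the announcement does not say how Cartesia normalized text or which test set it used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the transcript is much longer than the reference.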

Developers can call both models through a single API. Sonic-3.5 and Ink-2 are designed to work together in both directions, reducing the transmission latency and system overhead that come from "multi-vendor stitching."
