
Xiaohongshu (Red Book) open-sources dots.tts, an end-to-end neural text-to-speech (TTS) model with zero-shot voice cloning

According to monitoring by Perceiving Excellence, Xiaohongshu's hi lab has open-sourced dots.tts, a 2-billion-parameter end-to-end autoregressive text-to-speech (TTS) model, and has released the full inference and fine-tuning code under the Apache 2.0 license. The released weights include a base pretrained version, a version fine-tuned with self-correcting alignment (SCA), and a distilled version for low-latency inference.

Unlike TTS architectures that rely on discrete codec tokens for audio encoding and decoding (such as VALL-E, CosyVoice, and ChatTTS), dots.tts uses a fully continuous, end-to-end autoregressive flow-based architecture with no discrete tokens anywhere in the pipeline. It combines continuous features extracted by an AudioVAE at a 48 kHz sampling rate with a semantic encoder, a base language model (initialized from Qwen2.5-1.5B-Base and processing BPE text directly, without Pinyin input), and an autoregressive flow-based acoustic head that predicts continuous latent variables, which a generator then reconstructs into audio. By predicting continuous features directly, dots.tts avoids the audio-quality loss caused by discrete quantization, preserving pronunciation detail, timbre similarity, and emotional expressiveness.
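The quantization loss mentioned above can be illustrated with a toy example. This is only a sketch and not dots.tts code: the latent dimension, the codebook size, and the random codebook are assumptions chosen for illustration. It compares the reconstruction error of snapping each latent vector to its nearest codebook entry (the discrete-token route) with passing the continuous vector through unchanged.

```python
import random

def quantize(vec, codebook):
    """Return the codebook entry nearest to vec (squared Euclidean distance)."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, c)))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical sizes, not taken from dots.tts.
rng = random.Random(0)
dim, codebook_size = 8, 256
codebook = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(codebook_size)]
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]

# Discrete route: each latent is replaced by its nearest codebook entry.
discrete_error = sum(mse(z, quantize(z, codebook)) for z in latents) / len(latents)

# Continuous route: the latent is passed on as-is, so nothing is lost at this step.
continuous_error = sum(mse(z, z) for z in latents) / len(latents)

print(f"discrete: {discrete_error:.4f}  continuous: {continuous_error:.4f}")
```

The continuous route has zero error at this step by construction. In a real system the remaining error would come from how accurately the model predicts the latents, which this toy does not model.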

dots.tts was pretrained on about 1.5 million hours of speech data. On Seed-TTS-Eval, it achieved word error rates (WER) of 0.94% / 1.30% / 6.60% on the Chinese, English, and hard Chinese test sets, respectively, with speaker similarity (SIM) scores of 81.0 / 77.1 / 79.5, all at the state-of-the-art level among open-source models. On the MiniMax Multilingual benchmark covering 24 languages, average speaker similarity reached 83.9. Xiaohongshu has provided a Gradio demo space on Hugging Face where users can try zero-shot voice cloning online.
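For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of reference words. The sketch below shows that standard definition using a word-level edit distance. The function name is hypothetical, and the benchmark's own pipeline (speech recognition, text normalization, tokenization) is not described in this article and is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A WER of 0.94% therefore corresponds to fewer than one word-level error per hundred reference words.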



