According to monitoring by Dongcha Beating, Tongyi Labs released the speech recognition model Fun-ASR1.5 on April 20. The API is already live on Alibaba Cloud BaiLian, and an online demo is available on Modao Community. The official announcement states that a single model in this version covers 30 languages, the seven major Chinese dialect groups, and more than 20 regional accents, rather than relying on a separate model for each dialect.
Internal evaluations provided by Tongyi show that the word error rate in typical dialect scenarios fell by 56.2% compared with the previous version, with 5 dialects reaching accuracy above 90% and 15 dialects above 80%. Recognition of classical poetry was specifically optimized, and Tongyi reports an internal character-level accuracy of 97%. All of these figures come from Tongyi's own testing, not from third-party benchmarks.
Long-tail dialects, the hardest part of Chinese speech recognition, are now part of the same commercially available capability set. For scenarios such as live-streamed education, local government hotlines, and interview transcription, integrators no longer need to split recognition into multiple pipelines by regional accent, which simplifies deployment.
