On March 19, Alibaba's large-model product Tongyi Tingwu released six major new features, including "Xiaowu", a question-and-answer assistant for online audio and video, one-click AI rewriting, and mind map generation.
Tongyi Tingwu is built on the Tongyi Qianwen large model and integrates more than ten AI functions, including transcription, translation, speaker separation, full-text summary, chapter overview, speech summary, and PPT extraction. It also supports marking key points and taking notes.
The most important of the six new features in this upgrade is "Xiaowu", an audio and video question-and-answer assistant that lets users "ask" for key information directly. Xiaowu combines multilingual query processing, long-text understanding, optimization through an instruction-evolution framework, and retrieval-augmented generation to achieve, for the first time in the industry, free-form Q&A over ultra-long audio and video within a single recording, across recordings, and across languages. Both the length and the number of audio and video files it can answer questions about exceed what other products in the industry support.
On a single recording's page, users can call up Xiaowu and ask any question about audio or video up to 6 hours long and 6 GB in size, or simply ask it to pick out key quotes, summarize conclusions, or write meeting minutes. From the homepage, they can ask questions across all of their recordings, with hundreds of audio and video files scanned and understood in one pass. They can also ask questions in Chinese about English videos, and Xiaowu will answer directly in Chinese, with no separate translation step. As an AI that "understands you", Xiaowu can also suggest questions to ask.
In response to user needs, Tongyi Tingwu has also launched one-click AI rewriting and mind map generation. One-click AI rewriting converts spoken language into written prose, which is especially useful for organizing interviews. Mind map generation automatically produces XMind mind maps up to five levels deep, which suits podcast summaries.
▲ Example of a mind map generated by Tongyi Tingwu
The product experience has also been refined in the details, including one-click insertion of video timestamps and screenshots into notes, and automatic detection of the language spoken in audio and video files.
In addition, Tongyi Tingwu launched a "University Charity Plan": teachers and students at universities in mainland China who verify an educational email address ending in edu.cn can receive 500 hours of transcription time and have their storage expanded from 20 GB to 200 GB.
According to the company, Tongyi Tingwu, the first large-model product in China to open a public beta, has accumulated millions of users since its release in June last year, including students, teachers, white-collar workers, reporters, lawyers, and financial analysts. Active users transcribe audio or video more than three times a day on average, and the platform processes about 2 billion characters every day.