I recently discovered an AI-assisted tool, Memo AI. It transcribes audio and video into text, and supports AI translation, speech synthesis, AI summaries, and automatic mind map generation.
It covers most audio and video needs in one place, with a comprehensive feature set and a simple interface. It can run locally and offline, which makes it a handy helper for multimedia learning.
Support for various video sources
It handles both online and local audio and video, including mainstream sites such as Bilibili in China and XY Station abroad (note that it can only transcribe human speech; the voices of cats and dogs are not supported for now).
Access to multiple speech recognition models
Memo's speech recognition (transcription) uses OpenAI's open source Whisper model and bundles almost every model size. To make the choice easier, the developers have grouped the models into three user-friendly presets: extreme speed, balanced, and high quality. If the original video is in English with clear pronunciation, even the Tiny model rarely made mistakes in my tests (kudos to OpenAI).
Once transcription finishes, Memo automatically opens the main interface.
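As a rough illustration of what such a preset layer might look like, the sketch below maps the three labels onto Whisper checkpoint names. The mapping is an assumption (the article does not say which sizes sit behind each label), the helper names are hypothetical, and `transcribe` only works if the open source `openai-whisper` package is installed.

```python
# Hypothetical mapping from Memo-style presets to Whisper checkpoint names.
# The preset labels come from the article; which size each one uses is a guess.
PRESET_TO_MODEL = {
    "extreme speed": "tiny",
    "balanced": "small",
    "high quality": "large",
}


def pick_model(preset: str) -> str:
    """Return the Whisper checkpoint name for a preset label."""
    key = preset.strip().lower()
    if key not in PRESET_TO_MODEL:
        raise ValueError(f"unknown preset: {preset!r}")
    return PRESET_TO_MODEL[key]


def transcribe(path: str, preset: str = "balanced") -> str:
    """Transcribe an audio file with the openai-whisper package."""
    import whisper  # imported lazily so pick_model works without it installed

    model = whisper.load_model(pick_model(preset))
    result = model.transcribe(path)
    return result["text"]
```

Calling `transcribe("lecture.mp3", "extreme speed")` would load the Tiny checkpoint, which is the trade-off the paragraph above describes: the smallest model is the fastest and is often good enough for clear English audio.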
Three major capabilities: translation, summarization, and speech synthesis
Translation
Memo currently supports 13 translation engines, which fall into three groups:
Direct use: No configuration required; Microsoft Translator and Google Translate are supported by default.
Manually configured API: For 5 AI models, including OpenAI (GPT series) and Zhipu AI, and 5 translation engines, including DeepL and Baidu Translate, you need to apply for an API key on the provider's official website and then configure it in Memo.
Offline model: Memo can run large models locally through Ollama. This is completely offline, but it requires setting up the runtime environment, which is a bit troublesome.
The latter two suit readers who are comfortable tinkering. In practice, Microsoft Translator already gives good results without any setup, so choose according to your situation.
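For readers curious what the offline route involves, here is a minimal sketch of sending one subtitle line to a locally running Ollama server over its HTTP API. It is not how Memo itself talks to Ollama, which the article does not cover; the model name `llama3`, the prompt wording, and the helper names are assumptions for illustration.

```python
import json
import urllib.request

# Default address of a local Ollama server; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(text: str, target_lang: str, model: str = "llama3") -> dict:
    """Build the JSON body for a one-shot, non-streaming translation prompt."""
    prompt = (
        f"Translate the following subtitle line into {target_lang}. "
        f"Reply with the translation only.\n\n{text}"
    )
    # "stream": False asks Ollama to return a single JSON object.
    return {"model": model, "prompt": prompt, "stream": False}


def translate(text: str, target_lang: str, model: str = "llama3") -> str:
    """Send the prompt to the local Ollama server and return its reply."""
    body = json.dumps(build_request(text, target_lang, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

`translate` only succeeds when Ollama is running and the chosen model has already been pulled, which is the environment setup the offline option asks for.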
Summarization
The summary function reuses the APIs of two of the large models configured for translation; it currently supports OpenAI and China's Zhipu AI. Once the translation is complete, the summary function becomes available.
The mind map function is essentially a segmented summary, but the presentation is more intuitive and greatly improves efficiency.
Speech Synthesis
The text-to-speech (TTS) function can generate dubbing for translated subtitles and supports three services:
Edge: The TTS service behind Microsoft Edge; a good choice if you don't have any special requirements.
OpenAI: Using the same API configured above, Memo calls OpenAI's TTS capability directly for speech synthesis.
Volcano: ByteDance's Volcano Engine, which offers the same voices as the dubbing function in Jianying but requires a separate API configuration.
Installation Experience
The Memo client supports Windows and macOS. Download here:
Final Thoughts
The tool is still in testing, so the experience has some minor issues, such as inconsistent back buttons and no way to stop an export once it has started, but none of them get in the way of the main workflow.
The developers have clearly put care into the product. Take translation, the core function: they account for the inaccuracy that still exists in AI translation, especially the way sentence segmentation differs between languages and can change the actual meaning. To deal with this, Memo provides auxiliary subtitle tools such as sentence merging, search and replace, and subtitle editing. In my experience these work really well, and there are more functions to explore for yourself.
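To make the sentence merging idea concrete, here is a minimal sketch that joins consecutive subtitle cues until the text ends with sentence punctuation, so that a translator sees whole sentences instead of fragments. It is an assumption about how such a feature could work, not Memo's actual implementation; the `Cue` type and `merge_cues` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Punctuation that marks the end of a sentence (Latin and CJK forms).
SENTENCE_END = (".", "!", "?", "。", "！", "？")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_cues(cues: List[Cue]) -> List[Cue]:
    """Join consecutive cues until the text ends with sentence punctuation."""
    merged: List[Cue] = []
    pending: Optional[Cue] = None
    for cue in cues:
        text = cue.text.strip()
        if pending is None:
            pending = Cue(cue.start, cue.end, text)
        else:
            # Joining with a space suits English; CJK text would join without one.
            pending = Cue(pending.start, cue.end, f"{pending.text} {text}")
        if pending.text.endswith(SENTENCE_END):
            merged.append(pending)
            pending = None
    if pending is not None:  # keep a trailing fragment that has no end mark
        merged.append(pending)
    return merged
```

Each merged cue keeps the start time of its first fragment and the end time of its last, so the result still lines up with the audio.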
There are few paid features at present, mainly GPU acceleration and batch operations, so most of the basic functions are free to use.
The only real drawback is that Memo is not a web-based tool that works out of the box. Some of the services it connects to require you to apply for API access yourself, which may not be friendly to novice users. Interested readers can join the group at the end of the article and help each other out (so readers with some technical background are welcome too).