Tencent has officially announced the launch of the Hunyuan-Large model, now the industry's largest open-source Transformer-based MoE model, with 389 billion total parameters (389B) and 52 billion active parameters (52B).
Tencent has open-sourced Hunyuan-A52B-Pretrain, Hunyuan-A52B-Instruct, and Hunyuan-A52B-Instruct-FP8 on Hugging Face, along with a technical report and a training and inference operation manual detailing the model's capabilities and how to train it and run inference.
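As a quick illustration, the instruct checkpoint can presumably be loaded with the standard Hugging Face transformers workflow. This is a minimal sketch, not an official example: the exact repo id, the need for `trust_remote_code`, and the chat-template usage below are assumptions based on the links in this post, so check the model card for the confirmed loading instructions.

```python
# Minimal sketch: loading a Hunyuan-Large instruct checkpoint with transformers.
# Assumptions: the repo id and trust_remote_code=True are not confirmed by the
# announcement; consult the Hugging Face model card for exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"  # assumed repo id from the link below

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the FP8 checkpoint has its own loading path
    device_map="auto",            # shard across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize the Hunyuan-Large announcement."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```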
The model's key technical advantages include:
- High-quality synthetic data: By augmenting training with synthetic data, Hunyuan-Large learns richer representations, handles long-context inputs, and generalizes better to unseen data.
- KV cache compression: Adopting Grouped Query Attention (GQA) and Cross-Layer Attention (CLA) significantly reduces the memory footprint and computational overhead of the KV cache and improves inference throughput (see the GQA sketch after this list).
- Expert-specific learning rate scaling: Different learning rates are set for different experts so that each sub-model learns effectively from the data and contributes to overall performance (a learning-rate sketch also follows the list).
- Long-context processing capability: The pre-trained model supports text sequences of up to 256K tokens and the Instruct model supports 128K tokens, significantly improving performance on long-context tasks.
- Extensive benchmarking: Extensive experiments across multiple languages and tasks verify the effectiveness and safety of Hunyuan-Large in real-world applications.
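To make the KV-cache point concrete, here is a minimal, self-contained sketch of grouped-query attention in PyTorch. The head counts and dimensions are illustrative only (not Hunyuan-Large's actual configuration), and CLA is only noted in a comment rather than implemented, since the announcement gives no architectural specifics.

```python
# Minimal GQA sketch (illustrative sizes, not Hunyuan-Large's actual config).
# With 32 query heads but only 8 KV heads, the KV cache stores 8/32 = 1/4 of
# the keys/values a standard multi-head cache would hold; CLA would shrink it
# further by sharing the same KV cache across neighboring layers.
import torch
import torch.nn.functional as F

batch, seq_len, d_model = 1, 16, 1024
n_q_heads, n_kv_heads, head_dim = 32, 8, 32   # d_model = n_q_heads * head_dim

x = torch.randn(batch, seq_len, d_model)
w_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
w_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = w_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
k = w_k(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
v = w_v(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

# Each KV head serves a group of query heads: expand KV heads to match queries.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)
v = v.repeat_interleave(group, dim=1)

attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(attn.shape)  # (1, 32, 16, 32) -- yet the cache only held 8 KV heads
```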
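The expert-specific learning-rate idea can likewise be sketched with ordinary per-parameter-group optimizer settings. The scale factors below are hypothetical placeholders for illustration; the actual scaling rule used for Hunyuan-Large is described in the technical report, not here.

```python
# Sketch of expert-specific learning rates via optimizer parameter groups.
# The per-expert scale factors are made up for illustration; the real rule
# (e.g. derived from routing statistics) is in the Hunyuan-Large report.
import torch

d = 64
n_experts = 4
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
shared = torch.nn.Linear(d, d)  # non-expert (dense) parameters

base_lr = 1e-4
expert_lr_scale = [1.0, 0.8, 1.2, 0.9]  # hypothetical per-expert scales

param_groups = [{"params": shared.parameters(), "lr": base_lr}]
for expert, scale in zip(experts, expert_lr_scale):
    param_groups.append({"params": expert.parameters(), "lr": base_lr * scale})

optimizer = torch.optim.AdamW(param_groups)
for group in optimizer.param_groups:
    print(group["lr"])  # each expert trains with its own learning rate
```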
The relevant links are as follows:
- Hugging Face: https://huggingface.co/tencent/Tencent-Hunyuan-Large
- Tencent Cloud: https://cloud.tencent.com/product/hunyuan