China Telecom AI Research Institute Completes the First Fully Domestic "Wanka" (10,000-GPU) Cluster Training of a Trillion-Parameter Large Model; TeleChat2-115B Is Open-Sourced to the Public

Sept. 28, "China TelecomArtificial Intelligence Research Institute"The official public number announced that China Telecom Artificial Intelligence Research Institute (hereinafter referred to as TeleAI) successfully completed the Nationalthe first trillion-parameter large model trained on a fully localized Wanka cluster.and officially open to the publicOpen SourceThe First 100 Billion Parameter Large Model Trained on Fully Domesticated Wanka Cluster and Domestic Deep Learning Framework -- StarTatsu semantic macromodel TeleChat 2-115B.

According to the announcement, this achievement marks that domestic large model training has truly achieved fully localized substitution, formally entering a new stage of independent innovation and secure, controllable domestic production.

TeleChat2-115B was trained on China Telecom's self-developed Tianyi Cloud "Xirang Integrated Intelligent Computing Service Platform" and the AI company's "Star Ocean AI Platform". According to the report, while preserving training accuracy, a variety of optimization techniques were used to improve training efficiency and stability, achieving GPU-equivalent compute utilization of more than 93%, with effective model training time accounting for more than 98% of the total.

For training the ultra-large-parameter model, TeleAI ran scaling experiments on a large number of small models to verify the effectiveness of different model structures. For data mixing, a regression prediction model fitted to the small-model experiment results was used to obtain a better data mix.
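
As an illustration only, the sketch below shows one way such a data-mix search could be set up: fit a simple regression on small-model validation losses and pick the candidate mixture with the lowest predicted loss. The data categories, numbers, and the linear regression form are assumptions made for this example; TeleAI has not published these details.

```python
# Hypothetical sketch of the data-mix selection step: fit a regression model
# on small-model experiment results, then pick the candidate mixture with the
# lowest predicted validation loss. All names and numbers are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: proportions of (web, code, math) data used in one small-model run.
mixes = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.25, 0.15],
    [0.50, 0.30, 0.20],
    [0.40, 0.35, 0.25],
])
# Validation loss observed for each small-model run (made-up numbers).
losses = np.array([2.10, 2.05, 2.02, 2.04])

# Fit a simple regression that predicts loss from the data mixture.
reg = LinearRegression().fit(mixes, losses)

# Score a grid of candidate mixtures and keep the one with the lowest
# predicted loss, which would then be used for the large-model run.
candidates = np.array([
    [w, c, 1.0 - w - c]
    for w in np.arange(0.3, 0.75, 0.05)
    for c in np.arange(0.1, 0.45, 0.05)
    if 0.0 <= 1.0 - w - c <= 0.4
])
best = candidates[np.argmin(reg.predict(candidates))]
print("predicted best mix (web, code, math):", best.round(2))
```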

For post-training, TeleAI first synthesized a large amount of question-and-answer data covering mathematics, code, and logical reasoning, which was used for the first-stage SFT (supervised fine-tuning) of the model.

Second, it adopted an iterative update strategy: models are used to increase instruction complexity and expand the diversity of prompt data; answer quality is improved through model synthesis and manual annotation; and rejection sampling is used to obtain high-quality SFT data and representative RM (reward model) data, which feed SFT training and DPO (Direct Preference Optimization) training and iteratively improve model performance.
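
Below is a minimal sketch of a rejection-sampling step of the kind described above, assuming hypothetical `generate` and `reward` helpers: several candidate answers are sampled per prompt, scored by a reward model, the best answer is kept as SFT data, and the best/worst pair is kept as preference data for RM/DPO training. This is an illustration, not TeleAI's published pipeline.

```python
# Illustrative rejection-sampling step for building SFT and preference data.
# generate() and reward() are hypothetical stand-ins for a policy model and
# a reward model; they are not part of any published TeleAI API.
from typing import Callable, List, Tuple

def rejection_sample(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # returns n candidate answers
    reward: Callable[[str, str], float],        # scores a (prompt, answer) pair
    n_samples: int = 8,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    sft_data, preference_data = [], []
    for prompt in prompts:
        candidates = generate(prompt, n_samples)
        # Sort candidates by reward-model score, lowest first.
        scored = sorted(candidates, key=lambda ans: reward(prompt, ans))
        best, worst = scored[-1], scored[0]
        sft_data.append((prompt, best))                # high-quality SFT pair
        preference_data.append((prompt, best, worst))  # chosen/rejected pair
    return sft_data, preference_data
```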

Open-source links:

GitHub:

  • https://github.com/Tele-AI/TeleChat2

Gitee:

  • https://gitee.com/Tele-AI/tele-chat2

ModelScope:

  • https://modelscope.cn/models/TeleAI/TeleChat2-115B

Modelers:

  • https://modelers.cn/models/TeleAI/TeleChat2-115B