Japanese joint research team releases Fugaku-LLM, a large model trained on the Fugaku supercomputer

A Japanese joint research team made up of multiple companies and institutions released the Fugaku-LLM large model yesterday. The model's most notable feature is that it was trained on the Arm-architecture supercomputer "Fugaku".

Development of Fugaku-LLM began in May 2023. The initial participants were Fujitsu, which co-developed the Fugaku supercomputer, Tokyo Institute of Technology, Tohoku University, and RIKEN (the Institute of Physical and Chemical Research).

In August 2023, three more partners joined the project: Nagoya University, CyberAgent (also the parent company of game company Cygames), and the HPC-AI startup Kotoba Technologies.

▲ Fugaku supercomputer. Image source: Fujitsu press release

In a press release issued yesterday, the research team said it had fully exploited the performance of the Fugaku supercomputer, making matrix multiplication 6 times faster and communication 3 times faster. According to the team, this demonstrates that large CPU-only supercomputers can also be used to train large models.

Fugaku-LLM has 13 billion parameters (13B), making it the largest large language model in Japan.

Training used 13,824 Fugaku nodes and 380 billion tokens. About 60% of the training data was in Japanese; the remaining 40% included English, mathematics, and code.
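
To put these figures in perspective, the short Python sketch below works through the arithmetic. It uses only the numbers quoted in this article. Two points are assumptions made for illustration and are not stated by the research team: that the 60% Japanese share applies to the token count, and that tokens can be averaged evenly across nodes. The function name `training_breakdown` is hypothetical.

```python
# Figures quoted in the article.
TOTAL_TOKENS = 380_000_000_000   # 380 billion training tokens
NODES = 13_824                   # Fugaku nodes used for training
PARAMETERS = 13_000_000_000      # 13B model parameters
JAPANESE_SHARE_PERCENT = 60      # share of Japanese training data


def training_breakdown():
    """Derive rough scale figures from the published numbers.

    Assumption: the 60% share is measured in tokens, and the per-node
    value is a plain average, not how the workload was actually split.
    """
    japanese_tokens = TOTAL_TOKENS * JAPANESE_SHARE_PERCENT // 100
    other_tokens = TOTAL_TOKENS - japanese_tokens
    return {
        "japanese_tokens": japanese_tokens,
        "other_tokens": other_tokens,
        "tokens_per_node": TOTAL_TOKENS / NODES,
        "tokens_per_parameter": TOTAL_TOKENS / PARAMETERS,
    }


if __name__ == "__main__":
    for name, value in training_breakdown().items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions, roughly 228 billion tokens are Japanese and 152 billion are other material, which averages out to about 27.5 million tokens per node and about 29 training tokens per model parameter.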

The research team claims that Fugaku-LLM can naturally use distinctive features of Japanese, such as honorifics, in conversation.

In terms of benchmark results, the model achieved an average score of 5.5 on the Japanese MT-Bench, ranking first among open models trained on Japanese corpus data, and scored a high 9.18 in the humanities and social sciences category.

Fugaku-LLM is now publicly available on GitHub and Hugging Face. External researchers and engineers can use the model for academic and commercial purposes as long as they comply with the license agreement.
