Russian tech giant Yandex has launched an open-source large language model training tool, YaFSDP, claiming speedups of up to 26% over existing tools.
According to the announcement, YaFSDP outperforms the traditional FSDP method in training speed, especially for large models. For LLM pre-training, YaFSDP is 20% faster and holds up better under high memory pressure.
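For context, the "traditional FSDP" baseline referred to here is PyTorch's Fully Sharded Data Parallel, which shards parameters, gradients, and optimizer state across GPUs. The sketch below is only a minimal illustration of that baseline setup, not YaFSDP's own API; the toy model, layer sizes, and training loop are placeholders.

```python
import functools

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy


def main():
    # One process per GPU, typically launched with torchrun.
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Stand-in model; a real run would use a Llama-style transformer.
    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).cuda()

    # Shard parameters, gradients, and optimizer state across ranks,
    # wrapping each submodule above ~1M parameters as its own FSDP unit.
    model = FSDP(
        model,
        auto_wrap_policy=functools.partial(
            size_based_auto_wrap_policy, min_num_params=1_000_000
        ),
        device_id=torch.cuda.current_device(),
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()  # dummy loss for illustration only
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```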
For example, YaFSDP achieves an efficiency gain of 21% for the 70-billion-parameter Llama 2 and 26% for Llama 3 at the same parameter scale. IT Home attaches the official benchmark data below:
Model | GPU count | Sequence length | Checkpointed layers | Speedup |
---|---|---|---|---|
Llama 2 7B | 64 | 2048 | 0 | 9.92% |
Llama 2 7B | 64 | 4096 | 0 | 3.43% |
Llama 2 7B | 64 | 8192 | 0 | 2.68% |
Llama 2 7B | 128 | 2048 | 0 | 9.57% |
Llama 2 7B | 128 | 4096 | 0 | 2.42% |
Llama 2 7B | 128 | 8192 | 0 | 2.32% |
Llama 2 13B | 128 | 2048 | 0 | 12.10% |
Llama 2 13B | 128 | 4096 | 0 | 3.49% |
Llama 2 34B | 128 | 2048 | 0 | 20.70% |
Llama 2 34B | 256 | 2048 | 0 | 21.99% |
Llama 2 34B | 256 | 4096 | 5 | 8.35% |
Llama 2 70B | 256 | 2048 | 10 | 21.48% |
Llama 2 70B | 256 | 4096 | 50 | 7.17% |
Llama 3 8B | 64 | 2048 | 0 | 11.91% |
Llama 3 8B | 64 | 4096 | 0 | 7.86% |
Llama 3 70B | 256 | 2048 | 20 | 26.60% |
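The checkpointed-layers column ("num-ckpt-layers" in the source data) appears to count the transformer layers run with activation checkpointing, a memory-saving technique in which activations are discarded during the forward pass and recomputed during backward. The snippet below is a generic PyTorch illustration of that technique, not code from YaFSDP:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Activation checkpointing: the block's intermediate activations are not
# stored in the forward pass and are recomputed during backward, trading
# extra compute for lower peak memory.
block = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.GELU())
x = torch.randn(4, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```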
Yandex says that by using GPUs more efficiently, YaFSDP can save developers and companies substantial sums, potentially hundreds of thousands of dollars per month.
Mikhail Khruschev, a senior developer at Yandex and a member of the YaFSDP team, also noted: “We are currently actively experimenting with various model architectures and parameter sizes to expand the versatility of YaFSDP.”