DeepSeek Releases Open Source Project FlashMLA

On February 24, DeepSeek officially released FlashMLA, the first project of its Open Source Week.

According to DeepSeek, FlashMLA is inspired by FlashAttention 2 and 3 and the CUTLASS project. Specifically, it is an efficient MLA (Multi-Head Latent Attention) decoding kernel optimized for Hopper GPUs, with support for variable-length sequences, and it is already in production use.

By optimizing the MLA attention mechanism, FlashMLA speeds up LLM decoding, improving model responsiveness and throughput, which matters most for real-time generation tasks such as chatbots and text generation. In short, FlashMLA makes LLM inference faster and more efficient on the H800, particularly for high-performance AI workloads.

The current release supports BF16 and a paged KV cache with a block size of 64. On the H800, it reaches up to 3,000 GB/s of memory bandwidth in memory-bound configurations and up to 580 TFLOPS of compute in compute-bound configurations.
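
For readers unfamiliar with the term, a paged KV cache stores each sequence's keys and values in fixed-size blocks and uses a per-sequence block table to find them, which is what lets sequences of different lengths share one memory pool. The sketch below illustrates only that bookkeeping with the block size of 64 mentioned above. It is not FlashMLA's code or API: the function names and the simple sequential allocator are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

BLOCK_SIZE = 64  # block size named in the release; everything else here is illustrative


def blocks_needed(seq_len: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size cache blocks needed to hold seq_len tokens."""
    return (seq_len + block_size - 1) // block_size


def build_block_tables(seq_lens: List[int]) -> List[List[int]]:
    """Hand out physical block ids to each sequence in order (hypothetical allocator)."""
    next_free = 0
    tables = []
    for n in seq_lens:
        count = blocks_needed(n)
        tables.append(list(range(next_free, next_free + count)))
        next_free += count
    return tables


def locate_token(block_table: List[int], token_pos: int,
                 block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Map a logical token position to (physical block id, offset inside that block)."""
    return block_table[token_pos // block_size], token_pos % block_size


# Three sequences of different lengths share one pool of blocks.
tables = build_block_tables([100, 64, 1])
print(tables)                        # block ids assigned to each sequence
print(locate_token(tables[0], 70))   # where token 70 of the first sequence lives
```

A 100-token sequence occupies two blocks and a 1-token sequence occupies one, so the memory wasted per sequence is bounded by a single partially filled block rather than by the longest sequence in the batch.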

FlashMLA is now available on GitHub. Within 6 hours of its launch, it had collected more than 5,000 stars and 188 forks.

In addition, an investor who focuses on AI hardware research told Sina Technology that DeepSeek's release of FlashMLA is a major boon for domestic GPUs.

The investor said that domestic GPUs have so far been weak in performance, but vendors can now draw on the optimization ideas and methodology in FlashMLA to try to improve performance substantially. Even though their architectures differ, improved inference performance on domestic GPUs should follow as a matter of course.
