February 12 news. Inspur Information today announced the launch of the Metabrain R1 Inference Server, which, through system-level innovation and hardware-software co-optimization, is ready to deploy the DeepSeek R1 671B model on a single machine.
Note: DeepSeek has open-sourced multiple versions of its models. Among them, DeepSeek R1 671B is the full-parameter foundation model: it offers stronger generalization, higher accuracy, and better contextual understanding than the distilled versions, but it also places higher demands on a system's GPU memory capacity, memory bandwidth, interconnect bandwidth, and latency:
At least 800GB of GPU memory is required at FP8 precision, and 1.4TB or more at FP16/BF16 precision.
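These floors follow directly from the parameter count. As a quick sanity check, here is a minimal sketch (Python, assuming only the published 671B figure; runtime overheads such as activations and KV cache come on top of the raw weights):

```python
# Back-of-the-envelope check of the stated memory floors. The only input
# is the parameter count; overheads (activations, KV cache, framework
# buffers) are assumed on top of the raw weight footprint.

PARAMS = 671e9  # DeepSeek R1 total parameter count

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Raw memory needed just to hold the weights, in GB."""
    return params * bytes_per_param / 1e9

print(f"FP8  weights: {weight_gb(PARAMS, 1):.0f} GB")  # ~671 GB -> ~800 GB with overhead
print(f"FP16 weights: {weight_gb(PARAMS, 2):.0f} GB")  # ~1342 GB -> 1.4 TB and up
```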
In addition, DeepSeek R1 is a typical long chain-of-thought model with short-input, long-output workloads; its decode phase in particular depends on high memory bandwidth and very low communication latency.
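Why decode is bandwidth-bound: each generated token must stream the active weights (plus KV cache) from memory once, so memory bandwidth, not compute, caps tokens per second. A hedged sketch of that ceiling, using the 4.8TB/s-class HBM bandwidth quoted below and the commonly reported figure that DeepSeek R1's mixture-of-experts design activates roughly 37B of its 671B parameters per token:

```python
# Rough ceiling on decode throughput (a sketch, not a benchmark). Treats
# the GPU memory as one pool; sharding weights across GPUs scales the
# ceiling accordingly.

def decode_tokens_per_sec(mem_bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Bandwidth-bound upper limit on tokens generated per second."""
    return mem_bandwidth_gb_s / gb_read_per_token

ACTIVE_PARAMS_B = 37   # reported activated parameters per token (MoE)
FP8_GB_PER_TOKEN = ACTIVE_PARAMS_B  # 1 byte/param at FP8 -> ~37 GB streamed per token

print(f"~{decode_tokens_per_sec(4800, FP8_GB_PER_TOKEN):.0f} tokens/s ceiling")  # ~130
```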
The Metabrain R1 Inference Server NF5688G7 natively supports an FP8 compute engine and provides 1128GB of HBM3e memory, meeting the 671B model's requirement of no less than 800GB of GPU memory at FP8 precision while leaving sufficient KV-cache space for full-model inference on a single machine. Its memory bandwidth reaches up to 4.8TB/s.
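The KV-cache headroom follows from the two numbers above (a simple subtraction; framework overhead, which would shrink it somewhat, is ignored):

```python
# Headroom left for KV cache on the NF5688G7 after loading FP8 weights.

TOTAL_HBM_GB   = 1128  # HBM3e capacity stated for the NF5688G7
FP8_WEIGHTS_GB = 671   # 671B parameters at 1 byte each

kv_headroom_gb = TOTAL_HBM_GB - FP8_WEIGHTS_GB
print(f"KV-cache headroom: ~{kv_headroom_gb} GB")  # ~457 GB
```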
On the communication side, GPU P2P bandwidth reaches 900GB/s, and with the latest inference frameworks a single machine can serve 20 to 30 concurrent users. Each NF5688G7 is also equipped with a 3200Gbps lossless scale-out network, enabling agile expansion as business demand grows and supporting a turnkey R1 server cluster solution.
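The concurrency figure is consistent with the KV-cache headroom estimated above. A sketch of that reasoning follows; note that the per-token KV footprint and context length below are illustrative assumptions, not published specifications, and both vary with attention layout and KV precision:

```python
# Translating KV-cache headroom into concurrent sessions (illustrative).

KV_HEADROOM_GB      = 457     # from the headroom estimate above
KV_GB_PER_1K_TOKENS = 0.5     # assumed KV footprint per 1K tokens per user
CONTEXT_TOKENS      = 32_000  # assumed average context per session

kv_per_user_gb = KV_GB_PER_1K_TOKENS * CONTEXT_TOKENS / 1000
max_concurrent = KV_HEADROOM_GB / kv_per_user_gb
print(f"~{max_concurrent:.0f} concurrent sessions under these assumptions")  # ~29
```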
The Metabrain R1 Inference Server NF5868G8 is a high-throughput inference server designed for large reasoning models. It is the industry's first single machine to host 16 standard double-width PCIe cards, providing up to 1536GB of memory and supporting single-machine deployment of the DeepSeek 671B model at FP16/BF16 precision.
The machine adopts a fully interconnected 16-card topology based on a PCIe fabric, with P2P communication bandwidth of up to 128GB/s between any two cards, cutting communication latency by more than 60%. Through hardware-software co-optimization, the NF5868G8 improves DeepSeek 671B inference performance by nearly 40% compared with a traditional two-machine, eight-card PCIe configuration, and it currently supports a wide range of AI accelerator card options.