Recently, the open source platform Hugging Face and NVIDIA announced a new Inference-as-a-Service offering powered by NVIDIA's NIM technology. The service lets developers prototype faster with the open source AI models available on the Hugging Face Hub and deploy them efficiently.
The announcement was made at the SIGGRAPH 2024 conference, which brings together experts in computer graphics and interactive technologies, making it a fitting venue for NVIDIA and Hugging Face to unveil their partnership and open up new opportunities for developers. Through the service, developers can easily deploy powerful large language models (LLMs), such as the Llama 3 family and Mistral AI models, optimized by NVIDIA's NIM microservices.
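As a rough sketch of what that workflow might look like, the snippet below queries a Hub-hosted chat model through the huggingface_hub InferenceClient; the model ID meta-llama/Meta-Llama-3-8B-Instruct and the placeholder token are assumptions for illustration, not details from the announcement.

```python
# Minimal sketch: calling a Hub-hosted chat model via InferenceClient.
# The model ID and token below are illustrative assumptions, not values
# taken from the Hugging Face / NVIDIA announcement.
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model ID
    token="hf_...",  # your Hugging Face access token
)

response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize what NVIDIA NIM does."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```

The appeal of this setup is that the same client code keeps working whether the model runs on Hugging Face's shared infrastructure or on an NVIDIA-accelerated backend; only the serving layer changes.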
Specifically, when accessed as a NIM, the 70-billion-parameter version of Llama 3 delivers up to five times higher throughput than an off-the-shelf deployment on an NVIDIA H100 Tensor Core GPU system, a substantial speedup. The new service also complements Train on DGX Cloud, an AI training service already available on Hugging Face.
NVIDIA's NIM is a suite of AI microservices optimized for inference, covering both NVIDIA's AI foundation models and open source community models. It exposes models through standard APIs, significantly improves token-processing efficiency, and, running on the NVIDIA DGX Cloud infrastructure, accelerates the responsiveness and stability of AI applications.
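To illustrate what a standard API means in practice, here is a minimal sketch of querying a NIM endpoint through an OpenAI-compatible client; the base URL, model name, and placeholder API key reflect common defaults for a self-hosted NIM container and should be treated as assumptions, not values from the announcement.

```python
# Sketch: calling a self-hosted NIM container through its OpenAI-compatible
# API. Port 8000, the /v1 route, and the model name are assumed defaults;
# check your deployment's documentation for the actual values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM endpoint
    api_key="not-needed-locally",         # local deployments often skip auth
)

completion = client.chat.completions.create(
    model="meta/llama3-8b-instruct",  # assumed NIM model name
    messages=[{"role": "user", "content": "Explain tokens in one sentence."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```

Because the interface mirrors the OpenAI API, existing application code can usually be pointed at a NIM endpoint by changing only the base URL and model name.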
The NVIDIA DGX Cloud platform is tailored specifically for generative AI, providing reliable, accelerated compute infrastructure that helps developers move from prototype to production without a long-term commitment. The partnership between Hugging Face and NVIDIA will further strengthen the developer community; Hugging Face also recently announced that it has become profitable, that its team has grown to around 220 people, and that it has launched the SmolLM family of small language models.
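For readers curious about SmolLM, the following is a minimal sketch of loading one of the models with the transformers library; the checkpoint name HuggingFaceTB/SmolLM-135M and the generation settings are assumptions for illustration.

```python
# Sketch: loading a SmolLM checkpoint with transformers and generating text.
# The checkpoint name and generation parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

inputs = tokenizer("Small language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```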