Microsoft researchers propose a 1-bit large language model variant called BitNet b1.58, in which every weight is ternary, taking a value in {-1, 0, +1} (hence log2(3) ≈ 1.58 bits per weight). The model matches a full-precision Transformer LLM of the same model size and number of training tokens in perplexity and end-task performance, while being significantly more cost-effective in latency, memory footprint, throughput, and energy consumption. BitNet b1.58 defines a new scaling law and recipe for training a new generation of high-performance, cost-effective large models. It also enables a new computation paradigm and offers new ideas for designing hardware specifically optimized for 1-bit large language models.
Paper:
https://arxiv.org/abs/2402.17764
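The 1.58 bits come from constraining each weight to the ternary set {-1, 0, +1}. The paper describes an "absmean" quantization function for this: scale the weight matrix by its mean absolute value, then round each entry to the nearest value in {-1, 0, +1}. Below is a minimal PyTorch sketch of that function; the function name and the epsilon value are illustrative, not taken from the paper.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Round a weight tensor to {-1, 0, +1} using absmean scaling,
    a sketch of the quantization described in the BitNet b1.58 paper."""
    gamma = w.abs().mean()                 # mean absolute value of the matrix
    w_scaled = w / (gamma + eps)           # scale so typical magnitudes are near 1
    return w_scaled.round().clamp_(-1, 1)  # nearest value in {-1, 0, +1}

# Usage: every entry of the result is -1.0, 0.0, or 1.0
w = torch.randn(4, 4)
print(absmean_ternary_quantize(w))
```

With weights restricted to {-1, 0, +1}, matrix multiplications reduce to additions and subtractions with no floating-point multiplies, which is the source of the latency and energy savings the paper reports.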