Bilibili yesterday open-sourced the lightweight Index-1.9B series of models, which includes a base model, a control-group model, a dialogue model, and a role-playing model.
Official Introduction:
- Index-1.9B base: the base model, with 1.9 billion non-embedding parameters, pre-trained on 2.8T of predominantly Chinese and English corpus data. It leads models of the same size on multiple evaluation benchmarks.
- Index-1.9B pure: a control group for the base model, with the same parameters and training strategy as base, except that all instruction-related data is strictly filtered out of its corpus, in order to verify the impact of instruction data on the benchmarks.
- Index-1.9B chat: a dialogue model aligned from Index-1.9B base via SFT and DPO. Because more Internet-community corpus was introduced during pre-training, its chat output is noticeably more engaging (a minimal loading sketch follows this list).
- Index-1.9B character: builds on the SFT and DPO alignment and introduces RAG to enable few-shot role-playing customization.
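For readers who want to try the dialogue model, the sketch below shows one plausible way to load and query it with the Hugging Face `transformers` library. The repository ID `IndexTeam/Index-1.9B-Chat`, the `trust_remote_code` requirement, and the presence of a chat template are assumptions; check the project README for the official usage instructions.

```python
# A minimal sketch of chatting with Index-1.9B-Chat via transformers.
# The model ID and chat-template availability are assumptions, not confirmed
# by the announcement; consult the official README for exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IndexTeam/Index-1.9B-Chat"  # assumed Hugging Face repo name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Build the prompt from the tokenizer's chat template (if the repo ships one).
messages = [{"role": "user", "content": "Introduce yourself briefly."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```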
According to the release, 2.8T of data was used in the pre-training stage, with a Chinese-to-English ratio of 4:5 and 6% code. The role-playing model currently ships with one built-in character, "San San", and users can also create their own characters on demand.
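As a rough back-of-the-envelope illustration of that corpus composition, and assuming the 4:5 ratio applies to the non-code portion (the announcement does not say so explicitly):

```python
# Approximate token split of the 2.8T pre-training corpus.
# Assumption: 6% is code, and the 4:5 Chinese:English ratio
# applies to the remaining non-code data.
total = 2.8e12
code = 0.06 * total          # ~0.17T
remaining = total - code     # ~2.63T
chinese = remaining * 4 / 9  # ~1.17T
english = remaining * 5 / 9  # ~1.46T
print(f"code {code/1e12:.2f}T, zh {chinese/1e12:.2f}T, en {english/1e12:.2f}T")
```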
Project address: https://github.com/bilibili/Index-1.9B/blob/main/README.md