Researchers at the University of Washington have introduced a more efficient tuning method for large models called "proxy tuning." It steers the predictions of a base model by comparing the predictions of a small tuned model against those of its untuned counterpart, thereby tuning the model without touching its internal weights.
With the development of generative AI products such as ChatGPT, the parameter counts of base models keep growing, so tuning their weights requires a great deal of time and compute. Proxy tuning improves tuning efficiency: it better preserves the knowledge gained from training at decoding time while retaining the benefits of larger-scale pre-training. The researchers applied the method to the 13B and 70B base models of LLaMA-2, and reported that the proxy-tuned models outperformed the directly tuned models.
Paper address: https://arxiv.org/pdf/2401.08565.pdf
This method requires preparing a small pre-trained language model M-, which shares the same vocabulary as the base model M; the training data is then used to tune M- into a tuned model M+.
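As a rough illustration, setting up the three models with Hugging Face Transformers might look like the sketch below. The checkpoint names are placeholders rather than the exact ones used in the paper; the only hard requirement is that M, M-, and M+ share the same vocabulary and tokenizer.

```python
# A minimal sketch, assuming Hugging Face Transformers; checkpoint names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

base   = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf", torch_dtype=torch.float16)      # M : large base model (untouched weights)
expert = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", torch_dtype=torch.float16)  # M+: small tuned model
anti   = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)       # M-: small untuned model
```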
During decoding, the difference between the output prediction distribution of the tuned model M+ and that of the untuned model M- is computed at each step. This difference is then applied to the base model M's predictions, steering them toward the predictions of the tuned model. The approach is, in a sense, the opposite of "distillation" for large models: instead of compressing a large model's knowledge into a small one, a small tuned model steers the large one, making this an innovative tuning method.
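The per-step arithmetic can be sketched as a greedy decoding loop: the base model's logits are shifted by the logit difference (M+ minus M-) before the next token is chosen. The code below is an illustrative sketch using the placeholder models loaded above, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def proxy_tuned_generate(base, expert, anti, tokenizer, prompt, max_new_tokens=50):
    """Greedy decoding where the base model's logits are shifted by the
    difference between the small tuned (M+) and small untuned (M-) models."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits_m = base(ids).logits[:, -1, :]    # M : large base model
        logits_p = expert(ids).logits[:, -1, :]  # M+: small tuned model
        logits_n = anti(ids).logits[:, -1, :]    # M-: small untuned model
        # Shift the base prediction toward the tuned model's behaviour
        combined = logits_m + (logits_p - logits_n)
        probs = F.softmax(combined, dim=-1)
        next_id = probs.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```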
The proxy-tuning method offers a more efficient way to tune large models while better preserving training knowledge during decoding, leading to stronger performance. Its introduction brings new insight to the development of the AI field and merits further in-depth research and application.