Recent Google research reveals an attack on large language models. According to Google, the researchers not only recovered the entire projection matrix of an OpenAI production model, but also obtained the exact size of its hidden dimension, all with fewer than 2,000 carefully crafted API queries, at a cost as low as about 150 yuan (roughly US$20).
The core target of the attack is the model's embedding projection layer, the final layer of the model, which maps the hidden representation to the logits vector. By issuing targeted queries to the model's API, an attacker can extract the model's embedding dimension and even its final weight matrix. Google recovered the hidden dimension by collecting logit vectors from a large number of queries and examining their sorted singular values.
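To make the singular-value step concrete, here is a minimal NumPy sketch of the hidden-dimension recovery. The `query_logits` helper is hypothetical: the real API does not return full logit vectors, and the paper reconstructs them through the logit bias and log-probability options. The prompt set and the gap heuristic below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def query_logits(prompt: str) -> np.ndarray:
    """Hypothetical helper: return the full logit vector (length = vocab size)
    for one prompt. The paper reconstructs these values through the API's
    logit-bias and log-probability options."""
    raise NotImplementedError

def estimate_hidden_dim(prompts: list[str]) -> int:
    # Stack one logit vector per prompt into an (n_prompts x vocab_size)
    # matrix. Since logits = W @ h, every row lies in the column space of
    # the final projection matrix W, whose rank is the hidden dimension.
    Q = np.stack([query_logits(p) for p in prompts])

    # With more prompts than hidden dimensions, the sorted singular values
    # drop sharply after the first `hidden_dim` entries; locate that drop
    # via the largest multiplicative gap between consecutive values.
    s = np.linalg.svd(Q, compute_uv=False)
    gaps = s[:-1] / np.maximum(s[1:], 1e-12)
    return int(np.argmax(gaps)) + 1
```

Intuitively, each query only ever adds a vector from an h-dimensional subspace, so once the number of queries exceeds the hidden dimension h, the rank of the stacked logit matrix stops growing, and the singular-value spectrum makes that cutoff visible.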
This attack not only reveals a model's hidden dimension, but also exposes global information such as the model's "width" (its hidden size, which in turn hints at the total parameter count), reduces how much of a black box the model is, and paves the way for follow-on attacks. The research team reports that the attack is very efficient: attacking OpenAI's Ada and Babbage models cost less than $20, and attacking GPT-3.5 cost about $200.
OpenAI learned of the findings and, with the research team's consent, confirmed the effectiveness of the attack; all data related to the attack was then deleted. Although the attack does not extract much of the model, its low cost and high efficiency are striking.
The defenses discussed in the paper operate either at the API level, for example removing the logit bias parameter entirely, or at the architecture level, for example modifying the hidden dimension of the final layer after training is complete. Since the incident came to light, OpenAI has modified its model API to prevent similar attacks.
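As a rough illustration of the architectural idea (an assumption for exposition, not the paper's exact proposal), one could pad the trained projection matrix with extra noise columns and feed correspondingly padded hidden states, so that the rank an attacker observes no longer matches the true hidden dimension:

```python
import numpy as np

def pad_projection(W: np.ndarray, extra_dims: int,
                   noise_scale: float = 1e-4) -> np.ndarray:
    """Append `extra_dims` small random columns to the (vocab x hidden)
    projection matrix W. Purely illustrative."""
    rng = np.random.default_rng(0)
    extra = rng.normal(scale=noise_scale, size=(W.shape[0], extra_dims))
    return np.hstack([W, extra])

def padded_logits(W_padded: np.ndarray, h: np.ndarray,
                  noise_scale: float = 1e-4) -> np.ndarray:
    """Pad the hidden state with small noise so the extra columns contribute:
    the logit vectors an attacker collects now span (hidden + extra)
    dimensions, while the model's outputs are only slightly perturbed."""
    rng = np.random.default_rng()
    extra_dims = W_padded.shape[1] - h.shape[0]
    noise = rng.normal(scale=noise_scale, size=extra_dims)
    return W_padded @ np.concatenate([h, noise])
```

The trade-off is a small perturbation of the logits in exchange for hiding the true rank; the paper's API-level fix, removing the logit bias parameter, avoids this cost entirely.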
The research shows that even large language models are vulnerable to security threats, despite defensive measures such as those OpenAI has now taken. The incident is a reminder that keeping models secure remains a complex and important problem.
Paper link: https://arxiv.org/abs/2403.06634