An article published in the latest issue of the journal Nature Human Behaviour examined AI research and reported that, in tasks testing the ability to track the mental states of others, two types of AI large language models can in certain situations perform similarly to, or even better than, humans.
Image source: Pixabay
Theory of mind, the ability to track others' mental states, is key to human communication and empathy and is essential for social interaction. The paper's first author, James W. A. Strachan of the University Medical Center Hamburg-Eppendorf in Germany, together with colleagues and collaborators, selected tasks that test different aspects of theory of mind, including detecting false beliefs, understanding indirect speech, and identifying faux pas.
The team ran experiments on the GPT and LLaMA2 models and compared their performance with that of 1,907 human participants.
The results showed that the GPT models matched, and sometimes exceeded, the average human level at identifying indirect requests, false beliefs, and misdirection, while LLaMA2 performed below the human level on these tasks. Conversely, LLaMA2 outperformed humans at identifying faux pas, whereas GPT performed poorly.
According to China News Service, the authors said that LLaMA2's apparent success reflected a low "level of bias" in its responses rather than a true sensitivity to faux pas, and that GPT's "poor performance" stemmed from an "ultra-conservative" reluctance to commit to conclusions, rather than from errors in reasoning.