William Fedus, an OpenAI employee, confirmed on the social media platform X on Monday that the mysterious chatbot "gpt-chatbot" that performed well in LMSYS's Chatbot Arena is GPT-4o, the new artificial intelligence model the company just released. Fedus also revealed that GPT-4o topped the Arena leaderboard in testing, achieving the highest score ever.
“GPT-4o is our most advanced cutting-edge model,” Fedus wrote on X. “We’ve been testing a version of it in Arena under the name ‘im-also-a-good-gpt2-chatbot’.”
Chatbot Arena is a website where visitors can talk to two random AI language models at the same time, without knowing which is which, and then choose the model that provides the better response.
Starting in April this year, OpenAI tested multiple versions of GPT-4o in the arena. The model first appeared under the name "gpt2-chatbot", then became "im-a-good-gpt2-chatbot", and finally "im-also-a-good-gpt2-chatbot".
Since GPT-4o was released today, multiple sources have revealed that the model has topped LMSYS’s internal leaderboard by a huge margin, surpassing the previous top-ranked models Claude 3 Opus and GPT-4 Turbo.
The official lmsys.org account shared a chart and wrote: "The 'gpt2-chatbot' series model has just soared to the top of the list, surpassing all other models by a significant margin (about 50 Elo), and it has become the most powerful model in the arena. This is an internal screenshot. The public version of 'gpt-4o' has now entered the arena and will soon appear on the public leaderboard!"
As of press time, "im-also-a-good-gpt2-chatbot" has an Elo score of 1309, ahead of GPT-4-Turbo-2024-04-09 with 1253 and Claude 3 Opus with 1246. Claude 3 and GPT-4 Turbo had been competing for the top spot on the leaderboard until the three "gpt2-chatbot" models showed up and shook things up.
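To put those Elo gaps in perspective, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the textbook Elo formula (base 10, scale 400) and ignores ties; the function name `elo_expected_score` is hypothetical, and LMSYS's actual rating computation may differ from this simplified model.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of votes model A wins against model B.

    Assumption: standard Elo logistic curve (base 10, scale 400), no ties.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Elo scores as reported in the article at press time.
ratings = {
    "im-also-a-good-gpt2-chatbot": 1309,
    "GPT-4 Turbo": 1253,
    "Claude 3 Opus": 1246,
}

top = "im-also-a-good-gpt2-chatbot"
for name, rating in ratings.items():
    if name != top:
        p = elo_expected_score(ratings[top], rating)
        print(f"{top} vs {name}: expected preference rate {p:.1%}")
```

Under these assumptions, a lead of 56 to 63 points corresponds to the top model being preferred in roughly 58 to 59 percent of blind matchups, which is a clear edge but far from a sweep.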