OpenAI secretly tested GPT-4o, and it topped the chatbot arena rankings

William Fedus, an OpenAI employee, confirmed on social media platform X on Monday that the mysterious "gpt2-chatbot" that performed well in the LMSYS Chatbot Arena is GPT-4o, the new artificial intelligence model the company just released. Fedus also revealed that GPT-4o topped the Arena leaderboard in testing, achieving the highest score ever.

“GPT-4o is our most advanced cutting-edge model,” Fedus wrote on X. “We’ve been testing a version of it in the Arena under the name ‘im-also-a-good-gpt2-chatbot’.”

Chatbot Arena is a website where visitors can talk to two random AI language models at the same time, without knowing which is which, and then choose the model that provides the better response.

Starting in April this year, OpenAI tested multiple versions of GPT-4o in the arena. The model first appeared under the name "gpt2-chatbot", then became "im-a-good-gpt2-chatbot", and finally "im-also-a-good-gpt2-chatbot".

Following the release of GPT-4o today, multiple sources have reported that the model tops LMSYS’s internal leaderboard by a wide margin, surpassing the previous leaders, Claude 3 Opus and GPT-4 Turbo.

The official lmsys.org account shared a chart and wrote: "The 'gpt2-chatbot' series model has just soared to the top of the list, surpassing all other models by a significant margin (about 50 Elo), and it has become the most powerful model in the arena. This is an internal screenshot. The public version of 'gpt-4o' has now entered the arena and will soon appear on the public leaderboard!"

As of press time, "im-also-a-good-gpt2-chatbot" has an Elo score of 1309, ahead of GPT-4-Turbo-2024-04-09 with 1253 and Claude 3 Opus with 1246. Claude 3 Opus and GPT-4 Turbo had been competing for the top spot on the leaderboard until the three "gpt2-chatbot" variants showed up and upended the rankings.
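
To give a sense of what these gaps mean in practice, here is a minimal sketch that converts an Elo difference into an expected head-to-head win rate. It assumes the textbook Elo logistic curve with a 400-point scale; the function name and the way ties are ignored are illustrative choices, and the leaderboard itself may compute its ratings differently.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of model A against model B under the standard Elo curve.

    Assumption: textbook Elo formula with a 400-point scale; ties are ignored.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted in the article as of press time.
scores = {
    "im-also-a-good-gpt2-chatbot": 1309,
    "GPT-4 Turbo": 1253,
    "Claude 3 Opus": 1246,
}

leader = "im-also-a-good-gpt2-chatbot"
for name, score in scores.items():
    if name == leader:
        continue
    rate = expected_win_rate(scores[leader], score)
    print(f"{leader} vs {name}: expected win rate {rate:.1%}")
```

Under these assumptions, a lead of roughly 50 to 60 Elo points corresponds to the top model being preferred in a little under 60% of head-to-head votes: a clear edge, but far from winning every comparison.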
