Sora means AGI will be achieved in 10 years
A few years ago, in my Fengma Niu speech, I shared ten predictions about large-model trends. Unexpectedly, several of them were verified before the year was even out: from Gemini and Nvidia's Chat with RTX to OpenAI's release of Sora, all of which people found explosive. A friend asked me what I think of Sora. I made a few points; in general, I believe AGI will be realized soon, within the next few years:
First, the ultimate competition in technology comes down to talent density and deep accumulation. Many people say Sora's results are far better than Pika's and Runway's, which is very impressive: compared with startup teams, companies with core technology like OpenAI remain very strong. Some believe that with AI, a startup can be a one-person operation, but today that idea has once again been shown to be naive.
Second, AI may not disrupt every industry so quickly, but it can unleash more people's creativity. Many people today discussed Sora's impact on the film and television industry, but I disagree: a machine can produce a good video, yet the theme, script, shot planning, and dialogue coordination still require human creativity, or at least human prompts. A video or movie is made up of countless 60-second segments. Sora may bring huge disruption to advertising, movie trailers, and short video, but it is unlikely to defeat TikTok so quickly; it is more likely to become a creation tool for TikTok.

Third, I have always said that domestic large models are close to GPT-3.5 on the surface, but in fact still a year and a half behind GPT-4. And I believe OpenAI still holds some secret weapons, whether GPT-5 or machine self-learning that automatically generates content, including AIGC. Altman is a marketing master who knows how to control the rhythm; they have not shown all the weapons in their hands. It seems the gap between China and the United States in AI may still be widening.
Fourth, the most amazing thing about the large language model is that it is not a fill-in-the-blank machine; it can genuinely understand the knowledge of the world. This time, many people analyzed Sora from the technical and product-experience perspectives, emphasizing that it can output 60-second videos, maintain consistency across multiple shots, and simulate the natural world and physical laws. In fact, these are just the surface.
The most important thing is that Sora's technical approach is fundamentally different. Before this, we used Diffusion to make videos and pictures. You can think of a video as a sequence of individual frames; the model did not truly grasp knowledge of the world. All those pictures and videos were operations on graphical elements in a 2D plane, with no laws of physics applied. But in the videos Sora produces, the model seems to understand, as a human does, that a tank has enormous impact force: a tank can crash into a car, but a car will never crash through a tank. So my understanding is that OpenAI leveraged its large language model and combined the LLM with Diffusion during training, giving Sora two levels of ability: understanding the real world and simulating it. Only then can the generated videos look real and jump out of the 2D plane to simulate the real physical world. This is all thanks to the large model.
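The idea of an LLM steering Diffusion can be illustrated with a deliberately simplified toy. This is my own hypothetical sketch, not OpenAI's actual architecture: `embed_prompt` stands in for a language model's text encoder, and the reverse-diffusion loop nudges pure noise toward whatever that embedding describes.

```python
import numpy as np

# Toy sketch of text-conditioned diffusion sampling (hypothetical, not
# Sora's real design): a denoiser moves noisy data toward a target that
# depends on a conditioning vector, mimicking how an LLM-derived prompt
# embedding can steer what Diffusion generates.

rng = np.random.default_rng(0)

def embed_prompt(prompt: str, dim: int = 8) -> np.ndarray:
    """Stand-in for an LLM text encoder: a deterministic pseudo-embedding."""
    h = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(h).standard_normal(dim)

def denoise_step(x: np.ndarray, cond: np.ndarray, step: int, total: int) -> np.ndarray:
    """One reverse-diffusion step: move x part of the way toward the
    conditioning target, plus a shrinking amount of fresh noise."""
    alpha = 1.0 / (total - step)               # later steps take bigger fractions
    noise_scale = 0.1 * (total - step) / total # noise decays to ~0 at the end
    return x + alpha * (cond - x) + noise_scale * rng.standard_normal(x.shape)

def sample(prompt: str, steps: int = 50) -> np.ndarray:
    """Generate from pure noise, guided at every step by the prompt embedding."""
    cond = embed_prompt(prompt)
    x = rng.standard_normal(cond.shape)        # start from pure noise
    for t in range(steps):
        x = denoise_step(x, cond, t, steps)
    return x

out = sample("a tank crashing into a car")
target = embed_prompt("a tank crashing into a car")
print(np.linalg.norm(out - target))  # small: sampling converged toward the condition
```

The point of the sketch is only the structure: generation starts from noise, and the text condition is injected at every denoising step, so the language side shapes the output throughout rather than being bolted on at the end.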
This also points to the future direction. With a strong large model as the foundation, built on an understanding of human language, human knowledge, and world models, and combined with many other technologies, we can create super tools in many fields: biomedicine, protein and gene research, as well as physics, chemistry, and mathematics. Large models will play a role everywhere. Sora's simulation of the physical world will, at a minimum, have a huge impact on embodied intelligence in robotics and on autonomous driving. Earlier autonomous driving technology overemphasized the perception level and did little at the cognitive level. In fact, when people drive, many judgments rest on their understanding of the world: how fast the other vehicle is moving, whether a collision will occur, and how serious it would be. Without understanding the world, true driverless driving is hard to achieve.
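To make the driving example concrete, here is a small illustration of my own (not from the article) of the kind of judgment being described, reduced to elementary physics: given the gap and closing speed, how long until impact, and roughly how severe would it be?

```python
# Toy illustration of a cognitive-level driving judgment: time to
# collision and a rough severity proxy. Numbers below are invented
# for the example.

def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Seconds until impact if neither vehicle changes speed; inf if the gap is opening."""
    if closing_speed_mps <= 0:
        return float("inf")
    return gap_m / closing_speed_mps

def impact_energy_joules(mass_kg: float, closing_speed_mps: float) -> float:
    """Kinetic energy of the closing motion, a crude proxy for collision severity."""
    return 0.5 * mass_kg * closing_speed_mps ** 2

# A 1500 kg car 30 m behind another, closing at 10 m/s (36 km/h faster):
ttc = time_to_collision(30.0, 10.0)            # 3.0 seconds to brake or steer
severity = impact_energy_joules(1500.0, 10.0)  # 75,000 J if nothing changes
print(ttc, severity)
```

A human makes this judgment implicitly, from world knowledge; the author's argument is that a driving system confined to perception never gets to this step at all.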
So this time Sora is only flexing its capabilities. It demonstrates more than the ability to make a video: it shows that once a large model understands and simulates the real world, new results and breakthroughs will follow.
Fifth, OpenAI must have trained this model on a huge volume of video. The large model plus Diffusion needs a deeper understanding of the world, and the learning samples will mainly be videos and images captured by cameras. Once artificial intelligence is connected to cameras and has watched all the movies and all the videos on YouTube and TikTok, its understanding of the world will far exceed what text learning can provide. A picture is worth a thousand words, and a video conveys far more information than a picture. With this, AGI is really not far away: not a matter of 10 or 20 years, but perhaps achievable in one or two.
