OpenAI's original co-founder, Ilya Sutskever, noted that the pre-training phase, in which models digest large amounts of unlabeled data to learn language patterns and structures, is nearing its end.
He said that the returns from scaled-up training have leveled off, meaning that the method of boosting AI model performance by adding more data and compute (i.e., the Scaling Law) has hit a bottleneck.
The current scaling strategy of large language models (LLMs) like ChatGPT has reached its limits
According to Ilya Sutskever, simply scaling up by adding more data and computing resources is no longer enough to achieve meaningful progress.
While adding compute is still one way to boost AI performance, Ilya said it is no longer possible to achieve significant model improvements by simply piling on more compute and data, as it once was.
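For reference, the Scaling Law in question is commonly written in the power-law form fitted by Hoffmann et al. (2022), the "Chinchilla" paper; a sketch of that form is below, where N is the parameter count, D the number of training tokens, E the irreducible loss, and A, B, α, β fitted constants. Both power-law terms shrink as N and D grow, but with ever-diminishing marginal returns, which is exactly the bottleneck Ilya describes.

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022): model loss as a
% power law in parameter count N and training tokens D. Each added unit
% of scale buys a smaller loss reduction than the last.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```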
Large-model companies need to adopt smarter training techniques and pay more attention to how, and on what, models are trained, rather than focusing on size alone.
This shift in approach represents a critical turning point in the development of AI, moving beyond the idea of "bigger is better".
The pre-training phase, in which large models are fed vast amounts of unlabeled data to recognize patterns and structures, has been the cornerstone of developing robust LLMs.
At this stage, the model learns language representations by digesting a variety of texts - from books and articles to websites and social media posts - so that it can recognize grammar, syntax, and meaning.
This approach has served development well so far, with LLMs improving their performance simply by ingesting more data.
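Concretely, the objective behind this phase is next-token prediction. The sketch below is a toy character-level version invented for illustration, not any lab's actual code; real LLM pre-training applies the same cross-entropy loss at vastly larger scale.

```python
# Toy sketch of the pre-training objective: next-token prediction on raw,
# unlabeled text. Model, corpus, and hyperparameters are illustrative only.
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class TinyLM(nn.Module):
    def __init__(self, vocab: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # logits over the next character

model = TinyLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Shift by one: predict character t+1 from characters up to t.
x, y = data[:-1].unsqueeze(0), data[1:].unsqueeze(0)
for step in range(100):
    logits = model(x)
    loss = nn.functional.cross_entropy(logits.squeeze(0), y.squeeze(0))
    opt.zero_grad(); loss.backward(); opt.step()
```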
However, Ilya Sutskever believes that this approach has now plateaued. The performance gains from adding more data are diminishing and, more importantly, there is a growing realization that the effectiveness of a model depends not only on the amount of data it processes, but also on the quality and structure of the data it is exposed to.
This implies that large-model developers must rethink their strategies if LLMs are to progress further. In other words, the next mountain in large-model development cannot be crossed by the old route; it may well prove a "life-and-death dividing line".
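As one concrete illustration of what "quality and structure of the data" can mean in practice, here is a toy filtering heuristic. The thresholds are invented for the example; production pipelines use far more elaborate filters (deduplication, language identification, learned quality classifiers, and so on).

```python
# Toy data-quality filter: drop documents that are too short or too
# repetitive before pre-training. All heuristics here are illustrative.
def keep_document(doc: str) -> bool:
    words = doc.split()
    if len(words) < 50:                       # too short to teach much
        return False
    if len(set(words)) / len(words) < 0.3:    # highly repetitive text
        return False
    return True

corpus = [
    "too short to be useful",
    "spam " * 100,                                # repetitive
    " ".join(f"token{i}" for i in range(80)),     # varied, long enough
]
filtered = [d for d in corpus if keep_document(d)]
print(len(filtered))  # only the varied, long document survives
```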
The shift to "smarter" training
More attention should be paid to the finesse of model training
Ilya Sutskever mentioned that researchers now need to consider more advanced methods to refine the learning process, rather than just increasing the size of the dataset.
This includes improving the algorithms used during training, optimizing data management, and introducing more advanced techniques such as reinforcement learning or multimodal training, in which the model is exposed to not only text, but also images, videos, or other forms of data.
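To make the reinforcement-learning idea concrete, below is a deliberately tiny REINFORCE sketch: sample a sequence from a policy, score it with a reward, and reinforce the sampled tokens in proportion to that reward. The five-letter vocabulary, model, and reward function are invented for the example and unrelated to any production pipeline.

```python
# Toy REINFORCE loop: the policy learns to emit sequences the (invented)
# reward function likes. Nothing here reflects a real RLHF system.
import torch
import torch.nn as nn

VOCAB = list("abcde")  # toy vocabulary
V = len(VOCAB)

class TinyPolicy(nn.Module):
    """Minimal autoregressive policy: previous token -> next-token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, 16)
        self.head = nn.Linear(16, V)

    def forward(self, tok):
        return self.head(self.embed(tok))

def reward(seq):
    # Invented reward: prefer sequences containing many 'a' tokens.
    return float(sum(1 for t in seq if VOCAB[t] == "a"))

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    tok = torch.tensor(0)  # start token
    log_probs, seq = [], []
    for _ in range(8):     # sample an 8-token sequence
        dist = torch.distributions.Categorical(logits=policy(tok))
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        seq.append(tok.item())
    # REINFORCE: scale the sequence log-likelihood by its reward.
    loss = -reward(seq) * torch.stack(log_probs).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```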
Ilya Sutskever's comment that future LLMs will need to "think a little longer" emphasizes another key aspect of progress.
Large models need the ability to carry out more complex reasoning over longer stretches of time, an ability increasingly necessary for tasks that demand deep understanding, multi-step reasoning, or long-term memory.
As complexity grows, big models must be able to maintain context over longer conversations, perform more complex tasks, and respond to more subtle cues in the data.
For example, current LLMs like ChatGPT can generate impressively coherent, contextually relevant responses within a single conversation. It is clear, however, that they still struggle to maintain context across long-running exchanges and to handle complex logical-reasoning tasks.
To overcome this limitation, future models will need to implement better memory mechanisms and more sophisticated processing capabilities to "think" for longer periods of time.
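What such a memory mechanism might look like in its simplest form is sketched below: keep recent turns verbatim and fold older turns into a running summary, so useful context survives beyond a fixed window. This is a hypothetical design invented for illustration; summarize is a placeholder for a real summarization model or API call.

```python
# Hypothetical conversation memory: verbatim recent turns plus a running
# summary of everything older. `summarize` stands in for a real model.
from collections import deque

def summarize(texts):
    # Placeholder: a real system would call a summarization model here.
    return " / ".join(t[:30] for t in texts)

class ConversationMemory:
    def __init__(self, max_recent: int = 4):
        self.recent = deque(maxlen=max_recent)  # verbatim recent turns
        self.summary = ""                       # compressed older context

    def add(self, turn: str):
        if len(self.recent) == self.recent.maxlen:
            # The oldest turn is about to fall out: fold it into the summary.
            oldest = self.recent[0]
            self.summary = summarize([self.summary, oldest]) if self.summary else oldest
        self.recent.append(turn)

    def context(self) -> str:
        return f"Summary: {self.summary}\nRecent: " + " | ".join(self.recent)

mem = ConversationMemory()
for turn in ["user: hi", "bot: hello", "user: explain scaling laws",
             "bot: they relate loss to data and compute", "user: and limits?"]:
    mem.add(turn)
print(mem.context())
```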
Even as computing power continues to grow, the concern at the forefront of large-model firms is a gradual shift away from simply scaling models and toward more efficient, contextually intelligent development: a combination of advances in neural networks, machine-learning algorithms, and the way AI systems process and retain information.
With smarter, longer-thinking models in the future, AI can become more adaptive, allowing for more personalized, accurate, and insightful interactions with users.
Of course, if the technology matures further, AI applications will also see breakthroughs across a wide range of industries, from healthcare to finance to customer service.
In conclusion, Ilya Sutskever highlights a key moment in AI research: as the pre-training phase of LLMs reaches its limits, future progress hinges on developing smarter training techniques and on improving models' ability to maintain context over longer periods of time.
Now more than ever, it is important for large-model organizations to choose the right direction in which to expand. They must rethink their approach to scaling, focusing less on simply adding more data and computational resources and more on refining the training process and developing models capable of deeper, more coherent reasoning.
Ilya's remarks are startling. So what is his SSI?
SSI (Safe Superintelligence) was founded by the trio of Ilya Sutskever, Daniel Levy, and Daniel Gross to develop safe AI systems that far exceed human capabilities. Ilya Sutskever has emphasized that their primary product will be safe superintelligence.
SSI has raised $1 billion in cash in just over three months without releasing any products, valuing the company at $5 billion.
Ilya Sutskever is a co-founder of OpenAI and left the company in May 2024. He is one of the most influential technologists in artificial intelligence: he studied under Geoffrey Hinton, known as the "Godfather of Artificial Intelligence", and was an early proponent of the scaling hypothesis.
Daniel Gross was the head of AI technology at Apple and a former Y Combinator partner, while Daniel Levy is a former employee of OpenAI.
SSI operates in a regular for-profit structure and now employs about 10 people in Palo Alto, California, and Tel Aviv, Israel.
SSI's investors include top-tier venture capital firms Andreessen Horowitz, Sequoia Capital, DST Global, and SV Angel, as well as NFDG, an investment partnership run by Nat Friedman and SSI CEO Daniel Gross.
Will SSI scale the large-model field's "research mountain"?
Of course, in this much-discussed conversation, Ilya also mentioned that SSI has identified a new area of research that has the potential to change our understanding of AI.
He compared this research area to a mountain, saying that once this mountain is conquered, the "paradigm" of AI will be fundamentally changed. Let's look forward to SSI's future technological breakthroughs, which may bring about a "revolution" in the field of AI.
However, the exact direction and details of the study remain undisclosed.
On the other hand, judging from Ilya's past work, he has repeatedly made clear that his and SSI's goal is not only to push the boundaries of AI technology, but also to ensure that superintelligence is safe, averting the ethical and societal risks it could otherwise pose. While the Scaling Law may have hit a bottleneck, SSI's explorations show that AI progress still holds great potential and is moving toward greater sophistication and safety.
As things stand, progress in AI is no longer a purely technological competition; it is increasingly a question of balancing technological development, safety, and commercialization, which is undoubtedly a challenging long-term topic.
With new methods and breakthroughs in new areas, the future of AI may meet us in a whole new way.