As the initial enthusiasm of the large-model race fades, many VC investors have entered a cooling-off period on large models, and their investment standards have become far more rational and rigorous.
So what characterizes the teams that top-tier investors favor at this stage?
Moonshot AI (known in Chinese as Dark Side of the Moon), a low-profile startup that recently released its first product with little warning, offers a useful reference.
Before the release of Kimi Chat, few would have expected this AI startup, whose founder was only 31 and which had not yet shipped a product, to receive investment from VCs such as Sequoia China and ZhenFund and to be named by The Information as one of five candidates for "China's OpenAI".
So for China's large-model field, is the arrival of Dark Side of the Moon just another round of storytelling hype, or the emergence of a genuine dark horse?
1
VCs’ considerations
To judge the value of an AI startup at this stage, it helps to look beyond the information the company makes public and analyze, from the VCs' perspective, why they invested.
Take Sequoia China, one of the high-profile investors in Dark Side of the Moon. It has invested in nearly 30 AI companies, yet it has only two core criteria:
1. There is a usage scenario that solves a practical problem;
2. The system can continuously obtain useful data for self-learning, improving its processing capabilities.
On the first criterion, Sequoia China's view of how to select AI companies differs from that of most domestic VCs.
Currently, AI investment is concentrated mostly on the enterprise (B2B) side, because vertical, industry-specific large models find application scenarios more easily there than on the consumer (B2C) side.
However, Sequoia China believes that a vertical industry background is not a necessary condition, and a deep insight into industry pain points is a more important factor.
For example, the founder of Mobike was not in the bicycle business, but she discovered a real demand and realized that AI could play a valuable role in this process.
Seen in this light, the case of Dark Side of the Moon makes clear why Sequoia China invested.
Kimi Chat, released by Dark Side of the Moon, is the first smart assistant product to support input of 200,000 Chinese characters. This is currently the longest context length available: 2.5 times that of Claude2-100k (about 80,000 characters) and 8 times that of GPT-4-32k (about 25,000 characters).
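The two multiples follow directly from the approximate capacities quoted above. A minimal sketch of the arithmetic, in which the dictionary and the function name `ratio_to` are illustrative and the capacities are the article's rough estimates rather than official figures:

```python
# Approximate input capacities in Chinese characters, as quoted in the text.
# These are the article's rough estimates, not vendor specifications.
CAPACITY_CHARS = {
    "Kimi Chat": 200_000,
    "Claude2-100k": 80_000,
    "GPT-4-32k": 25_000,
}


def ratio_to(baseline: str, target: str = "Kimi Chat") -> float:
    """Return how many times larger the target's capacity is than the baseline's."""
    return CAPACITY_CHARS[target] / CAPACITY_CHARS[baseline]


print(ratio_to("Claude2-100k"))  # 2.5
print(ratio_to("GPT-4-32k"))     # 8.0
```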
What does long text input mean?
Before Kimi Chat was released, the biggest obstacle to deploying large models in real applications was the limit on input length.
Due to length limitations, existing large models are unable to cope with any scenario that requires long-form analysis or sustained dialogue.
In the legal industry, for example, practitioners often have to work through long texts such as legal documents, contracts, judgments, and case files. In the media industry, editors and writers likewise need to read and analyze large numbers of articles, news items, and reports.
Admittedly, users can work around the input limit with shortcuts such as “sending in segments”. But once the limit is reached, the model still has to start its analysis of each segment over again.
This constant “starting from scratch” also makes it difficult for large models to form a coherent and in-depth set of insights.
The situation resembles that of an early human who has learned to write but is limited by the medium (text can only be carved into stone): unable to store more information or accumulate more knowledge, the civilization cannot develop over the long run.
If the large model wants to get rid of this "primitive stage" and expand to a wider range of scenarios, the limitation of text length must be broken.
This is why Dark Side of the Moon, which seized on the "length limit" pain point, has drawn so much attention from Sequoia China.
However, beyond specific scenarios and technologies, the "human" factor cannot be ignored in large-model entrepreneurship.
2
The future of technical genius
At present, almost every AI startup wants to be OpenAI, but how many teams have that kind of talent, and an environment that lets it flourish?
On the surface, in the current large-model startup boom, a degree from a prestigious university, experience at a major company, and strong technical DNA seem to have become "standard equipment".
The same is true for Dark Side of the Moon.
Its founder, Yang Zhilin, graduated from Tsinghua University, studied at Carnegie Mellon University, and worked at Google Brain and Meta (Facebook) AI Research. He has also co-authored papers with Turing Award winner Yann LeCun.
Zhou Xinyu, the team's second-largest shareholder, was a classmate of Yang Zhilin in the Department of Computer Science and Technology at Tsinghua University.
The third-largest shareholder, Wu Yuxin, graduated from Tsinghua University and Carnegie Mellon University and received a Best Paper nomination at the 2018 European Conference on Computer Vision (ECCV). He has also been a member of the FAIR team at Meta (Facebook) AI Research.
Judging from its makeup, this is a team with strong technical DNA.
However, in the current domestic large-model race there is no shortage of star technical talent, while teams that have produced truly outstanding results remain rare.
What is the reason?
From the cases of successful teams such as OpenAI and Midjourney, we can draw at least two conclusions:
1. The team insists on its own "independence";
2. The founder has broad vision and experience.
On the first point, China has seen many "technical geniuses" start their own businesses, but a considerable number of these teams were eventually acquired or controlled by others because they lacked equity or financial independence. OneFlow (Yiliu Technology), which was previously acquired by Lightyear Away, is one such case.
By comparison, OpenAI and Midjourney have had more autonomy over financing and equity.
As a non-profit organization, OpenAI does not always have to put the will of shareholders first, and David Holz, the founder of Midjourney, relied on his reputation and connections to gather resources and talent without raising any outside funding.
All these make it easier for them to stick to their independent research direction.
On this point, according to the Tianyancha App, Yang Zhilin holds 78.97% of Dark Side of the Moon, which gives him absolute control.
Besides "independence", the founder's vision and practical experience are another major factor in whether a large-model team succeeds or fails.
A technical team may have a pure passion for research, but that persistence can sometimes lead it astray.
Dai Wenyuan, the founder of Fourth Paradigm, is a cautionary example.
Also a "technical genius", Dai Wenyuan chose an unconventional direction, "decision-making AI", when he founded Fourth Paradigm. Because of high custom R&D costs, however, the company accumulated losses of 4.683 billion yuan in three and a half years and endured three failed IPO attempts.
There are many different paths in the current development of AI, some of which are promising and reliable, while others are "wrong options" that need to be eliminated.
Only through extensive exchange with top foreign universities, institutions, and companies, and through hands-on practice, can a founder make correct and forward-looking judgments.
Returning to Dark Side of the Moon: in terms of vision and practical experience, Yang Zhilin has worked at Google Brain and Meta (Facebook) AI Research, and is the first author of Transformer-XL and XLNet.
Of the two, XLNet outperformed Google's BERT on 18 natural language tasks and was one of the most widely followed cutting-edge models in NLP at the time.
A resume this broad and this close to the frontier gives Yang Zhilin, as founder, a command of the technology close to that of top international talent.
3
The value of a “partial victory”
Of everything Dark Side of the Moon has announced, the most praised is the launch of Moonshot, the first large model to support input of 200,000 Chinese characters, and Kimi Chat, an intelligent assistant built on that model. Its context length is 8 times that of GPT-4-32k (about 25,000 characters).
This can fairly be called another partial "victory" for a Chinese model over advanced models such as GPT-4.
Why "another"?
Because more than one domestic large model has already claimed to "surpass" GPT-4 in some respect.
In September, on the latest ranking of C-Eval, an open-source evaluation leaderboard popular in the academic community, Yuntian Lifei's large model "Yuntian Book" ranked first, while GPT-4 ranked only tenth.
The reason for this odd result is that some domestic large models have picked up "test-taking skills" (such as training on the test's own answers).
In fact, OpenAI's experience suggests that a true “partial victory” in technology should be a breakthrough of the "ceiling" in some area of AI, not a temporary win on the numbers.
This is also why, even when GPT-1 was thoroughly beaten by Google's BERT and its benchmark results were poor across the board, OpenAI still bet on large models rather than small ones.
After all, although small models perform well on specialized tasks, stronger intelligence will not emerge as long as the parameter count cannot be scaled up.
Similarly, the Moonshot model that Dark Side of the Moon has now launched can be seen as a "ceiling" breakthrough.
Long text has its own "impossible triangle": text length, attention, and computing power.
The longer the text, the harder it is for the model to attend to all of it and digest it fully; with limited attention, short texts cannot carry complex information in full; and processing long texts requires a great deal of computing power, which raises costs.
Within this impossible triangle, the computational cost of the self-attention mechanism grows quadratically with context length. For example, when the context grows 32-fold, the computation grows roughly 1,000-fold (32² = 1,024).
More computation means more computing power consumed, which in turn means higher model deployment costs.
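The quadratic relationship can be checked with a few lines of arithmetic. The sketch below assumes, as a simplification, that self-attention compute is exactly proportional to the square of the context length; it ignores constant factors, the parts of a model that scale linearly, and any optimized attention variants. The function name `attention_cost_ratio` is illustrative.

```python
def attention_cost_ratio(context_growth: float) -> float:
    """Relative growth in self-attention compute when the context length
    grows by a factor of `context_growth`.

    Simplifying assumption: cost is proportional to n**2, where n is the
    context length, so the ratio is context_growth squared.
    """
    return context_growth ** 2


for growth in (2, 8, 32):
    ratio = attention_cost_ratio(growth)
    print(f"context x{growth} -> attention compute x{ratio:.0f}")
```

Under this assumption a 32-fold longer context costs 1,024 times the attention compute, which is the "1,000 times" figure cited in the text.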
For this reason, only a breakthrough in long-text technology will allow a general model to branch out into N applications.
At this stage the "impossible triangle" of long text may not be fully solvable, but that is exactly why breaking through this "ceiling" is so meaningful and valuable.
The so-called "China's OpenAI" may well be born in the conquest of these "ceilings".