As the initial enthusiasm around the large-model race fades, many VC investors have entered a cooling-off period on large models, and their investment standards have become far more rational and rigorous.

If so, what kind of team can still win the favor of star investors at this stage?

Moonshot AI (Chinese name: Dark Side of the Moon), the mysterious startup that suddenly released its own product some time ago, offers a useful reference.

Before the release of its own large-model product Kimi Chat, few would have expected that this AI startup, whose founder was only 31 years old and which had not released any product, would receive investment from VCs such as Sequoia China and Zhen Fund and be selected by The Information as one of five "China OpenAI" startups.

So, for domestic large models, is Moonshot AI's entry just another round of storytelling hype, or the sudden emergence of a dark horse?

1 VCs' considerations

At this stage, judging the value of an AI startup requires more than the information the company publicly discloses. It is also worth taking the VCs' perspective and analyzing why they invested.

Take Sequoia China, one of the star investors in Moonshot AI. Sequoia China has invested in nearly 30 AI companies so far, but it has only two real core criteria:

1. There are usage scenarios that solve practical problems;
2. The system can continuously obtain useful data for self-learning to improve its processing capabilities.

On the first criterion, Sequoia China screens AI companies differently from most domestic VCs.

AI investment is currently concentrated on the business (B) side, because industry-specific vertical models for businesses find application scenarios more easily than consumer (C) side products do.

Sequoia China, however, believes that a vertical industry background is not a necessary condition, and that deep insight into an industry's pain points matters more.

For example, the founder of Mobike did not come from the bicycle business, but she discovered a real demand and realized that AI could play a valuable role in meeting it.

Looking at Moonshot AI along the same lines makes it clear why Sequoia China invested.

Kimi Chat, released by Moonshot AI, is the first intelligent assistant product that supports input of 200,000 Chinese characters. This context length is 2.5 times that of the current longest, Claude 2-100k (about 80,000 characters), and 8 times that of GPT-4-32k (about 25,000 characters).

What does ultra-long text input mean?

Before Kimi Chat was released, one of the biggest bottlenecks in putting large models into practice was the limit on input length.

Because of that limit, existing large models struggle with scenarios that require long-form analysis or sustained dialogue.

In the legal industry, for example, practitioners sometimes need to process large volumes of long texts such as legal documents, contracts, judgments, and cases. In the media industry, editors and writers likewise need to read and analyze large numbers of articles, news items, and reports.

Admittedly, people can work around the input limit with shortcuts such as "sending in segments".
However, once the word limit is reached, the large model still has to restart its analysis for each segment. This constant "starting from scratch" makes it difficult for large models to form coherent, in-depth insights.

It is like a primitive man who has learned to write but is limited by the medium (text can only be carved into stone): he cannot store more information or accumulate more wisdom, so civilization cannot develop over the long run.

If large models are to leave this "primitive stage" and expand into a wider range of scenarios, the limit on text length must be broken.

This is why Moonshot AI, which grasped the "length limit" pain point, has received so much attention from Sequoia China.

Beyond specific scenarios and technologies, however, the "human" factor cannot be ignored in large-model entrepreneurship.

2 The future of technical genius

Almost every AI startup today wants to be OpenAI, but how many teams have that kind of talent, and an environment that lets them use it fully?

On the surface, in the current large-model startup boom, a degree from a prestigious university, experience at a large company, and strong technical genes seem to have become the "standard configuration".

The same is true of Moonshot AI.

Its founder, Yang Zhilin, not only graduated from Tsinghua University and studied at Carnegie Mellon University, but also worked at Google Brain and Meta (Facebook) AI Research (FAIR).
He has also co-authored papers with Turing Award winner Yann LeCun.

Similarly, Zhou Xinyu, the team's second-largest shareholder, was a classmate of Yang Zhilin in the Department of Computer Science and Technology at Tsinghua University.

The third-largest shareholder, Wu Yuxin, graduated from Tsinghua University and Carnegie Mellon University, was nominated for best paper at the European Conference on Computer Vision (ECCV) in 2018, and has been a member of the FAIR team at Meta (Facebook).

Judging from its personnel, this is a team with strong technical genes.

Yet in the current domestic large-model race, star technical talent is plentiful, while teams that have truly delivered outstanding results remain rare.

Why?

From successful teams such as OpenAI and Midjourney, we can draw at least two points:

1. Whether the team maintains its own "independence";
2. Whether the founder has broad vision and experience.

On the first point, there are many domestic cases of "technical geniuses" starting their own businesses, but a considerable number of these teams were eventually acquired or brought under outside control because they lacked equity or financial independence.
OneFlow (Yiliu Technology), which was previously acquired by Lightyear Away, is one such example.

By comparison, OpenAI and Midjourney have more autonomy over financing and equity.

With its non-profit origins, OpenAI does not always have to put shareholders' wishes first. Midjourney's founder, David Holz, relied on his own reputation and connections to gather the resources and talent he needed without raising outside funds.

All of this makes it easier for them to stick to their own research direction.

On this point, according to the Tianyancha App, Yang Zhilin holds 78.97% of Moonshot AI's shares and has absolute control.

Besides "independence", the founder's vision and practical experience are another major factor in whether a large-model team succeeds or fails.

A technical team may have a pure passion for research, but that persistence can sometimes lead it astray.

Dai Wenyuan, the founder of Fourth Paradigm, is a clear cautionary example.

Dai Wenyuan, also a "technical genius", chose the unconventional direction of "decision-making AI" when he founded Fourth Paradigm.
However, because of high customized R&D costs, the company accumulated losses of 4.683 billion yuan in three and a half years and suffered the embarrassment of three failed IPO attempts.

AI development currently follows many different paths. Some are promising and reliable, while others are "wrong options" that need to be ruled out.

Only through extensive exchange with top universities, institutions, and companies abroad, together with hands-on practice, can a founder make correct, forward-looking judgments.

Returning to Moonshot AI: in terms of vision and practical experience, Yang Zhilin has worked at Google Brain and Meta (Facebook) AI Research (FAIR), and is the first author of Transformer-XL and XLNet.

XLNet achieved better results than Google's BERT on 18 natural language tasks and was one of the most prominent cutting-edge models in NLP at the time.

Such a broad, cutting-edge resume means that Yang Zhilin, as founder, has a technical command close to that of top international talent.

3 The value of a "partial victory"

Of everything Moonshot AI has announced, the most praised is the launch of Moonshot, the first large model to support input of 200,000 Chinese characters, and Kimi Chat, an intelligent assistant product built on this model.
Its context length is 8 times that of GPT-4-32k (about 25,000 characters).

This can be called another "victory" for China over advanced models such as GPT-4 in a specific area.

Why "another"?

Because more than one domestic large model has already claimed to have "surpassed" GPT-4 in some respect.

In September, in the latest ranking on C-Eval, a popular open-source evaluation leaderboard in academia, Yuntianlifei's large model "Yuntianshu" ranked first, while GPT-4 ranked only tenth.

The reason for this strange result is that some domestic large models have picked up "test-taking skills" (such as training on the test answers), which produced the spectacle.

In fact, OpenAI's experience suggests that a true technological "partial victory" should break through the ceiling of some area of AI, rather than be a short-lived lead on a metric.

This is why, when GPT-1 was thoroughly beaten by Google's BERT and trailed it on every evaluation, OpenAI still chose the large-model route over the small-model one.

After all, although small models perform well on specialized tasks, stronger intelligence will not emerge unless the parameter count can be scaled up.

Similarly, Moonshot, the model Moonshot AI has now launched, can be seen as a breakthrough against one of the "ceilings" of large models.

Long text has its own "impossible triangle": text length, attention, and computing power.

The longer the text, the harder it is for the model to attend to all of it and digest it fully; under attention constraints, short texts cannot carry a full reading of complex information; and processing long texts requires a great deal of computing power, which raises costs.

Within this impossible triangle, the computational complexity of the self-attention mechanism grows quadratically with context length.
For example, when the context grows 32-fold, the computation grows roughly 1,000-fold (32² = 1,024).

Higher computational complexity means more computing power consumed, which in turn means higher model deployment costs.

For this reason, only with a breakthrough in long-text technology can N applications be built on top of a general-purpose model.

At this stage, the long-text "impossible triangle" may not be solvable for the time being, but that is precisely why breaking through such a "ceiling" is truly meaningful and valuable.

The so-called Chinese OpenAI may well be born in the process of overcoming these "ceilings".
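
The quadratic relationship between context length and self-attention cost described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `attention_cost_ratio` is hypothetical, and it assumes that cost scales purely with the square of the context length, ignoring every other part of the model.

```python
def attention_cost_ratio(context_growth: float) -> float:
    """Relative growth in self-attention compute when the context
    length is multiplied by `context_growth`.

    Assumption: cost is proportional to n^2 for a context of n tokens,
    because every token is compared with every other token.
    """
    return context_growth ** 2


# Print how compute scales for a few context-length multipliers.
for growth in (2, 8, 32):
    ratio = attention_cost_ratio(growth)
    print(f"{growth}x context -> {ratio:,.0f}x attention compute")
```

Under this assumption, a 32-fold increase in context gives 32² = 1,024, which is where the "about 1,000 times" figure comes from.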