February 1, 2025 - Andrew Ng (Andrew Yan-Tak Ng, 吴恩达) of Stanford University's Computer Science Department posted on X yesterday in support of DeepSeek: "China is catching up to the US in generative AI."
1AI summarizes the main points of Andrew Ng's post below:
- This week's discussion of DeepSeek brought a few obvious trends into sharper focus:
- China is catching up to the U.S. in generative AI, which has profound implications for the AI supply chain.
- Open weights models are commoditizing the base model layer, which creates more opportunities for application developers.
- Scaling is not the only way to drive AI progress. Despite the immense focus and hype around compute, algorithmic innovation is rapidly reducing training costs.
- About a week ago, the China-based company DeepSeek released its impressive DeepSeek-R1 model, which performs comparably to OpenAI's o1 on benchmarks. More importantly, DeepSeek-R1 was released as an open weights model under a permissive MIT license. Last week at Davos, many non-tech business leaders asked me about it, and on Monday there was a "DeepSeek sell-off" in the stock market: shares of NVIDIA and a number of other U.S. tech companies fell sharply. (At the time of writing, some have recovered.)
- DeepSeek has made many people realize the following:
- China is catching up to the United States in generative AI. When ChatGPT was released in November 2022, the U.S. was well ahead of China in this area. Over the past two years, however, China has progressed very quickly: models from China such as Qwen (Tongyi Qianwen, which my team has been using for a few months now), Kimi, InternVL, and DeepSeek have significantly narrowed the gap with the U.S., and in areas such as video generation China has at times even pulled ahead.
- I am very pleased that DeepSeek-R1 was released as an open weights model, along with a technical report that provides a great deal of detail. In contrast, a number of U.S. companies have pushed for regulation by hyping hypothetical AI crises such as human extinction, in an attempt to stifle open source development.
- Today, open source / open weights models are a core part of the AI supply chain, and many companies will use them. If the U.S. continues to suppress open source, China will eventually come to dominate this part of the supply chain, and many businesses will end up using models that reflect Chinese values more than American ones.
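To make the "open weights" point concrete, here is a minimal sketch of what using such a model looks like in practice: download the published weights and run them locally with Hugging Face transformers. The repository ID below is just an example of a small distilled DeepSeek-R1 checkpoint; it is not mentioned in Ng's post, and any open weights model that fits your hardware would work the same way.

```python
# Minimal sketch: running an open weights model locally with Hugging Face transformers.
# The model ID is an illustrative example, not something specified in the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # example open weights checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give three use cases for a locally hosted language model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```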
- Open weights models are accelerating the commoditization of the base model layer. As I have mentioned before, the price of large language model tokens is falling rapidly, and open weights models are accelerating this trend while giving developers more choices. OpenAI's o1 costs $60 per million output tokens, while DeepSeek-R1 costs only $2.19, a nearly 30-fold price difference that has raised a lot of eyebrows.
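The arithmetic behind that "nearly 30-fold" figure is simple; the short sketch below works it through using the two prices quoted above. The monthly workload size is a made-up example, not a number from the post.

```python
# Illustrative cost comparison using the per-million-output-token prices quoted above.
O1_PRICE_PER_M = 60.00   # USD per 1M output tokens (o1, as quoted)
R1_PRICE_PER_M = 2.19    # USD per 1M output tokens (DeepSeek-R1, as quoted)

monthly_output_tokens = 50_000_000  # hypothetical application workload

o1_cost = monthly_output_tokens / 1_000_000 * O1_PRICE_PER_M
r1_cost = monthly_output_tokens / 1_000_000 * R1_PRICE_PER_M

print(f"o1 cost:  ${o1_cost:,.2f}/month")   # $3,000.00
print(f"R1 cost:  ${r1_cost:,.2f}/month")   # $109.50
print(f"price ratio: {O1_PRICE_PER_M / R1_PRICE_PER_M:.1f}x")  # ~27.4x, i.e. "nearly 30-fold"
```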
- The business of training basic models and selling APIs is very tough. Many companies are still looking for ways to recoup their huge training costs. Sequoia Capital's article "AI's $600 Billion Question" illustrates this challenge well (though it should be emphasized that I think the base model companies are doing a great job and want them to succeed). In contrast, building apps on top of the base model offers more opportunity for business. Since other companies have spent billions of dollars training these models, you can now use them for a fraction of the cost to develop customer service bots, email summarization tools, AI doctors, law clerk assistants, and more.
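As a rough illustration of that application layer, here is a minimal sketch of an email summarization tool built on top of someone else's base model via an OpenAI-compatible chat-completions API. The endpoint URL, model name, and environment variable are placeholders I am assuming for the example; they are not details from Ng's post.

```python
# Sketch of an "email summarization tool" built on a hosted base model.
# Endpoint, model name, and API-key variable below are placeholders, not real services.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["LLM_API_KEY"],                    # placeholder: your provider's key
    base_url="https://api.example-llm-provider.com/v1",   # placeholder endpoint
)

def summarize_email(email_body: str) -> str:
    """Return a short summary of an email using a hosted base model."""
    response = client.chat.completions.create(
        model="some-reasoning-model",  # placeholder model name
        messages=[
            {"role": "system", "content": "Summarize the email in 2-3 bullet points."},
            {"role": "user", "content": email_body},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(summarize_email("Hi team, the Q3 launch slips by two weeks because ..."))
```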
- Scaling isn't the only way for AI to progress. Scaling up models has become a dominant theme in driving AI progress, and admittedly I was one of its proponents. Many companies have raised huge amounts of money on the back of this narrative, claiming that with more capital they can scale up and predictably drive progress. As a result, scaling has drawn attention at the expense of other avenues of progress. Because of the U.S. embargo on AI chips, the DeepSeek team had to run many of its optimizations on lower-performance H800 GPUs rather than H100s, ultimately training the model for a reported compute cost (excluding research costs) of under $6 million.
- It remains to be seen whether this will reduce demand for compute. Lowering the unit price of a good sometimes leads people to spend more on it in total. I think there is almost no ceiling on the long-run demand for intelligence and compute, so even as it gets cheaper, I remain optimistic that demand will continue to grow.
- I've seen many different interpretations of DeepSeek's progress on X, as if it were a mirror reflecting everyone's existing views. I think DeepSeek-R1 is caught up in geopolitics that is still being worked out, and at the same time it presents great opportunities for AI application builders. My team is already brainstorming new ideas that only become possible with an openly available advanced reasoning model. Now is a great time to build!