In 2017, eight machine learning researchers at Google co-authored a groundbreaking research paper entitled Attention Is All You Need. This paper introduced the Transformer AI architecture, which is now the core foundation underpinning nearly all mainstream generative AI models.
The Transformer architecture is one of the key elements that has helped modern AI flourish: it uses neural networks to transform chunks of input data, called "tokens," into a desired output form. Transformer-based systems now include GPT-4 (and ChatGPT) and other language models, the audio generation models behind Google NotebookLM and OpenAI's Advanced Voice Mode, video generation models such as Sora, and image generation models such as Midjourney.
At the TED AI conference in October, Jakob Uszkoreit, one of the paper's eight co-authors, gave an interview in which he discussed the history of the Transformer, Google's early work on large language models, and his current venture in biological computing.
In the interview, Uszkoreit revealed that while he and his Google colleagues had high hopes for the Transformer's potential, they did not fully foresee it playing such a critical role in products like ChatGPT.
Below is the full text of the interview:
Q: What was your main contribution to the paper Attention Is All You Need?
Uszkoreit: It's spelled out in the paper's footnotes, but my central contribution was proposing that attention mechanisms, self-attention in particular, could replace the recurrence (from recurrent neural networks) that dominated sequence transduction models at the time. Such a replacement could be more efficient and, as a result, more effective.
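(Conceptually, self-attention lets every token in a sequence attend to every other token in a single step, instead of passing information along one position at a time as recurrence does. Below is a minimal NumPy sketch of the scaled dot-product self-attention at the heart of the paper; it is a single-head toy with no masking, and the shapes and names are illustrative assumptions, not the paper's full multi-head implementation.)

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one token sequence.

    x: (seq_len, d_model) input token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q = x @ w_q  # queries: what each token is looking for
    k = x @ w_k  # keys: what each token offers for matching
    v = x @ w_v  # values: the content that gets mixed together
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise relevance
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each position: weighted mix of all positions

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): every token attends to every other in one step
```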
Q: Did you have any idea what would happen after your team published that paper? Did you foresee the industry it would create?
Uszkoreit: First, I want to emphasize that our work did not exist in isolation; it stood on the foundation of a great deal of prior research. The paper was not a one-off event but the culmination of years of effort by our team and many other researchers. Attributing everything that followed to this one paper makes for a good story, a very human way of seeing things, but it is not entirely accurate.
Before that paper was published, my team at Google had been working on attention models for years. It was a long and challenging road, involving a great deal of research, not only by my team but by many other researchers plowing the same field. We had high hopes that attention models would advance the entire field on a technical level. But as for whether they would enable a product like ChatGPT, at least on the surface, we did not exactly foresee that. I mean, even when we published the paper, large language models and the capabilities they showed had already stunned us.
Yet we did not translate these technologies directly into marketable products. That may be partly due to a conservative attitude at the time toward developing a large-scale product that could require, potentially, a ten-billion-dollar investment. While we saw the technology's potential, we were not entirely convinced that it alone would make a product compelling enough. But did we have high hopes for the technology? Absolutely.
Q: Since you knew about Google's work on large language models, how did your team feel when ChatGPT became a huge public success? Was there a sense of regret, a feeling of "alas, they did it and we didn't capitalize on it"?
Uszkoreit: There was indeed a feeling of "this could have happened here." But it wasn't "oh, what a shame they got there first" or anything like that. It was more "wow, this could have happened much sooner." As for how quickly people embraced and applied these new technologies, that genuinely amazed me.
Q: You'd left Google by then, hadn't you?
Uszkoreit: Yes, I had already left. In a way, you could say that Google was not the ideal place for this kind of innovative work, and that was one of the reasons I decided to leave. I left not because I didn't love it there, but because I felt I had to pursue my vision for Inceptive somewhere else.
My real motivation, though, was not just a great business opportunity but a sense of moral responsibility to do something that could be done better outside Google: designing more effective medicines with a direct, positive impact on people's lives.
Q: The interesting thing about ChatGPT is that I had used GPT-3 before, so when ChatGPT arrived, it wasn't a huge surprise to anyone already familiar with the technology.
Uszkoreit: Yes, exactly. If you had used this kind of technology before, you could clearly see how it had evolved and make reasonable inferences. When OpenAI was developing the earliest GPT models with Alec Radford and others, we were already discussing these possibilities, even though we weren't at the same company. We could all feel the excitement, but none of us really expected ChatGPT as a product to be embraced so widely and so quickly.
Q: My reaction was, "Oh, this is just GPT-3 with a chatbot interface that keeps context in the conversation loop." I didn't see it as a breakthrough moment, though it was certainly fascinating.
Uszkoreit: Breakthrough moments can take different forms. True, it wasn't a technical breakthrough, but at that level of competence the technology showed enormous utility, and that can certainly be called a breakthrough as well.
At the same time, we have to recognize that the creativity of users, and the diversity of the ways they use the tools we build, often exceeds our expectations. We cannot foresee how adept they will be with these tools or how broad the application scenarios will be.
Often, we can only learn by doing. This is why it is so important to maintain an experimental attitude and a willingness to accept failure. Because in most cases, the attempt will fail. But in some cases, it will succeed, and in rare cases, it will be a huge success like ChatGPT.
Q: That means taking some risks. Did Google lack the will to take such risks?
Uszkoreit: That was indeed the case at the time. But if you dig deeper and look back at the history, it's actually very interesting. Google Translate's experience was somewhat similar to ChatGPT's. When we launched the first version of Google Translate, it was at best a party joke. But in a very short time we turned it into a genuinely useful tool. Along the way it sometimes produced output that was downright awful and embarrassing. Nevertheless, Google persevered, because it was a worthwhile attempt in the right direction. That was around 2008, 2009, 2010.
Q: Do you remember Babel Fish, the online translation tool from the AltaVista search engine?
Uszkoreit: Sure.
Q: When it debuted, my brother and I were fascinated by it. We would translate text back and forth between languages because the results were so confusing and entertaining.
Uszkoreit: Yes, those round-trip translations tend to become more and more outrageous and ludicrous.
(Note: After leaving Google, Uszkoreit co-founded Inceptive, a company dedicated to applying deep learning to biochemistry. The company is developing what Uszkoreit calls "biological software," in which an AI compiler translates specified behaviors into RNA sequences. When those RNA sequences are introduced into biological systems, they perform the predefined functions.)
Q: What is your focus these days?
Uszkoreit: In 2021 I co-founded Inceptive, with the goal of using deep learning and high-throughput biochemistry experiments to design truly programmable, more effective drugs. We see this as just the first step toward our "biological software."
Biological software is in some ways similar to computer software. You first specify some behavior, then use a compiler to translate that specification into a program that runs on a computer and exhibits the behavior you specified. Similarly, with biological software you define a fragment of a biological program and compile it. The key difference is that we cannot use a traditional, engineered compiler, because the complexity of living systems far exceeds that of computers. Instead, by introducing an AI compiler that learns, we can compile, or translate, these biological program fragments into molecules. When those molecules are inserted into a biological system or organism, cells carry out the programmed behavior.
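(To make the compiler analogy concrete, here is a purely hypothetical Python sketch of the "specification in, molecule out" pattern Uszkoreit describes. Every name in it, BehaviorSpec, LearnedCompiler, the placeholder RNA string, is invented for illustration and does not correspond to any published Inceptive tooling; a real system would replace the stub with a trained model whose outputs are validated experimentally.)

```python
from dataclasses import dataclass

@dataclass
class BehaviorSpec:
    """Hypothetical declarative spec: what the molecule should do."""
    target_protein: str    # e.g., a modified viral antigen to express
    tissue: str            # where the behavior should occur
    duration_hours: float  # how long expression should last

class LearnedCompiler:
    """Stand-in for an AI model trained on high-throughput biochemistry
    data. A traditional rule-based compiler would not suffice here,
    because living systems are too complex to specify exhaustively."""
    def compile(self, spec: BehaviorSpec) -> str:
        # A real system would run a trained model; we return a
        # placeholder mRNA sequence to show the interface shape.
        return "AUG..."  # candidate RNA sequence (placeholder)

spec = BehaviorSpec(target_protein="modified viral antigen",
                    tissue="muscle", duration_hours=48.0)
candidate_rna = LearnedCompiler().compile(spec)
print(candidate_rna)  # candidate would go to wet-lab validation next
```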
Q: Is this similar to how the mRNA COVID vaccines work?
Uszkoreit: The mRNA COVID vaccines can be seen as an extremely simple example. There, the program tells the cell to "make this modified viral antigen," and the cell produces the corresponding protein as instructed. But as you can imagine, molecules can exhibit far more complex behavior. To get a sense of that complexity, just consider RNA viruses. They are merely RNA molecules, yet when they invade an organism they exhibit incredibly complex behavior: spreading widely within an organism, even globally, or performing specific tasks in only a few of an organism's cells at a specific time, and so on. So you can imagine what a revolution it would be if we could design tiny molecules with these capabilities. Our goal, of course, is never to create molecules that make people sick, but to create molecules that benefit human health, and that will revolutionize medicine.
Q: How do you ensure that you don't accidentally create destructive RNA sequences?
Uszkoreit: For a long time, medicine has stood somewhat outside of science, in the sense that it has never been thoroughly understood; we still do not fully grasp its actual mechanisms of action.
As a result, humans have had to develop a variety of safeguards and clinical trial processes. These experience-based safeguards kick in long before a patient ever sets foot in a clinic, and they can stop us from negligently creating dangerous substances. They have been with us since the dawn of modern medicine, and we will continue to use them, doing everything we can to ensure safety. We will start our experiments with the smallest systems, move to individual cells in later experiments, and strictly follow the medical community's established protocols to ensure these molecules are safe.