GPT-2, launched by OpenAI in 2019, once cost $256 per hour to train. Five years later, in the GPT-4 era, have advances in hardware, software, and data cut the time and cost of training that same model? The answer is yes.
As Tom's Hardware reported today, Andrej Karpathy, the former Tesla AI director and OpenAI founding member, used his llm.c project to "recreate" GPT-2, bringing the training cost down to just $28 per hour (around RMB 204), a reduction of nearly 90% in just five years.
Image source: Pixabay
The main factor in the cost reduction is training on a single 8×H100 node. In addition, Andrej Karpathy notes that llm.c implements GPT training directly: "Since llm.c is a direct implementation of GPT training in C/CUDA, its requirements are very low - no conda environment, Python interpreter, pip installs, etc. You just start a cloud GPU node, optionally install NVIDIA cuDNN and NCCL/MPI, download the .bin data shards, compile, and run, and you're up and running in minutes."
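The workflow Karpathy describes might look roughly like the following launch fragment. The commands are an illustrative sketch based on the llm.c repository's documented build targets, not a verified recipe; exact script names and flags may differ.

```shell
# Sketch of the steps from the quote above (illustrative, not exact):
# 1. on a fresh cloud GPU node, optionally install cuDNN and NCCL/MPI
# 2. fetch the pre-tokenized .bin data shards
# 3. compile the C/CUDA trainer and launch one process per GPU
git clone https://github.com/karpathy/llm.c
cd llm.c
make train_gpt2cu              # build the CUDA training binary
mpirun -np 8 ./train_gpt2cu    # one rank per GPU on the 8xH100 node
```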
He added: "Then wait 24 hours ($28 × 24 = $672 in total) to generate a sample about 'English-speaking unicorns in the Andes'."
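A quick back-of-the-envelope check of the figures quoted above, taking the article's reported hourly rates as given:

```python
# Inputs as reported in the article (not independently verified):
original_hourly = 256  # USD/hour, reported 2019 GPT-2 training cost
llmc_hourly = 28       # USD/hour for the 8xH100 node Karpathy used
hours = 24             # reported wall-clock time for the llm.c run

total_cost = llmc_hourly * hours              # 28 * 24 = 672
reduction = 1 - llmc_hourly / original_hourly # ~0.89

print(f"total llm.c run cost: ${total_cost}")     # prints $672
print(f"hourly cost reduction: {reduction:.1%}")  # prints 89.1%, i.e. "nearly 90%"
```

The ~89% drop in hourly cost is what the article rounds to "nearly 90% in five years."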
The llm.c project reportedly started out as part of an educational video but quickly turned into something Karpathy built from scratch after he ran into some PyTorch issues.
However, the report argues that advances in hardware, software, and training data don't mean that the cost of cutting-edge AI training is going down. Anthropic CEO Dario Amodei, for example, recently said that AI models currently in development can cost $1 billion to train, with higher-cost models expected to reach $100 billion by 2025.
Increased hardware performance also comes with increased costs. For example, NVIDIA's H100 chip costs $40,000 per unit, while the next-generation Blackwell AI chip is expected to sell for $70,000 per unit. Even so, the CEO of Google DeepMind has said that current models are still only about as intelligent as a cat.