The cost of training AI could soon become too much to bear


Although companies like OpenAI and Google don’t disclose the precise costs of training AI models like GPT-4 and Gemini, it’s clearly a fiendishly expensive business—and the bigger and more capable these so-called frontier models get, the more it costs to train them.

When OpenAI released GPT-3 in 2020, cloud provider Lambda estimated that the model—which had 175 billion parameters—cost over $4.6 million to train. OpenAI hasn’t disclosed the size of GPT-4, which it released a year ago, but reports range from 1 trillion to 1.8 trillion parameters, and CEO Sam Altman vaguely pegged the training cost at “more than” $100 million. Anthropic CEO Dario Amodei suggested in August that models costing over $1 billion would appear this year and that “by 2025, we may have a $10 billion model.”

Is this kind of exponential cost growth realistic? It’s certainly a real trend, say researchers, but whether it can be sustained is another matter.

In 2022, researchers in the U.K., U.S., Germany and Spain found that, since the deep learning field took off in the early 2010s, the amount of computational power needed to train the most capable new models has doubled roughly every six months. According to Epoch AI director Jaime Sevilla, the lead author on that paper, the trajectory has held since then, with the cost of training roughly tripling each year: compute requirements grow about 4X annually (a doubling every six months), partly offset by a roughly 1.3X annual gain in hardware efficiency, and 4 divided by 1.3 comes out to roughly 3.

“It’s still a straight line and it keeps pointing up,” says Sevilla.

Here’s Epoch AI’s projection of the hardware cost involved in training the most expensive AI models, through 2030. This excludes AI researchers’ salaries, which are considerable these days. There is also huge uncertainty about the exact trajectory, as the vast range in Epoch’s estimates makes clear. That uncertainty has two sources: how little is publicly known about the size and cost of models like GPT-4, and the way disparate estimates of the investment growth rate compound the further out one projects. Sevilla describes the median forecast—the line that hits $140 billion by 2030—as “a naive extrapolation based on historical data, rather than an all-things-considered forecast.”

[Interactive chart on Fortune.com: Epoch AI’s projected hardware cost of training the most expensive AI models, through 2030]
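For readers who want to check the arithmetic, here is a minimal Python sketch of the kind of naive extrapolation Sevilla describes. The 4X and 1.3X figures come from the reporting above; the roughly $100 million 2023 baseline echoes Altman’s comment but is an illustrative assumption, not an Epoch AI input.

```python
# Naive extrapolation of frontier training costs, in the spirit of the
# median line described above. All inputs are illustrative assumptions.

baseline_cost = 100e6   # hypothetical GPT-4-class training run in 2023
growth = 4 / 1.3        # net annual cost growth: ~3.08X per year

for year in range(2024, 2031):
    baseline_cost *= growth
    print(year, f"${baseline_cost / 1e9:,.1f}B")
# Prints roughly $261B for 2030: the same order of magnitude as the
# $140B median forecast, and well inside the wide band described above.
```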

That uncertainty aside, there are further caveats. The first and most obvious is that, if the trend continues, the cost of training relative to the capabilities gained will at some point become too much for any company to bear.

GPT-3 was more accurate in its output than GPT-2, to the point that it was able to power GitHub’s Copilot code generator. GPT-3.5, refined through further training steps such as reinforcement learning from human feedback, which again demanded more computing resources, was convincing enough to provide the foundation for ChatGPT’s first release. GPT-4 added multimodality—the ability to also accept images as inputs—to the mix, along with better reasoning and a much better grasp of the context of users’ prompts. It’s not yet clear how great future leaps between versions will prove to be.

Lennart Heim, another co-author of the 2022 paper who leads the compute governance work at the Centre for the Governance of AI, warned last year that the hardware cost of training one new frontier AI model could theoretically surpass the entire gross domestic product of the U.S. around the mid-2030s. “At some point you will hit the limits,” says Heim. “To spend 1% of GDP on a single training run, the capabilities need to be quite good.”
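The same compounding logic gives a rough sense of Heim’s timeline. The sketch below assumes a hypothetical $1 billion frontier run in 2024, a net 3X annual cost growth, and U.S. GDP held flat at roughly $27 trillion; all three are illustrative assumptions rather than figures from Heim.

```python
# Hedged sanity check of the GDP-crossover claim, under the stated
# assumptions: a $1B run in 2024, 3X annual growth, GDP flat at ~$27T.

cost = 1e9     # hypothetical 2024 frontier training run
gdp = 27e12    # rough current U.S. GDP, held constant
year = 2024

while cost < gdp:
    year += 1
    cost *= 3.0
print(year)  # 2034: a single run would pass GDP around the mid-2030s
```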

The amount of available training data may be a limiting factor, although Heim points out there could be ways around this, such as training on more kinds of data and showing the models the same data multiple times. Copyright lawsuits could factor into the equation here, if they stop the likes of OpenAI from simply hoovering up every piece of data they can find online. But even if that happens, there’s always the option of synthetic data—for example, video data generated by the Unity or Unreal game engines. It’s also likely that companies will increasingly license private data for AI training purposes.

Another wildcard lies in the practicality of expanding data centers and the associated increase in energy and water consumption. Hyperscalers like Microsoft (which is reportedly contemplating a $100 billion data center project with OpenAI called “Stargate”) are starting to look at attaching new energy sources like small modular nuclear reactors to their data centers, and work is underway to find less energy-intensive alternatives to existing AI infrastructure. But pushback against major new data centers has been growing everywhere from Ireland to West Virginia in recent years, and the escalating requirements of new AI models are only amplifying that resistance.

And if the industry really does manage to create artificial general intelligence (AGI), with all the impacts that would have on employment and societal power imbalances, a whole new level of political resistance could manifest—though on the other hand, the advent of AGI could bring benefits that are game-changing enough to justify training costs that today seem outlandish.

However, not all AI is about the march toward AGI and the massive model sizes that march would likely entail. If one looks at specific use cases—such as turning natural-language requests into SQL database queries—then smaller, fine-tuned models with only single-digit billions of parameters can outperform a beast like GPT-4, argues Philipp Schmid, a technical lead at AI firm Hugging Face.

“We believe in the future of course we will have closed-source, big, strong foundation models, but a lot of companies will adopt multiple small models for specific use cases,” Schmid said, adding that the open-source approach taken by players like Meta and Mistral would also help to address cost issues.

“If we collaboratively build on the work from each other, we can reuse resources and money spent,” he said.
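To illustrate the small-model pattern Schmid describes, here is a minimal sketch using Hugging Face’s transformers library to run a compact text-to-SQL model locally. The model identifier is a hypothetical placeholder, and the prompt format a real fine-tuned checkpoint expects will differ.

```python
# Minimal sketch: a small fine-tuned model answering a text-to-SQL request
# locally, rather than calling a large general-purpose model. The model ID
# is a hypothetical placeholder; substitute any few-billion-parameter
# checkpoint fine-tuned for SQL generation.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-org/small-text-to-sql-model",  # hypothetical checkpoint
)

schema = "CREATE TABLE orders (id INT, customer TEXT, total DECIMAL, placed_at DATE);"
question = "Total revenue per customer in 2023, highest first."

prompt = f"{schema}\n-- {question}\nSELECT"
result = generator(prompt, max_new_tokens=80)
print(result[0]["generated_text"])
```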

