TradingView
aneekaguptaWTE
Jun 15, 2023 3:40 PM

Will AI workloads consume all the world’s energy? 

AI stocks: Rise of the machines

Description

On big questions like this, almost nothing stays constant. When we consider a new technology:

We cannot assume that rates of adoption or usage will remain constant—they may grow, or they may fall.
We cannot assume that the technology supplying our energy needs will remain constant—there could be breakthroughs in efficiency or changes in the overall energy mix.
We cannot assume that the efficiency of the specific technology being adopted will remain constant—we have seen many cases where the initial version of a technology or piece of software is followed by improvements that deliver greater capability with lower energy usage.

We must also recognise that artificial intelligence (AI) itself could suggest improvements in energy efficiency for specific applications—like the heating and cooling of a building. Therefore, any analysis of energy usage and AI must recognise that the one constant will be change.

Environmental impact of select large language models (LLMs)
LLMs have been garnering the lion’s share of attention amidst the current excitement around generative AI. It makes sense to consider the amount of carbon emissions generated by some of these systems. The Stanford AI Index Report, published in 2023, provided some data, noting that factors like the number of parameters in a model, the power usage effectiveness1 of a data centre, and the grid carbon intensity all matter.
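The relationship between these factors can be sketched as simple arithmetic: IT energy is scaled up by the data centre's overhead (its PUE) and then by the carbon intensity of the grid supplying it. The figures below are purely illustrative, not estimates for any real model:

```python
# Back-of-envelope estimate of training emissions (illustrative numbers only).
def training_emissions_tonnes(it_energy_mwh, pue, grid_intensity_kg_per_mwh):
    """Total CO2 in metric tonnes: IT energy scaled by data-centre
    overhead (PUE), then by the grid's carbon intensity."""
    facility_energy_mwh = it_energy_mwh * pue
    return facility_energy_mwh * grid_intensity_kg_per_mwh / 1000.0

# Hypothetical inputs: 1,000 MWh of IT energy, a PUE of 1.1,
# and a grid intensity of 400 kg CO2 per MWh.
print(training_emissions_tonnes(1000, 1.1, 400))  # roughly 440 tonnes of CO2
```

The same sketch makes clear why all three levers matter: halving the grid's carbon intensity cuts emissions just as much as halving the energy the model itself consumes.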

Considering power consumption of an LLM
Those building different LLMs have many levers they can pull to influence characteristics like energy consumption. Google researchers proposed a family of language models named GLaM (Generalist Language Model), which uses a ‘sparsely activated mixture of experts’2. While a full discussion of how that approach works is beyond the scope of this piece, we note that the largest of the GLaM models has 1.2 trillion parameters. On that data point alone, one would assume this model consumes more energy than any of the other models discussed here.

In reality, the GLaM model with 1.2 trillion parameters consumes only one-third of the energy required to train GPT-3 and requires only half the computation flops for inference operations. A simple way to think of what is going on is that, while the total model has 1.2 trillion parameters, a given input token into the GLaM model activates a maximum of 95 billion parameters—the entire model is never active at once. GPT-3, on the other hand, activates all 175 billion of its parameters on each input token3. Notably, even though AI model performance is measured along many dimensions, by many of those measures the GLaM model outperforms GPT-3 as well4.
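A quick sketch of the arithmetic behind sparse activation, using only the parameter counts quoted above:

```python
# Dense vs sparsely activated models, using the figures quoted above.
GLAM_TOTAL = 1.2e12   # total parameters in the largest GLaM model
GLAM_ACTIVE = 95e9    # parameters activated per input token (at most)
GPT3_TOTAL = 175e9    # GPT-3 activates all of its parameters per token

active_fraction = GLAM_ACTIVE / GLAM_TOTAL
print(f"GLaM activates {active_fraction:.1%} of its parameters per token")
print(f"Active parameters per token vs GPT-3: {GLAM_ACTIVE / GPT3_TOTAL:.2f}x")
```

So although GLaM is nearly seven times larger than GPT-3 on paper, each token touches barely 8% of the model—roughly half the parameters GPT-3 activates—which is consistent with its lower reported compute for inference.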



Conclusion

The bottom line is that model design matters: if model designers want to find ways to maintain performance while using less energy, they have many options.


Sources
1 Power usage effectiveness (PUE) is useful in evaluating the energy efficiency of data centres in a standard way. PUE = (total amount of energy used by a computer data centre facility) / (energy delivered to computer equipment). A higher PUE means that the data centre is less efficient.
2 Source: Du et al. “GLaM: Efficient Scaling of Language Models with Mixture-of-Experts.” ARXIV.org. 1 August 2022.
3 Source: Patterson, David; Gonzalez, Joseph; Hölzle, Urs; Le, Quoc Hung; Liang, Chen; Munguia, Lluis-Miquel; et al. (2022): The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink. TechRxiv.
4 Source: Du et al, 1 August 2022.

Comments
zAngus
Interesting discussion. I have also been wondering whether today's Bitcoin miners might benefit from switching their processing power to AI applications as the margins on coin mining get tighter and tighter. The power question was an interesting one.

I asked ChatGPT-4 how AI and mining compare power-wise, and this is what it replied.

ChatGPT 4 >>

The comparison between the energy consumption of large language models (LLMs) like GPT-4 and cryptocurrencies like Bitcoin isn't an apples-to-apples one because they serve different purposes and use energy in different ways.

As of my knowledge cutoff in September 2021, cryptocurrency operations, particularly Bitcoin, are known for their high energy consumption. This is due to the Proof of Work (PoW) consensus algorithm used in Bitcoin mining, where miners need to solve complex mathematical puzzles to add a new block to the blockchain. This process is computationally intensive and, as a result, energy-consuming.

On the other hand, training large language models like GPT-4 indeed requires significant computational resources, and hence energy, but the energy consumption mostly occurs during the initial training phase. Once the model is trained, the energy required for inference (generating responses based on the trained model) is relatively small.

In terms of raw numbers, according to an OpenAI estimate, the training of GPT-3, a predecessor of GPT-4, resulted in roughly 284 metric tons of CO2 emissions, which is equivalent to the lifetime emissions of about five average US cars. This figure includes not only the energy directly used to run the hardware but also the energy for datacenter overheads, hardware manufacturing, etc.

In contrast, a single Bitcoin transaction has been reported to have a carbon footprint comparable to watching over 50,000 hours of YouTube or conducting over 735,000 Visa transactions. The Bitcoin network as a whole was estimated to consume more energy than some countries like the Netherlands or Argentina.

In conclusion, while both activities consume energy, their consumption patterns and scales differ significantly. Cryptocurrencies, particularly those using PoW, are more consistently energy-intensive due to the ongoing computational work of mining, whereas LLMs have a significant one-time energy cost during training but use far less energy for ongoing operations. It's also worth mentioning that efforts are being made in both fields to reduce their environmental impact. As of my last update in 2021, however, the crypto world in aggregate likely consumes significantly more energy than AI training.


It will be interesting to see how AI evolves and its resource demands change over time.

This publication has been chosen for the Editor's Picks and will be featured on the Home Page as well as tradingview.com/ideas/editors-picks/.

Thank you for your valuable contribution to the TradingView community and keep up the good work!
mfilipas
@zAngus Both training and inference will consume an entire GPU's “horsepower”. Training may be a single large activity and inference a small one, but inference is performed millions if not billions of times (once for every person's interaction with the AI). The claim above that AI energy use after the “initial training phase” is “small” is plainly false.

Btw, proof of work Bitcoin is not singularly analogous to visa transactions. You need to take into account the cost of banks operations worldwide, as well as the global cost of war (not to mention purchasing power loss) because governments can “print” endless money to fund unjustifiably long wars. This is a complex and lengthy topic.
arvine11
It's the destiny of human beings to finally digitalize, and the consequences will all be tolerated! Digitalization managed by artificial intelligence is a gift of the matrix we are living in... it's too late.
ArmanShabanTrading
Impressive work!
brill301
There is infinite energy.
kakola
We'll just have to ask the world to make more energy.
AMIT-RAJAN
Somehow your post covered all the major factors of AI, it's impressive thanks for sharing a wonderful idea like this 🙏✌️
Gy65Dg80
Potential Energy or Kinetic Energy?
redlinesd
no