https://www.computer.org/csdl/magazine/mi/5555/01/11097303/28IR6YdAq9W (with a somewhat extended version here that goes deeper into the methodology)

Abstract: Specialized hardware accelerators aid the rapid advancement of artificial intelligence (AI), and their efficiency impacts AI’s environmental sustainability. This study presents the first publication of a comprehensive AI accelerator life-cycle assessment (LCA) of greenhouse gas emissions, including the first publication of manufacturing emissions of an AI accelerator. Our analysis of five Tensor Processing Units (TPUs) encompasses all stages of the hardware lifespan—from material extraction and manufacturing, to energy consumption during training and serving of AI models. Using first-party data, it offers the most comprehensive evaluation of AI hardware’s environmental impact. We introduce a new metric, compute carbon intensity (CCI), that will help evaluate AI hardware sustainability and estimate the carbon footprint of training and inference. We show that CCI improves 3x from TPU v4i to TPU v6e. Moreover, while this paper’s focus is on hardware, software advancements leverage and amplify these gains.

The main point: Google’s AI hardware has become 3x more emissions-efficient over the last four years, where emissions efficiency is measured as CO2-equivalent emissions per floating-point operation (FLOP).

What’s going on in this table?

First, TPUs are broken down by type and generation. The versatile type is good for both training and inference, while the powerful type is better suited to training alone.

The authors define compute carbon intensity (CCI) as CCI = (CO2-equivalent emissions in grammes) / (compute performed in exaFLOPs), or in words: grammes of CO2-equivalent emissions per exaFLOP (10^18 floating-point operations). The paper later proposes using this as a standard metric for future LCAs.
I’ll note that this seems like a specific implementation of SCI from the Green Software Foundation.
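To make the metric concrete, here’s a minimal sketch of the CCI calculation as I read it; the function name and the numbers are mine, not from the paper.

```python
# Minimal sketch of CCI: grammes of CO2e per exaFLOP of compute.

def cci_g_per_eflop(total_gco2e: float, total_flops: float) -> float:
    """Grams of CO2-equivalent emissions per exaFLOP."""
    EFLOP = 1e18  # one exaFLOP = 1e18 floating-point operations
    return total_gco2e / (total_flops / EFLOP)

# Hypothetical numbers: one tonne of CO2e over 1e22 FLOPs of work
print(cci_g_per_eflop(1_000_000, 1e22))  # 100.0 gCO2e/EFLOP
```

Lower is better: a 3x CCI improvement means a third of the emissions for the same compute.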

What about “market/location-based”? These are two different methods of accounting for electricity generation emissions factors (aka: carbon intensity, CO2-equivalent emissions per unit of energy), and I was happy to see them both included:

  • location-based: You use the annual average electricity emissions factor of the local grid. This isn’t perfect; ideally we’d all use hourly matching.
  • market-based: You get to use the emissions factors from your Carbon Free Energy (CFE) procurements, which could be very far away from the generation you’re actually consuming (eg: all of the USA and Canada are treated as a single location). This is a controversial accounting method, and it’s easy to see how it might not lead to actual emissions reductions.
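The difference between the two methods is just which emissions factor you multiply the same energy use by. A quick sketch, with made-up factors (not the paper’s numbers):

```python
# Same energy consumption, two accounting methods, very different totals.

def operational_gco2e(energy_kwh: float, ef_g_per_kwh: float) -> float:
    """Operational emissions = energy used x emissions factor."""
    return energy_kwh * ef_g_per_kwh

energy_kwh = 1_000_000  # hypothetical annual energy use
grid_ef = 400.0         # location-based: local grid annual average, gCO2e/kWh
market_ef = 50.0        # market-based: after CFE procurement, gCO2e/kWh

print(operational_gco2e(energy_kwh, grid_ef))    # 400,000,000 g = 400 tCO2e
print(operational_gco2e(energy_kwh, market_ef))  # 50,000,000 g = 50 tCO2e
```

An 8x gap from accounting choice alone, which is why reporting both is the honest move.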

As we can see from the nice diagram at the top, embodied CCI accounts for DC construction, hardware construction, disposal and recycling, while operational CCI includes power to run the hardware as well as DC cooling. Operational emissions dominate, and “ignoring CFE procurement, embodied emissions are roughly ∼10% and operational emissions are ∼90% of an AI system’s lifetime emissions”.

One last thing that caught my attention was from the methodology: newer chip generations tend to see higher utilisation, and power-efficiency goes up with utilisation (see here). So in order to avoid modelling artificially higher efficiency for newer generations, the authors performed Propensity Score Weighting (see appendix F in the extended paper) to normalise utilisation across generations. For example, if most v4 machines run at low utilisation, then a v4 machine running at high utilisation gets a high weight applied to its utilisation value. This isolates hardware improvements from the utilisation differences.
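The reweighting idea can be sketched with a much-simplified, binned stand-in for the paper’s propensity score weighting (the real method is in appendix F of the extended paper; everything below is my own toy construction): weight each machine so that every generation’s utilisation distribution matches a common target distribution.

```python
# Toy, binned reweighting: rare utilisation levels within a generation
# get upweighted so all generations share the same target distribution.
from collections import Counter

def utilisation_weights(samples, target_dist):
    """samples: list of (generation, utilisation_bin) tuples.
    Returns one weight per sample: target probability / observed
    within-generation probability of that utilisation bin."""
    pair_counts = Counter(samples)                 # (gen, bin) frequencies
    gen_totals = Counter(g for g, _ in samples)    # samples per generation
    weights = []
    for gen, ubin in samples:
        p_observed = pair_counts[(gen, ubin)] / gen_totals[gen]
        weights.append(target_dist[ubin] / p_observed)
    return weights

# Hypothetical fleet: most v4 machines run at low utilisation
samples = [("v4", "low")] * 3 + [("v4", "high")]
target = {"low": 0.5, "high": 0.5}
print(utilisation_weights(samples, target))  # lone high-util machine gets weight 2.0
```

This reproduces the example in the text: the rare high-utilisation v4 machine is upweighted, so hardware improvements aren’t conflated with newer generations simply being kept busier.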

If I have one quibble with this paper, it’s that it probably underestimates the embodied CCI, for a combination of two reasons:

  1. Utilisation of a specific generation of hardware seems likely to go down over time as newer hardware comes along. In fact this is the stated reason for implementing propensity weighting.
  2. Hardware may end up with less than the expected six year lifespan due to the rapid pace of change in all things AI.

Both of these will have the effect of reducing the total FLOPs performed by a chip in its lifetime, which would raise the embodied CCI. That said, we’ve already noted that operational CCI is much larger than embodied CCI, so maybe it’s not that important.
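The arithmetic behind the quibble is simple: embodied CCI scales inversely with lifetime FLOPs, so retiring hardware early raises it proportionally. A sketch with made-up numbers (nothing here is from the paper):

```python
# Embodied CCI = fixed manufacturing emissions / lifetime compute,
# so halving the effective lifespan doubles the embodied CCI.

def embodied_cci(embodied_gco2e: float, flops_per_year: float,
                 lifespan_years: float) -> float:
    """Embodied gCO2e per exaFLOP over the hardware's lifetime."""
    lifetime_eflops = flops_per_year * lifespan_years / 1e18
    return embodied_gco2e / lifetime_eflops

emissions = 1e6        # hypothetical embodied emissions, gCO2e
flops_per_year = 1e21  # hypothetical sustained throughput

print(embodied_cci(emissions, flops_per_year, 6))  # assumed 6-year lifespan
print(embodied_cci(emissions, flops_per_year, 3))  # retired early: CCI doubles
```

Since operational emissions are ~90% of the lifetime total, even a doubling of embodied CCI moves the overall figure far less than a change in CFE procurement would.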