https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9770383

Abstract—The amount of CO2 emitted per kilowatt-hour on an electricity grid varies by time of day and substantially varies by location due to the types of generation. Networked collections of warehouse scale computers, sometimes called Hyperscale Computing, emit more carbon than needed if operated without regard to these variations in carbon intensity. This paper introduces Google’s system for global Carbon-Intelligent Compute Management (CICM), which actively minimizes electricity-based carbon footprint and power infrastructure costs by delaying temporally flexible workloads. The core component of the system is a suite of analytical pipelines used to gather the next day’s carbon intensity forecasts, train day-ahead demand prediction models, and use risk-aware optimization to generate the next day’s carbon-aware Virtual Capacity Curves (VCCs) for all datacenter clusters across Google’s fleet. VCCs impose hourly limits on resources available to temporally flexible workloads while preserving overall daily capacity, enabling all such workloads to complete within a day with high probability. Data from Google’s in-production operation shows that VCCs effectively limit hourly capacity when the grid’s energy supply mix is carbon intensive and delay the execution of temporally flexible workloads to “greener” times.

This is the rare paper that describes an actual working productionised system. Personal disclosure: I’ve contributed some code to this system as part of my day job, specifically the pipelines used to generate and record the day-ahead carbon intensity and efficiency predictions.

CICM takes advantage of the fact that while the type or size of individual workloads in a cluster is hard to predict, workloads are predictable in aggregate with a reasonable degree of accuracy once you get to a large enough scale. You make these predictions for inflexible (can’t wait to run) and flexible (can wait up to a day to run) workloads, combine them with the day-ahead local grid carbon intensity forecasts, and now you can time-shift your flexible workloads to minimise your daily carbon emissions.
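To make the aggregation point concrete, here is a toy simulation (entirely mine, with invented numbers) of why the sum of many individually unpredictable tasks is predictable: the relative day-to-day spread of total demand shrinks as the task pool grows.

```python
import random

# Toy demonstration: each task's demand is random, but the total across
# many tasks concentrates tightly around its mean (all numbers invented).
random.seed(0)
for n_tasks in (10, 1_000, 100_000):
    daily_totals = [
        sum(random.uniform(0, 2) for _ in range(n_tasks))  # one "day"
        for _ in range(50)
    ]
    mean = sum(daily_totals) / len(daily_totals)
    spread = (max(daily_totals) - min(daily_totals)) / mean
    print(f"{n_tasks:>7} tasks: daily total varies by ~{spread:.1%}")
```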

The mechanism to time-shift the flexible workloads is a “virtual capacity curve” (VCC): a per-hour, artificially lowered compute limit that the cluster scheduler must honour while giving precedence to inflexible workloads.
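The paper doesn’t spell out the scheduler internals, so here is one minimal way to realise “lower hourly limit, but inflexible work takes precedence” as an admission check (the interface and names are my assumptions):

```python
def admit(cores, is_flexible, hour, usage, vcc, physical_cap):
    """Hypothetical admission check against a VCC.

    Inflexible tasks are capped only by physical capacity; flexible
    tasks must also fit under this hour's (lower) VCC limit, so they
    naturally queue until a greener hour with a higher limit arrives.
    """
    limit = vcc[hour] if is_flexible else physical_cap
    if usage[hour] + cores <= limit:
        usage[hour] += cores
        return True
    return False  # flexible task stays queued for a later hour
```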

Apart from the predicted flexible and inflexible compute loads and the carbon intensity forecasts, VCCs are also a function of trained models that map compute load to power load, of prediction uncertainties, and of scheduler SLOs.
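The paper doesn’t give the exact formulation, but a plausible sketch of how those inputs combine: pad the demand plan by a quantile of its forecast uncertainty (so the daily-completion SLO holds with high probability), then translate compute to power through the fitted model. The normal-quantile margin and all names below are my assumptions.

```python
from statistics import NormalDist

def hourly_cap(planned_cores, forecast_std, slo_quantile, power_model):
    """Pad the plan for forecast error, then map compute load to power.

    slo_quantile (e.g. 0.99) is the probability with which the padded
    plan should still cover actual demand; power_model is a fitted
    compute-load -> power-load function. Both are assumed inputs.
    """
    margin = NormalDist().inv_cdf(slo_quantile) * forecast_std
    cores = planned_cores + margin
    return cores, power_model(cores)

# Example with an invented linear power model: 100 kW idle + 0.2 kW/core.
cores, kw = hourly_cap(500, forecast_std=40, slo_quantile=0.99,
                       power_model=lambda c: 100 + 0.2 * c)
```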

Mechanically, CICM is composed of a series of pipelines (carbon fetching, power model generation, and load forecasting) that all feed the optimisation pipeline, which outputs the VCCs.
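Restated as code, the data flow looks roughly like this, with one toy function per pipeline (stubs and numbers invented, the power-model step omitted for brevity, and a greedy pass standing in for the paper’s risk-aware optimisation):

```python
def fetch_carbon(hours=24):
    # Carbon fetching: day-ahead gCO2/kWh forecast; dirtier overnight here.
    return [500 if h < 6 or h >= 18 else 300 for h in range(hours)]

def forecast_load(hours=24):
    # Load forecasting: per-hour inflexible cores plus a daily flexible total.
    return [60.0] * hours, 400.0

def optimise_vcc(carbon, inflexible, flexible_total, capacity=100.0):
    # Optimisation: start from the must-run floor, then raise the curve in
    # the greenest hours until the whole day's flexible work fits under it.
    vcc = list(inflexible)
    remaining = flexible_total
    for h in sorted(range(len(carbon)), key=carbon.__getitem__):
        take = min(remaining, capacity - inflexible[h])
        vcc[h] += take
        remaining -= take
    assert remaining == 0, "VCC must preserve enough daily capacity"
    return vcc

vcc = optimise_vcc(fetch_carbon(), *forecast_load())
```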

The paper does not provide detailed results across Google’s fleet, but notes that “using actual measurements from Google datacenter clusters, we demonstrate a power consumption drop of 1-2% at times with the highest carbon intensity”. If we are generous and assume a 2x difference in carbon intensity between the highest and lowest hours, the shifted power avoids roughly half of its emissions, for a 0.5-1% overall emissions saving, which is not exactly earth shattering.
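The back-of-envelope behind that, with my 2x assumption made explicit:

```python
# Power moved from hours at 2x carbon intensity to hours at 1x avoids
# half of its emissions; scale by the 1-2% of power actually shifted.
high, low = 2.0, 1.0
for shifted in (0.01, 0.02):
    saving = shifted * (1 - low / high)
    print(f"{shifted:.0%} of power shifted -> ~{saving:.2%} emissions saved")
```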

I will note that spatial shifting appears to have much more potential according to “On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud” (which I have not read properly), but that also seems like a much harder solution to implement (see Questions below).

“Spatiotemporal workload shifting can reduce workloads’ carbon emissions, [but] the practical upper bounds of these carbon reductions are currently limited and far from ideal.” Crucially, temporal carbon intensity varies at most 2x while spatial intensity varies up to 43x, but spatial shifting is much harder to do.

That paper also points out that

“simple scheduling policies often yield most of these reductions, with more sophisticated techniques yielding little additional benefit”

and that

“The benefit of carbon-aware workload scheduling relative to carbon-agnostic scheduling will decrease as the energy supply becomes greener”.

Questions

  • Batch workloads are easy because they are temporally and spatially flexible. What about serving workloads? Is there recent work on carbon-aware spatial flexibility within latency and legal limits?
  • What about ML workloads? These are limited to clusters with specific hardware, the hardware is more heterogeneous in its performance and power usage, and CPU is no longer the relevant resource; instead it’s some GPU equivalent. Power Modeling for Effective Datacenter Planning and Compute Management, which this paper uses, does calculate power curves for ML accelerators (and storage devices), but these are not used for carbon-aware computing.
  • So how do we extend this to spatial flexibility? To minimise emissions over a day, we want a set of workload predictions grouped by which clusters can fulfil them (e.g. honouring any legal restrictions or hardware requirements). Then we need to create a set of VCCs that 1) have enough aggregate capacity to cover all work daily while honouring cluster constraints, and 2) do so with minimum emissions. A toy formulation is sketched below.
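As a strawman for that last point, here is how I would formulate it (my sketch, not from either paper): treat every (cluster, hour) slot as a bucket with a carbon price and some spare headroom, and pour each constraint-group’s demand into the cheapest buckets it is allowed to use.

```python
def spatiotemporal_plan(groups, eligible, carbon, headroom, hours=24):
    """Greedy spatial + temporal placement (toy formulation, not optimal).

    groups:   {group: core-hours of flexible demand}
    eligible: {group: clusters allowed by legal / hardware constraints}
    carbon:   {cluster: [gCO2/kWh forecast per hour]}
    headroom: {cluster: [spare cores per hour]}  (mutated in place)
    """
    placement = {}
    for group, demand in groups.items():
        slots = sorted(
            ((carbon[c][h], c, h) for c in eligible[group] for h in range(hours)),
            key=lambda s: s[0],
        )
        for _, c, h in slots:
            take = min(demand, headroom[c][h])
            if take > 0:
                placement[(group, c, h)] = take
                headroom[c][h] -= take
                demand -= take
            if demand == 0:
                break
        assert demand == 0, f"not enough eligible capacity for {group}"
    return placement
```

Placing groups one at a time like this is order-dependent and ignores forecast risk, which is exactly why the real thing would need the uncertainty margins from above and a joint optimisation rather than a greedy pass.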