This is a follow-up to two previous posts on the theory of data center load flexibility:

There are now three publicised real-world implementations that I know of. Only three (and two of those involving Google), but it seems likely that the biggest hurdle left for wider deployment is building confidence with utilities, not technical feasibility.

Of course the future of the AI boom is far from certain. Public sentiment seems to be shifting and reality may finally be catching up with some of the hype. But if that happens and demand falls short of the astronomical predictions, smarter DC load orchestration is still a win: it allows optimising workloads against cost/emissions/capacity.

Emerald AI/Oracle/NVIDIA in Phoenix, Arizona

Emerald AI have written about a field demonstration using their proprietary software in collaboration with Oracle and NVIDIA. This is part of EPRI's DCFlex initiative.

Note that despite the format this is very much a self-published white paper and not peer-reviewed. That said, it does a good job laying out the methodology and results.

To quote the authors:

Our central hypothesis is that GPU driven AI workloads contain enough operational flexibility–when smartly orchestrated–to participate in demand response and grid stabilization programs

The setup:

  • The “cluster” being tested is teeny-tiny: only 256 GPUs and < 100 kW. So one or two racks.
  • Jobs are classified via tags into different flexibility tiers based on their tolerance for runtime or throughput deviations.
  • It’s a software-only solution. Emerald software
    • takes grid signals, job tags and job telemetry as inputs
    • uses this to predict the power-performance behaviour of AI jobs
    • recommends an orchestration strategy to meet both AI SLAs and power grid response commitments
    • implements that strategy through a combination of control knobs (see the sketch after this list)
      • power capping via dynamic voltage frequency scaling (DVFS)
      • job pausing
      • changing the number of GPUs allocated per job
  • Four representative workload ensembles were selected, with varying proportions of training, inference, and fine-tuning jobs.
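
Emerald hasn't published code, so purely as a mental model, here is a minimal Python sketch of how tag-based flexibility tiers might map a curtailment request onto those control knobs. Everything in it (the FlexTier names, the greedy plan_response heuristic, the 40% power-cap headroom, the example job mix) is my own illustrative assumption, not their implementation.

```python
from dataclasses import dataclass
from enum import Enum


class FlexTier(Enum):
    """Hypothetical flexibility tiers assigned via job tags."""
    RIGID = 0       # latency-sensitive inference: leave untouched
    FLEX = 1        # fine-tuning / batch inference: can be power-capped
    DEFERRABLE = 2  # checkpointed training: can be paused or shrunk


@dataclass
class Job:
    name: str
    tier: FlexTier
    gpus: int
    watts_per_gpu: float  # current draw, from job telemetry


def plan_response(jobs: list[Job], reduction_kw: float) -> list[str]:
    """Greedy sketch: shed the requested kW starting with the most
    flexible jobs, picking a knob appropriate to each tier."""
    actions, remaining = [], reduction_kw * 1000.0  # work in watts
    for job in sorted(jobs, key=lambda j: j.tier.value, reverse=True):
        if remaining <= 0:
            break
        draw = job.gpus * job.watts_per_gpu
        if job.tier is FlexTier.DEFERRABLE:
            actions.append(f"pause {job.name} (sheds ~{draw:.0f} W)")
            remaining -= draw
        elif job.tier is FlexTier.FLEX:
            # Cap GPU power via DVFS; assume ~40% of the current draw
            # can be shed at a tolerable slowdown.
            shed = 0.4 * draw
            actions.append(f"power-cap {job.name} (sheds ~{shed:.0f} W)")
            remaining -= shed
    if remaining > 0:
        actions.append(f"short by ~{remaining:.0f} W without touching rigid jobs")
    return actions


if __name__ == "__main__":
    jobs = [
        Job("chatbot-serving", FlexTier.RIGID, 64, 350),
        Job("llm-finetune", FlexTier.FLEX, 96, 380),
        Job("pretrain-run", FlexTier.DEFERRABLE, 96, 400),
    ]
    for action in plan_response(jobs, reduction_kw=25.0):
        print(action)
```

On NVIDIA hardware the power-cap branch would correspond to something like per-GPU power limits or locked clocks (e.g. nvidia-smi -pl or -lgc); the paper only says the capping is DVFS-based, so the exact mechanism is my assumption.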

The system responded to two real-world events, in each case reducing load by 25% for 3 hours (relative to the average base load), with a 15-minute ramp up/down, without violating SLAs.
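
To make the shape of that response concrete, here's a trivial sketch of the target power trajectory, assuming a ~100 kW baseline and that the 3-hour hold sits between the two 15-minute ramps (the paper's exact accounting may differ):

```python
def target_power(t_min: float, baseline_kw: float = 100.0,
                 reduction: float = 0.25, ramp_min: float = 15.0,
                 hold_min: float = 180.0) -> float:
    """Target facility power (kW) at t_min minutes after the event starts."""
    floor = baseline_kw * (1.0 - reduction)  # 75 kW during the hold
    if t_min < ramp_min:                     # 15-minute ramp down
        return baseline_kw - (baseline_kw - floor) * (t_min / ramp_min)
    if t_min < ramp_min + hold_min:          # 3-hour hold at -25%
        return floor
    if t_min < 2 * ramp_min + hold_min:      # 15-minute ramp back up
        return floor + (baseline_kw - floor) * ((t_min - ramp_min - hold_min) / ramp_min)
    return baseline_kw                       # back to normal operation

# e.g. target_power(0) == 100.0, target_power(60) == 75.0, target_power(220) == 100.0
```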

They also ran some simulations that went well, but I find these less interesting from a confidence-building point of view.

Google in Indiana and North Carolina

First up, there's a demonstration site in Lenoir where Google is coordinating with EPRI. It's part of the same DCFlex program as the Emerald AI demonstration above, and based on their progress I would hope something is happening or will happen soon, but I can't find anything published.

Second, Google has signed a utility agreement for a demand-response project that “centers on the tech giant’s $2 billion data center in Fort Wayne, Indiana, which started operations late last year but expects to ramp up its power needs over time”.
There’s also very little info on this one. The official Google post is pretty thin, but this Canary Media post pulls together a bit more colour from Tyler Norris (of rethinking_load_growth fame) and Michael Terrell (from Google).

So watch this space I guess.