ARTFEED — Contemporary Art Intelligence

GAR: Carbon-Aware LLM Routing via Constrained Optimization

ai-technology · 2026-05-13

A new framework called Green-Aware Routing (GAR) minimizes CO2 emissions per LLM inference request while maintaining accuracy and latency targets. GAR uses adaptive constraint optimization and lightweight estimators for real-time routing decisions across heterogeneous model pools. The paper introduces GAR-PD, a practical online primal-dual routing algorithm.
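The objective described above can be written as a constrained program. The following is only a sketch in notation chosen here, not the paper's own: π is the routing policy, c is the CO2 emitted by serving request x with the chosen model, a is a correctness indicator, ℓ is the observed latency, α is the accuracy floor, and τ is the latency target. The p95 requirement is expressed as a 5% budget on requests slower than τ.

```latex
\begin{aligned}
\min_{\pi}\quad & \mathbb{E}\big[\, c(x, \pi(x)) \,\big] \\
\text{s.t.}\quad & \mathbb{E}\big[\, a(x, \pi(x)) \,\big] \;\ge\; \alpha, \\
& \Pr\big[\, \ell(x, \pi(x)) > \tau \,\big] \;\le\; 0.05 .
\end{aligned}
```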

Key facts

  • GAR is a constrained multi-objective optimization framework for LLM inference routing.
  • It minimizes per-request CO2 emissions subject to accuracy floors and p95-latency SLOs.
  • GAR tunes accuracy floors per dataset and uses lightweight estimators of correctness, tail latency, and carbon emissions.
  • GAR-PD is a practical online primal-dual routing algorithm.
  • Current routing methods rarely consider sustainable energy use and CO2 emissions.
  • Grid carbon intensity varies by time and region.
  • Models differ significantly in energy consumption.
  • The paper is published on arXiv with ID 2605.11603.
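
To make the primal-dual idea concrete, here is a minimal sketch of a generic online primal-dual router. It is not the paper's GAR-PD algorithm, whose details are not given in this summary. All names (ModelEstimate, PrimalDualRouter, the step size, the 5% slow-request budget) are hypothetical, and the per-model estimates are assumed to come from some external estimator. Each request goes to the model with the lowest carbon cost plus multiplier-weighted constraint penalties, and the multipliers rise whenever observed outcomes violate the accuracy floor or the latency budget.

```python
from dataclasses import dataclass


@dataclass
class ModelEstimate:
    """Hypothetical per-request estimates for one candidate model."""
    name: str
    energy_wh: float   # estimated energy per request, in watt-hours
    p_correct: float   # estimated probability of a correct answer
    p_slow: float      # estimated probability of exceeding the latency target


@dataclass
class PrimalDualRouter:
    """Generic online primal-dual router (illustrative, not GAR-PD itself)."""
    accuracy_floor: float
    slow_budget: float = 0.05   # p95 target: at most 5% of requests may be slow
    step: float = 0.5           # dual step size (assumed value)
    lam_acc: float = 0.0        # multiplier for the accuracy constraint
    lam_lat: float = 0.0        # multiplier for the latency constraint

    def route(self, candidates, grid_g_per_kwh):
        """Primal step: pick the model minimizing the Lagrangian score."""
        def score(m):
            carbon_g = m.energy_wh / 1000.0 * grid_g_per_kwh
            return (carbon_g
                    + self.lam_acc * (self.accuracy_floor - m.p_correct)
                    + self.lam_lat * (m.p_slow - self.slow_budget))
        return min(candidates, key=score)

    def update(self, correct, slow):
        """Dual step: projected gradient ascent on the observed violations."""
        self.lam_acc = max(
            0.0, self.lam_acc + self.step * (self.accuracy_floor - float(correct)))
        self.lam_lat = max(
            0.0, self.lam_lat + self.step * (float(slow) - self.slow_budget))
```

With both multipliers at zero the router simply picks the lowest-carbon model. Repeated incorrect answers raise lam_acc until a more accurate, more energy-hungry model becomes the better choice. Passing the current grid carbon intensity to route lets the same request be routed differently by time and region.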

Entities

Institutions

  • arXiv

Sources