OracleTSC: LLM-Based Traffic Signal Control with Reward Hurdle and Uncertainty Regularization
A recent research article presents OracleTSC, a framework that stabilizes reinforcement finetuning of large language models (LLMs) for traffic signal control (TSC), where feedback is sparse and delayed. Conventional RL-based TSC methods often act as black boxes with limited interpretability, whereas LLMs can explain their decisions in natural language but are difficult to train stably. OracleTSC combines two strategies: a reward hurdle mechanism, which filters out weak learning signals by subtracting a calibrated threshold from the environmental reward, and uncertainty regularization, which raises the likelihood of the selected response so that the model decides consistently across its sampled outputs. Experiments on the LibSignal benchmark indicate that OracleTSC substantially improves the traffic efficiency achieved with a compact LLaMA3-8B model. The paper is available on arXiv under ID 2605.08516.
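As a rough illustration of the reward hurdle idea, the minimal Python sketch below subtracts a fixed threshold from each environment reward. The function name `apply_reward_hurdle`, the sample rewards, and the hurdle value of 0.25 are assumptions made for this example; the paper's actual calibration procedure is not reproduced here.

```python
from typing import List


def apply_reward_hurdle(rewards: List[float], hurdle: float) -> List[float]:
    """Subtract a fixed hurdle from every environment reward.

    Rewards that do not clear the hurdle become zero or negative, so a
    reward-weighted update would no longer reinforce them.
    """
    return [r - hurdle for r in rewards]


# Hypothetical per-decision rewards from a traffic simulator (assumed values).
raw_rewards = [0.125, 0.5, -0.25, 1.0]

# Hypothetical hurdle; the source only says the threshold is "calibrated".
shaped_rewards = apply_reward_hurdle(raw_rewards, hurdle=0.25)

# Indices of decisions whose shaped reward is still positive.
reinforced = [i for i, r in enumerate(shaped_rewards) if r > 0]
```

In this toy case only the second and fourth decisions keep a positive shaped reward, which is one plausible reading of how a hurdle could suppress weak signals.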
Key facts
- OracleTSC is a framework for LLM-based traffic signal control.
- It uses a reward hurdle mechanism to filter weak learning signals.
- It applies uncertainty regularization to encourage consistent decisions.
- Traditional RL-based TSC methods are black boxes with limited interpretability.
- LLMs can provide natural language reasoning, but their reinforcement finetuning is unstable.
- Experiments were conducted on the LibSignal benchmark.
- OracleTSC is evaluated with a compact LLaMA3-8B model.
- The paper is published on arXiv with ID 2605.08516.
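The uncertainty regularization listed above can be sketched, under stated assumptions, as a penalty equal to the negative log-likelihood of the selected response among a set of sampled candidates. The helper names, the softmax over candidate scores, and the weighting term `weight` are hypothetical choices for this example and are not taken from the paper.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert candidate scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uncertainty_penalty(scores: List[float], chosen: int) -> float:
    """Negative log-likelihood of the chosen candidate.

    The penalty shrinks as probability mass concentrates on the chosen
    response, so minimizing it encourages consistent decisions.
    """
    return -math.log(softmax(scores)[chosen])


def regularized_loss(policy_loss: float, scores: List[float],
                     chosen: int, weight: float) -> float:
    """Add the weighted penalty to an (assumed) policy loss value."""
    return policy_loss + weight * uncertainty_penalty(scores, chosen)


# Assumed scores for four candidate signal-phase responses.
spread_scores = [0.0, 0.0, 0.0, 0.0]   # model is undecided
peaked_scores = [4.0, 0.0, 0.0, 0.0]   # model favors candidate 0

spread_penalty = uncertainty_penalty(spread_scores, chosen=0)
peaked_penalty = uncertainty_penalty(peaked_scores, chosen=0)
```

Here the undecided case yields a larger penalty than the peaked case, so a training objective that includes this term would push the model toward the more consistent distribution.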
Entities
Institutions
- arXiv
- LibSignal