ARTFEED — Contemporary Art Intelligence

Belief-Aware GSAC Improves Autonomous Driving Under Partial Observability

ai-technology · 2026-05-27

A recent arXiv paper (2605.26155) presents Belief-Aware Guided Soft Actor-Critic (BA-GSAC), a technique for autonomous driving that adapts the strength of knowledge distillation from a teacher with full-state access to a student that sees only partial observations. Whereas standard GSAC uses a constant distillation coefficient lambda, BA-GSAC adjusts lambda according to ensemble disagreement, which makes it possible to test when adaptive guidance actually helps. Experiments on Highway-Env at three POMDP difficulty levels compared five approaches: fixed lambda at 0.01, fixed lambda at 0.1, adaptive lambda, linear decay, and standard SAC. Preliminary single-seed results indicate benefits under mild and moderate partial observability. Under severe occlusion, where all methods were run with 3 seeds, the adaptive coefficient collapses to lambda_min within roughly 3,000 steps. The cause is an effect called observability blindness: the ensemble predicts partial observations, so its members disagree very little even when uncertainty is high.
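As a rough illustration of the mechanism described above, the sketch below maps ensemble disagreement to a distillation coefficient. The paper's exact rule is not given in this summary, so the variance-based disagreement measure, the clipped linear mapping, and the names ensemble_disagreement, adaptive_lambda, lam_min, lam_max, and scale are all assumptions. The bounds 0.01 and 0.1 are borrowed from the fixed-lambda baselines purely as placeholder values.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Mean per-dimension variance across ensemble members.

    predictions: list of equal-length lists, one per ensemble member.
    (Assumed disagreement measure; the paper may use a different one.)
    """
    per_dim = [pvariance(column) for column in zip(*predictions)]
    return sum(per_dim) / len(per_dim)


def adaptive_lambda(predictions, lam_min=0.01, lam_max=0.1, scale=1.0):
    """Hypothetical mapping: more disagreement gives more teacher guidance."""
    d = ensemble_disagreement(predictions)
    frac = min(max(d / scale, 0.0), 1.0)  # clip normalized disagreement to [0, 1]
    return lam_min + (lam_max - lam_min) * frac


if __name__ == "__main__":
    agreeing = [[1.0, 2.0], [1.0, 2.0]]      # members agree exactly
    disagreeing = [[0.0, 0.0], [10.0, 10.0]]  # members differ strongly
    print(adaptive_lambda(agreeing))
    print(adaptive_lambda(disagreeing))
```

Under this assumed mapping, perfectly agreeing members push the coefficient to its floor, which is the same regime as the collapse to lambda_min reported for severe occlusion.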

Key facts

  • BA-GSAC modulates distillation coefficient lambda via ensemble disagreement.
  • Five strategies tested: fixed lambda (0.01, 0.1), adaptive, linear decay, vanilla SAC.
  • Experiments conducted on Highway-Env across three POMDP difficulty levels.
  • Under severe occlusion, adaptive coefficient collapses to lambda_min within ~3K steps.
  • Observability blindness: the ensemble predicts partial observations, so its members show low disagreement even when uncertainty is high.
  • Preliminary single-seed runs show benefits under mild and moderate partial observability.
  • Study uses 3 seeds for all methods under severe occlusion.
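
The observability blindness item above can be made concrete with a toy example. This is only an illustrative sketch, not the paper's experiment: the fixed per-member offsets stand in for trained ensemble members, occluded features are assumed to appear as zeros in the partial observation, and the names MEMBER_OFFSETS, predict_partial_obs, and disagreement are hypothetical.

```python
from statistics import pvariance

# Hypothetical per-member biases standing in for independently trained members.
MEMBER_OFFSETS = [-0.2, -0.1, 0.0, 0.1, 0.2]


def predict_partial_obs(state, visible, offset):
    """Toy member prediction of the partial observation.

    Occluded entries are assumed to be zero in the prediction target,
    so every member outputs exactly zero for them.
    """
    return [(s + offset) if v else 0.0 for s, v in zip(state, visible)]


def disagreement(state, visible):
    """Mean per-dimension variance of the members' predictions."""
    preds = [predict_partial_obs(state, visible, o) for o in MEMBER_OFFSETS]
    per_dim = [pvariance(column) for column in zip(*preds)]
    return sum(per_dim) / len(per_dim)


if __name__ == "__main__":
    state = [1.0, 2.0, 3.0, 4.0]
    print(disagreement(state, [True, True, True, True]))      # everything visible
    print(disagreement(state, [True, False, False, False]))   # mostly occluded
    print(disagreement(state, [False, False, False, False]))  # fully occluded
```

In this toy setup disagreement shrinks as more of the state is hidden and reaches zero under full occlusion, even though that is where the true state is least known. A coefficient driven by this signal would therefore fall to its minimum exactly when guidance might matter most.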

Entities

Institutions

  • arXiv
