DISC: Decoupling Language from Visual Input in Robot Policies

ai-technology · 2026-05-22

DISC (Decoupling Instruction from State-Conditioned Control) targets observation leakage in language-conditioned robot manipulation policies. Standard policies process instructions and visual observations through shared network parameters, which lets the network learn shortcuts from scene to action that bypass language grounding. DISC instead uses a hypernetwork to generate the complete parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never accesses language directly, so its task awareness can only come from the instruction by way of the generated weights, which effectively prevents observation leakage. To produce coherent high-dimensional policy weights, DISC uses a two-stage hypernetwork whose refinement stage incorporates gradient-based optimization. The paper is available on arXiv as 2605.20856.
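To illustrate the decoupling idea, here is a minimal NumPy sketch, not the paper's implementation. It assumes a single linear hypernetwork (the two-stage design and the gradient-based refinement are not modeled), a toy two-layer policy, and a hypothetical `embed_instruction` helper that stands in for a real language encoder. All names and dimensions are illustrative assumptions.

```python
import zlib
import numpy as np

# Illustrative sizes (assumptions, not taken from the paper).
EMB_DIM, OBS_DIM, HIDDEN, ACT_DIM = 16, 32, 24, 7

# Parameter shapes of the generated visuomotor policy (a toy 2-layer MLP).
POLICY_SHAPES = {
    "W1": (OBS_DIM, HIDDEN),
    "b1": (HIDDEN,),
    "W2": (HIDDEN, ACT_DIM),
    "b2": (ACT_DIM,),
}
N_PARAMS = sum(int(np.prod(shape)) for shape in POLICY_SHAPES.values())

rng = np.random.default_rng(0)
# Hypernetwork weights: map an instruction embedding to a flat parameter vector.
# Here they are random; in practice they would be trained.
H = rng.normal(scale=0.05, size=(EMB_DIM, N_PARAMS))


def embed_instruction(text: str) -> np.ndarray:
    """Hypothetical stand-in for a language encoder (deterministic per string)."""
    seed = zlib.crc32(text.encode("utf-8"))
    return np.random.default_rng(seed).normal(size=EMB_DIM)


def generate_policy(instruction: str) -> dict:
    """Hypernetwork step: instruction -> full policy parameter set."""
    flat = embed_instruction(instruction) @ H
    params, offset = {}, 0
    for name, shape in POLICY_SHAPES.items():
        size = int(np.prod(shape))
        params[name] = flat[offset:offset + size].reshape(shape)
        offset += size
    return params


def policy_act(params: dict, obs: np.ndarray) -> np.ndarray:
    """Generated policy: takes only the observation, never the instruction."""
    hidden = np.tanh(obs @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]


# Same observation, two instructions: the only path by which language can
# change the action is through the generated weights.
obs = rng.normal(size=OBS_DIM)
a_pick = policy_act(generate_policy("pick up the red block"), obs)
a_push = policy_act(generate_policy("push the red block left"), obs)
```

Because `policy_act` has no language argument, the instruction and the observation never pass through shared parameters, which is the property the summary above attributes to DISC.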

Key facts

DISC stands for Decoupling Instruction from State-Conditioned Control.
It addresses observation leakage in language-conditioned manipulation policies.
Standard policies process instructions and observations through shared parameters.
Observation leakage allows networks to learn scene-to-action shortcuts.
DISC uses a hypernetwork to generate policy parameters from instruction alone.
The generated policy never directly accesses language.
DISC uses a two-stage hypernetwork whose refinement stage incorporates gradient-based optimization.
The paper is on arXiv with ID 2605.20856.
