ARTFEED — Contemporary Art Intelligence

Preference Goal Tuning Optimizes Latent Goals for Frozen Policies

other · 2026-05-04

A paper on arXiv (2412.02125) introduces Preference Goal Tuning (PGT), a framework that addresses the sensitivity of goal-conditioned policies to discrete text prompts by reformulating post-training adaptation as a latent control problem. The goal embedding acts as a continuous control variable that modulates a frozen policy's behavior: the policy's parameters are never updated, and only the latent goal is optimized under a trajectory-level preference objective, effectively searching for the conditioning input that maximizes preferred behaviors and suppresses undesirable ones. This makes PGT a lightweight alternative to standard fine-tuning. The paper evaluates PGT across a range of tasks, demonstrating that tuning the latent goal alone can align trajectory distributions with task preferences.
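
The core idea can be sketched in a few lines. The toy PyTorch code below is an illustrative reconstruction, not the paper's implementation: the network architecture, dimensions, and Bradley-Terry-style trajectory preference loss are all assumptions. It shows the defining move of PGT: the policy's weights are frozen, and gradient descent runs only on the latent goal embedding, using a preference margin between a preferred and a dispreferred trajectory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical goal-conditioned policy: maps (state, goal) -> action logits.
class GoalConditionedPolicy(nn.Module):
    def __init__(self, state_dim=4, goal_dim=8, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 32), nn.Tanh(),
            nn.Linear(32, n_actions),
        )

    def forward(self, states, goal):
        g = goal.expand(states.shape[0], -1)          # broadcast goal over time steps
        return self.net(torch.cat([states, g], dim=-1))

policy = GoalConditionedPolicy()
for p in policy.parameters():
    p.requires_grad_(False)                           # the policy stays frozen

# The latent goal embedding is the ONLY trainable variable.
goal = nn.Parameter(torch.zeros(1, 8))
opt = torch.optim.Adam([goal], lr=0.05)

def traj_logprob(states, actions, goal):
    """Sum of log pi(a_t | s_t, goal) over one trajectory."""
    logp = torch.log_softmax(policy(states, goal), dim=-1)
    return logp.gather(1, actions.unsqueeze(1)).sum()

# Toy preference pair: one preferred and one dispreferred trajectory
# (random placeholders standing in for collected rollouts).
T = 5
states_w, actions_w = torch.randn(T, 4), torch.randint(0, 3, (T,))
states_l, actions_l = torch.randn(T, 4), torch.randint(0, 3, (T,))

losses = []
for _ in range(200):
    opt.zero_grad()
    # Trajectory-level preference objective (Bradley-Terry style):
    # raise the log-likelihood of the preferred trajectory above the other.
    margin = (traj_logprob(states_w, actions_w, goal)
              - traj_logprob(states_l, actions_l, goal))
    loss = -F.logsigmoid(margin)
    loss.backward()                                   # gradient flows only into `goal`
    opt.step()
    losses.append(loss.item())
```

After optimization the frozen policy is simply conditioned on the tuned goal embedding at deployment time; no weights were touched, which is what distinguishes this from ordinary preference fine-tuning.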

Key facts

  • Paper titled 'Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies'
  • Published on arXiv with ID 2412.02125
  • Announce type is 'replace'
  • Goal-conditioned policies are sensitive to instruction/prompt choice
  • PGT formulates post-training adaptation as a latent control problem
  • Goal embedding serves as a continuous control variable
  • Policy parameters remain frozen; only latent goal is updated
  • Uses trajectory-level preference objective for optimization

Entities

Institutions

  • arXiv

Sources