ARTFEED — Contemporary Art Intelligence

Preference Goal Tuning Optimizes Latent Goals for Frozen Policies

other · 2026-05-04

A paper on arXiv (2412.02125) introduces Preference Goal Tuning (PGT), a framework that addresses the sensitivity of goal-conditioned policies to discrete text prompts by reformulating post-training adaptation as a latent control problem. The goal embedding acts as a continuous control variable that modulates a frozen policy's behavior: the policy's parameters are never updated, and only the latent goal is optimized under a trajectory-level preference objective, effectively searching for the conditioning input that maximizes preferred behaviors and suppresses undesirable ones. This makes PGT a lightweight alternative to standard fine-tuning. The paper evaluates PGT across a range of tasks, demonstrating that tuning the latent goal alone can align trajectory distributions with task preferences.
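
The core idea can be sketched in a few lines. The toy PyTorch code below is an illustrative reconstruction, not the paper's implementation: the network architecture, dimensions, and Bradley-Terry-style trajectory preference loss are all assumptions. It shows the defining move of PGT: the policy's weights are frozen, and gradient descent runs only on the latent goal embedding, using a preference margin between a preferred and a dispreferred trajectory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical goal-conditioned policy: maps (state, goal) -> action logits.
class GoalConditionedPolicy(nn.Module):
    def __init__(self, state_dim=4, goal_dim=8, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 32), nn.Tanh(),
            nn.Linear(32, n_actions),
        )

    def forward(self, states, goal):
        g = goal.expand(states.shape[0], -1)          # broadcast goal over time steps
        return self.net(torch.cat([states, g], dim=-1))

policy = GoalConditionedPolicy()
for p in policy.parameters():
    p.requires_grad_(False)                           # the policy stays frozen

# The latent goal embedding is the ONLY trainable variable.
goal = nn.Parameter(torch.zeros(1, 8))
opt = torch.optim.Adam([goal], lr=0.05)

def traj_logprob(states, actions, goal):
    """Sum of log pi(a_t | s_t, goal) over one trajectory."""
    logp = torch.log_softmax(policy(states, goal), dim=-1)
    return logp.gather(1, actions.unsqueeze(1)).sum()

# Toy preference pair: one preferred and one dispreferred trajectory
# (random placeholders standing in for collected rollouts).
T = 5
states_w, actions_w = torch.randn(T, 4), torch.randint(0, 3, (T,))
states_l, actions_l = torch.randn(T, 4), torch.randint(0, 3, (T,))

losses = []
for _ in range(200):
    opt.zero_grad()
    # Trajectory-level preference objective (Bradley-Terry style):
    # raise the log-likelihood of the preferred trajectory above the other.
    margin = (traj_logprob(states_w, actions_w, goal)
              - traj_logprob(states_l, actions_l, goal))
    loss = -F.logsigmoid(margin)
    loss.backward()                                   # gradient flows only into `goal`
    opt.step()
    losses.append(loss.item())
```

After optimization the frozen policy is simply conditioned on the tuned goal embedding at deployment time; no weights were touched, which is what distinguishes this from ordinary preference fine-tuning.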

Key facts

  • Paper titled 'Preference Goal Tuning: Post-Training as Latent Control for Frozen Policies'
  • Published on arXiv with ID 2412.02125
  • Announce type is 'replace'
  • Goal-conditioned policies are sensitive to instruction/prompt choice
  • PGT formulates post-training adaptation as a latent control problem
  • Goal embedding serves as a continuous control variable
  • Policy parameters remain frozen; only latent goal is updated
  • Uses trajectory-level preference objective for optimization

Entities

Institutions

  • arXiv

Sources