ARTFEED — Contemporary Art Intelligence

PRTS: A VLA Foundation Model Using Goal-Conditioned Reinforcement Learning

ai-technology · 2026-05-01

Researchers have introduced PRTS (Primitive Reasoning and Tasking System), a Vision-Language-Action (VLA) foundation model pretrained with Goal-Conditioned Reinforcement Learning. Unlike conventional VLA models, which rely on supervised behavior cloning, PRTS treats language instructions as goals and uses contrastive reinforcement learning to learn a shared embedding space for state-action pairs and goals. Within this framework, the inner product of a state-action embedding and a goal embedding approximates the log-discounted goal occupancy, i.e., the probability of eventually reaching the language-specified goal from that state-action. This addresses a key limitation of existing VLA models, whose behavior-cloning objectives overlook goal orientation and temporal task progress. The research is available on arXiv under identifier 2604.27472.
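The contrastive setup described above can be sketched in a few lines. The dimensions, random embeddings, and InfoNCE-style loss below are illustrative assumptions, not details from the paper; they only show how an inner product between state-action and goal embeddings can serve as a goal-occupancy score trained contrastively.

```python
import numpy as np

# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
dim, batch = 8, 4

# phi embeds a (state, action) pair; psi embeds a language-specified goal.
# Row i of psi is assumed to be a goal actually reached from row i's state-action.
phi = rng.normal(size=(batch, dim))   # state-action embeddings
psi = rng.normal(size=(batch, dim))   # goal embeddings

# Inner product plays the role of the log-discounted goal occupancy:
# logits[i, j] scores how likely goal j is reached from state-action i.
logits = phi @ psi.T

# Contrastive (InfoNCE-style) loss: each state-action should score its own
# future goal (the diagonal) above the other goals in the batch.
logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(batch), np.arange(batch)].mean()

print(round(float(loss), 4))
```

Minimizing this loss pushes each state-action embedding toward the goals reachable from it and away from unrelated goals, which is the mechanism the article attributes to PRTS's unified embedding space.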

Key facts

  • PRTS (Primitive Reasoning and Tasking System) is a VLA foundation model.
  • It uses Goal-Conditioned Reinforcement Learning for pretraining.
  • Contrastive reinforcement learning is employed to learn embeddings.
  • The inner product of state-action and goal embeddings approximates log-discounted goal occupancy.
  • This measures the probability of reaching a language-specified goal from the current state-action.
  • Existing VLAs use supervised behavior cloning, which overlooks temporal task progress.
  • The paper is on arXiv with ID 2604.27472.

Entities

Institutions

  • arXiv

Sources