PRTS: A VLA Foundation Model Using Goal-Conditioned Reinforcement Learning
PRTS (Primitive Reasoning and Tasking System) is a Vision-Language-Action (VLA) foundation model pretrained with Goal-Conditioned Reinforcement Learning, in contrast to conventional VLA models, which rely on supervised behavior cloning. PRTS treats language instructions as goals and uses contrastive reinforcement learning to learn a shared embedding space in which the inner product of a state-action embedding and a goal embedding approximates the log-discounted goal occupancy, i.e., the probability of eventually reaching the language-specified goal from the current state-action. This targets a key limitation of behavior-cloned VLAs, which overlook goal orientation and temporal task progress. The paper is on arXiv under identifier 2604.27472.
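The contrastive objective described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the two encoders are hypothetical linear maps (the real model uses large vision-language networks), and an InfoNCE-style loss treats the goal drawn from a trajectory's own future as the positive and the other goals in the batch as negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
STATE_ACTION_DIM, GOAL_DIM, EMBED_DIM = 8, 4, 16

# Linear "encoders" standing in for the state-action and goal towers.
W_sa = rng.normal(size=(STATE_ACTION_DIM, EMBED_DIM)) * 0.1
W_g = rng.normal(size=(GOAL_DIM, EMBED_DIM)) * 0.1

def embed_sa(sa):
    return sa @ W_sa

def embed_goal(g):
    return g @ W_g

def critic(sa, g):
    # Inner product of the two embeddings; after contrastive training
    # this score approximates the log-discounted occupancy of goal g
    # starting from state-action sa.
    return embed_sa(sa) @ embed_goal(g)

def infonce_loss(sa_batch, g_batch):
    # Row i's positive is goal i (sampled from trajectory i's future);
    # the other goals in the batch serve as negatives.
    logits = embed_sa(sa_batch) @ embed_goal(g_batch).T   # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pushes each state-action embedding toward the goals reachable from it, which is what lets the inner product serve as a reachability score at inference time.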
Key facts
- PRTS is a VLA foundation model.
- It uses Goal-Conditioned Reinforcement Learning for pretraining.
- Contrastive reinforcement learning is employed to learn embeddings.
- The inner product of state-action and goal embeddings approximates log-discounted goal occupancy.
- This measures the probability of reaching a language-specified goal from the current state-action.
- Existing VLAs use supervised behavior cloning, which overlooks temporal task progress.
- The paper is on arXiv with ID 2604.27472.
- PRTS stands for Primitive Reasoning and Tasking System.
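In the standard goal-conditioned contrastive RL formulation (the notation here is an assumption, not copied from the paper), the inner-product critic and the discounted goal occupancy it approximates can be written as:

```latex
\phi(s,a)^\top \psi(g) \;\approx\; \log p^{\pi}_{\gamma}(g \mid s, a),
\qquad
p^{\pi}_{\gamma}(g \mid s, a) \;=\; (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t}\, p^{\pi}(s_t = g \mid s_0 = s,\, a_0 = a),
```

where $\phi$ and $\psi$ are the state-action and goal encoders, $\pi$ is the policy, and $\gamma$ is the discount factor. Higher inner products thus correspond to goals the policy is more likely to reach soon from the given state-action.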
Entities
Institutions
- arXiv