PRTS: A VLA Foundation Model Using Goal-Conditioned Reinforcement Learning
PRTS (Primitive Reasoning and Tasking System) is a Vision-Language-Action (VLA) foundation model pretrained with Goal-Conditioned Reinforcement Learning, in contrast to conventional VLA models, which rely on supervised behavior cloning. PRTS treats language instructions as goals and uses contrastive reinforcement learning to learn a shared embedding space in which the inner product of a state-action embedding and a goal embedding approximates the log-discounted goal occupancy, i.e., the probability of eventually reaching the language-specified goal from the current state-action. This targets a key limitation of behavior-cloned VLAs, which overlook goal orientation and temporal task progress. The paper is on arXiv under identifier 2604.27472.
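The contrastive objective described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the two encoders are hypothetical linear maps (the real model uses large vision-language networks), and an InfoNCE-style loss treats the goal drawn from a trajectory's own future as the positive and the other goals in the batch as negatives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
STATE_ACTION_DIM, GOAL_DIM, EMBED_DIM = 8, 4, 16

# Linear "encoders" standing in for the state-action and goal towers.
W_sa = rng.normal(size=(STATE_ACTION_DIM, EMBED_DIM)) * 0.1
W_g = rng.normal(size=(GOAL_DIM, EMBED_DIM)) * 0.1

def embed_sa(sa):
    return sa @ W_sa

def embed_goal(g):
    return g @ W_g

def critic(sa, g):
    # Inner product of the two embeddings; after contrastive training
    # this score approximates the log-discounted occupancy of goal g
    # starting from state-action sa.
    return embed_sa(sa) @ embed_goal(g)

def infonce_loss(sa_batch, g_batch):
    # Row i's positive is goal i (sampled from trajectory i's future);
    # the other goals in the batch serve as negatives.
    logits = embed_sa(sa_batch) @ embed_goal(g_batch).T   # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pushes each state-action embedding toward the goals reachable from it, which is what lets the inner product serve as a reachability score at inference time.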
Key facts
- PRTS is a VLA foundation model.
- It uses Goal-Conditioned Reinforcement Learning for pretraining.
- Contrastive reinforcement learning is employed to learn embeddings.
- The inner product of state-action and goal embeddings approximates log-discounted goal occupancy.
- This measures the probability of reaching a language-specified goal from the current state-action.
- Existing VLAs use supervised behavior cloning, which overlooks temporal task progress.
- The paper is on arXiv with ID 2604.27472.
- PRTS stands for Primitive Reasoning and Tasking System.
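In the standard goal-conditioned contrastive RL formulation (the notation here is an assumption, not copied from the paper), the inner-product critic and the discounted goal occupancy it approximates can be written as:

```latex
\phi(s,a)^\top \psi(g) \;\approx\; \log p^{\pi}_{\gamma}(g \mid s, a),
\qquad
p^{\pi}_{\gamma}(g \mid s, a) \;=\; (1-\gamma)\sum_{t=0}^{\infty} \gamma^{t}\, p^{\pi}(s_t = g \mid s_0 = s,\, a_0 = a),
```

where $\phi$ and $\psi$ are the state-action and goal encoders, $\pi$ is the policy, and $\gamma$ is the discount factor. Higher inner products thus correspond to goals the policy is more likely to reach soon from the given state-action.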
Entities
Institutions
- arXiv