FPILOT: Inference-Time Optimization for RL Trading Agents

ai-technology · 2026-05-14

FPILOT (Financial Plugin Inference-time Learning for Optimal Trading) is a new plugin framework that improves reinforcement learning agents for portfolio management by optimizing their policy at inference time using price forecasts. Inspired by Model Predictive Control, FPILOT uses a predictive model to generate a multi-step price trajectory, so it does not need iterative action-conditioned rollouts. At each decision step, the framework optimizes the policy against an imagined return objective computed from the predicted prices, then executes one trade step. Because this optimization is repeated before every trade, the policy adapts as the forecasts change. The framework is compatible with any pre-trained agent.
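
The summary does not state the imagined return objective in closed form. One plausible reading, offered only as a sketch, assumes long-only portfolio weights and a log-return reward. Every symbol here is an assumption introduced for illustration: \pi_\theta is the agent's policy, \hat{p}_{t+k} is the forecast price vector k steps ahead, H is the forecast horizon, \eta is a step size, and \oslash is element-wise division.

```latex
% Imagined return over the forecast horizon (assumed form, not taken from the source)
J_t(\theta) = \sum_{k=0}^{H-1}
  \log\!\Big( \pi_\theta(\hat{s}_{t+k})^{\top}
  \big( \hat{p}_{t+k+1} \oslash \hat{p}_{t+k} \big) \Big)

% Inference-time update before the trade, followed by a single executed action
\theta_t \leftarrow \theta + \eta \, \nabla_\theta J_t(\theta),
\qquad a_t = \pi_{\theta_t}(s_t)
```

Under this reading, the forecast prices do not depend on the agent's actions, which is why the whole trajectory can be predicted up front rather than rolled out action by action.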

Key facts

FPILOT stands for Financial Plugin Inference-time Learning for Optimal Trading
It is a plugin inference-time optimization framework
Inspired by Model Predictive Control (MPC)
Uses a predictive model to generate a multi-step price trajectory
Optimizes the policy at inference time, before each trade step
Compatible with any pre-trained agent
Adapts policy to forecasted prices
No iterative action-conditioned rollouts needed
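
The facts above can be illustrated with a minimal sketch of one decision step. It is not the authors' implementation. The linear-softmax policy, the log-return objective, the finite-difference gradient, and all names (`policy_weights`, `imagined_return`, `adapt_and_act`) are assumptions chosen to keep the example self-contained, and the forecast is a hard-coded stand-in for the output of a predictive model.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy_weights(theta, features):
    """Hypothetical linear-softmax policy: long-only weights that sum to 1."""
    n = features.shape[0]
    W = theta[: n * n].reshape(n, n)
    b = theta[n * n:]
    return softmax(W @ features + b)

def imagined_return(theta, last_return, forecast_prices):
    """Cumulative log return of the policy along a forecast price path.

    forecast_prices has shape (H + 1, n); row 0 holds the current prices.
    """
    gross = forecast_prices[1:] / forecast_prices[:-1]   # shape (H, n)
    feats = last_return
    total = 0.0
    for g in gross:
        w = policy_weights(theta, feats)
        total += np.log(w @ g)
        # The next features come from the forecast alone, not from the action
        # taken, so no action-conditioned rollout is involved.
        feats = g - 1.0
    return total

def numeric_grad(f, theta, eps=1e-5):
    """Central-difference gradient; a stand-in for automatic differentiation."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

def adapt_and_act(theta_pretrained, last_return, forecast_prices,
                  n_steps=50, lr=1.0):
    """Adapt a copy of the pre-trained parameters, then return one action."""
    theta = theta_pretrained.copy()
    objective = lambda t: imagined_return(t, last_return, forecast_prices)
    for _ in range(n_steps):
        theta = theta + lr * numeric_grad(objective, theta)   # gradient ascent
    return policy_weights(theta, last_return), theta

# Toy data: asset 0 is forecast to rise, asset 1 to stay flat, asset 2 to fall.
n_assets = 3
theta0 = np.zeros(n_assets * n_assets + n_assets)   # stand-in "pre-trained" policy
last_return = np.array([0.0, 0.01, -0.01])
forecast = np.array([[100.0, 100.0, 100.0],
                     [101.0, 100.0, 99.0],
                     [102.0, 100.0, 98.0],
                     [103.0, 100.0, 97.0]])

weights, theta_adapted = adapt_and_act(theta0, last_return, forecast)
print("weights before adaptation:", policy_weights(theta0, last_return))
print("weights after adaptation: ", weights)
```

In a live loop, only `weights` would be executed as the next trade, and `adapt_and_act` would run again at the following step with a fresh forecast. Whether adaptation restarts from the pre-trained parameters each time or carries the adapted parameters forward is not stated in the summary; this sketch restarts from a copy.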

Sources

arXiv cs.AI — 2026-05-14