ARTFEED — Contemporary Art Intelligence

Persona-Traceable Shared RL Policy for Scalable Game NPCs

ai-technology · 2026-05-25

pcsp (Persona Conditioned Shared Policy) is a reinforcement learning technique for scalable, consistent NPC behavior in life simulation games. On a benchmark of 300 personas, pcsp achieves compositional zero-shot persona identification up to 17 times above chance, a Spearman rho of approximately 0.73 for semantic-behavioral alignment, and inference 22 times faster than an LLM-as-policy baseline. The approach uses a single policy conditioned on frozen LLM embeddings of free-form persona descriptions. Each NPC's description is encoded once, passed through a low-rank projection, and fed to the policy through neural conditioning; training combines PPO, InfoNCE, and a KL diversity term. The method targets three limitations of existing approaches: persona consistency, controllability, and real-time inference.
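The conditioning pipeline described above can be illustrated with a minimal sketch. Everything specific here is an assumption rather than a detail from the source: the layer sizes, the random stand-in vectors used in place of real frozen LLM embeddings, the class and method names, and the choice of a FiLM-style scale-and-shift as the form of "neural conditioning". The sketch shows only the inference path, with no training.

```python
import math
import random

random.seed(0)

# All sizes are illustrative assumptions, not values from the source.
EMBED_DIM, RANK, OBS_DIM, HIDDEN, N_ACTIONS = 32, 4, 6, 8, 3

def rand_matrix(rows, cols, scale=0.3):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def vec_mat(v, m):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

class SharedPolicy:
    """One set of weights shared by every NPC (hypothetical class)."""

    def __init__(self):
        self.proj = rand_matrix(EMBED_DIM, RANK)      # low-rank projection
        self.to_gamma = rand_matrix(RANK, HIDDEN)     # produces per-unit scale
        self.to_beta = rand_matrix(RANK, HIDDEN)      # produces per-unit shift
        self.obs_layer = rand_matrix(OBS_DIM, HIDDEN)
        self.head = rand_matrix(HIDDEN, N_ACTIONS)

    def encode_persona(self, embedding):
        """Run once per NPC and cache the result; no LLM call at decision time."""
        z = vec_mat(embedding, self.proj)
        gamma = [1.0 + g for g in vec_mat(z, self.to_gamma)]
        beta = vec_mat(z, self.to_beta)
        return gamma, beta

    def action_probs(self, obs, cond):
        """Per-step inference: a small forward pass modulated by the cached persona."""
        gamma, beta = cond
        hidden = [math.tanh(h) for h in vec_mat(obs, self.obs_layer)]
        hidden = [g * h + b for g, h, b in zip(gamma, hidden, beta)]
        logits = vec_mat(hidden, self.head)
        top = max(logits)                             # stabilise the softmax
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

policy = SharedPolicy()

# Random vectors stand in for frozen LLM embeddings of two persona descriptions.
persona_a = [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
persona_b = [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
cond_a = policy.encode_persona(persona_a)
cond_b = policy.encode_persona(persona_b)

obs = [random.gauss(0.0, 1.0) for _ in range(OBS_DIM)]
probs_a = policy.action_probs(obs, cond_a)
probs_b = policy.action_probs(obs, cond_b)
```

In this sketch the same weights and the same observation yield different action distributions for the two personas, and the expensive encoding step happens once per NPC rather than once per decision. That separation is one plausible reading of where the reported speedup over an LLM-as-policy baseline comes from.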

Key facts

  • pcsp achieves up to 17x above chance in compositional zero-shot persona identification
  • Spearman rho ≈ 0.73 semantic-behavioral alignment
  • 22x faster inference than LLM-as-policy baseline
  • Single RL policy conditioned on frozen LLM embeddings
  • Uses PPO + InfoNCE + KL diversity training objective
  • Tested on 300-persona life-simulation benchmark
  • Addresses persona consistency, controllability, real-time inference
  • Combines once-per-NPC encoding, low-rank projection, neural conditioning
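
The PPO + InfoNCE + KL diversity objective listed above could be combined along the following lines. This is a sketch under stated assumptions: the source does not give the exact loss, so the weights `w_nce` and `w_div`, the temperature, the use of cosine similarity, and the reading of "KL diversity" as mean pairwise KL divergence between personas' action distributions are all hypothetical. Only the PPO clipped surrogate and the general InfoNCE form follow their standard definitions.

```python
import math

def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate, negated so that lower is better."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        terms.append(min(r * a, clipped * a))
    return -sum(terms) / len(terms)

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def info_nce(behavior, persona, temperature=0.1):
    """behavior[i] is the positive for persona[i]; other rows act as negatives."""
    loss = 0.0
    for i, b in enumerate(behavior):
        logits = [cosine(b, p) / temperature for p in persona]
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_sum - logits[i]   # cross-entropy against the matching index
    return loss / len(behavior)

def mean_pairwise_kl(dists):
    """Assumed diversity measure; requires strictly positive probabilities."""
    total, count = 0.0, 0
    for i, p in enumerate(dists):
        for j, q in enumerate(dists):
            if i != j:
                total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
                count += 1
    return total / count

def total_loss(ratios, advantages, behavior, persona, dists,
               w_nce=0.5, w_div=0.1):
    """Hypothetical weighting: reward the task, align behavior, spread personas."""
    return (ppo_clip_loss(ratios, advantages)
            + w_nce * info_nce(behavior, persona, temperature=0.1)
            - w_div * mean_pairwise_kl(dists))

# Toy check: behavior vectors that match their persona give a lower InfoNCE loss.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Under this reading, the InfoNCE term ties each NPC's behavior to its own persona description, and the diversity term is subtracted so that personas are pushed toward distinct action distributions instead of collapsing onto one average behavior.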
