AI Agents Achieve Spontaneous Self-Evolution Without Human Rewards

ai-technology · 2026-04-22

A recent research paper presents a technique for training AI agents to evolve autonomously, so that at inference time they need neither external rewards nor human oversight. The method builds in an intrinsic meta-evolution capability: before undertaking tasks, the agent explores a new environment on its own and summarizes what it learns. During training only, an outcome-based reward measures how much this self-acquired knowledge improves the agent’s success on subsequent tasks, which teaches the model to explore and summarize effectively. During inference, the agent operates without external incentives or human guidance, relying entirely on its internal parameters to navigate unfamiliar settings. Applied to Qwen3-30B and Seed-OSS-36B models, the approach yielded a 20% performance improvement. This marks a shift from traditional agent systems, which rely on human-defined rewards and rules and typically stop evolving without human intervention.
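
The outcome-based reward described above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not give the exact formula, so the sketch assumes the reward is the downstream success rate with the agent's self-written notes minus the success rate without them. The names `outcome_reward`, `toy_solve`, and the string-based "notes" are hypothetical and exist only for illustration.

```python
from typing import Callable, Sequence


def outcome_reward(
    solve: Callable[[str, str], bool],
    tasks: Sequence[str],
    notes: str,
) -> float:
    """Assumed reward: success rate with notes minus success rate without.

    `solve(task, notes)` is a hypothetical stand-in for running the agent
    on a downstream task while it has access to its own exploration notes.
    """
    if not tasks:
        raise ValueError("at least one downstream task is required")
    with_notes = sum(solve(task, notes) for task in tasks) / len(tasks)
    without_notes = sum(solve(task, "") for task in tasks) / len(tasks)
    return with_notes - without_notes


def toy_solve(task: str, notes: str) -> bool:
    """Toy solver: a task succeeds only if the notes mention it."""
    return task in notes.split()


# Training-time use only: the reward scores the notes the agent wrote
# after exploring. At inference time no such score is computed.
tasks = ["door", "key", "lamp"]
reward = outcome_reward(toy_solve, tasks, notes="door key")
```

Under these assumptions, notes that help on two of three tasks earn a positive reward, while empty or irrelevant notes earn zero, which is the signal that would push the model toward more useful exploration and summaries.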

Key facts

Research trains AI agents for spontaneous self-evolution without external rewards
Agents develop intrinsic meta-evolution capability to learn about unseen environments
Outcome-based reward mechanism measures improvement in downstream task success
Reward signal used only during training phase to teach exploration and summarization
At inference time, agents require no external rewards or human instructions
Method applied to Qwen3-30B and Seed-OSS-36B models
Native evolution approach yields 20% performance improvement
Current agent systems typically depend on human-defined rewards and rules
