ARTFEED — Contemporary Art Intelligence

SD-Search: Self-Distillation for Search-Augmented Reasoning

other · 2026-05-20

A new method called SD-Search improves search-augmented reasoning agents by providing step-level supervision without an external teacher. It uses on-policy hindsight self-distillation: a single model acts as both student and teacher, and the two roles differ only in what the model is conditioned on. This targets the credit assignment problem in outcome-reward reinforcement learning, where the reward is tied to the final result and individual search queries receive no step-specific reward. SD-Search requires no additional annotations and no larger model.

Key facts

  • SD-Search derives step-level supervision from the policy itself through on-policy hindsight self-distillation
  • It requires neither an external teacher nor additional annotations
  • A single model plays two roles: student and teacher
  • The student sees only the context available at inference time
  • The teacher is conditioned on additional hindsight information that the student does not see
  • Addresses the credit assignment problem in search-augmented reasoning
  • Improves performance of search-augmented reasoning agents
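
The student/teacher split described above can be illustrated with a minimal sketch. It is not the SD-Search implementation: the summary does not state the loss, the form of the hindsight information, or the model interface, so everything below is an assumption. `toy_policy` is a hypothetical stand-in for the shared model (it scores candidate queries by word overlap with the context), the hindsight string is assumed to be a final-answer hint, and KL divergence is used only because it is a common distillation measure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy(context, candidates):
    """Hypothetical stand-in for the single shared model.

    Scores each candidate search query by its word overlap with the
    context and returns a probability distribution over the candidates.
    """
    ctx_words = set(context.lower().split())
    logits = [float(len(ctx_words & set(c.lower().split()))) for c in candidates]
    return softmax(logits)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def step_level_signal(steps, hindsight):
    """Return one distillation signal per search step.

    The same policy is called twice per step: once as the student
    (inference-time context only) and once as the teacher (context plus
    hindsight). A large divergence marks a step where hindsight would
    have changed the choice of query.
    """
    signals = []
    for context, candidates in steps:
        student = toy_policy(context, candidates)
        teacher = toy_policy(context + " " + hindsight, candidates)
        signals.append(kl_divergence(teacher, student))
    return signals


# Illustrative two-step trajectory; all strings are made up for the sketch.
steps = [
    ("who directed the film inception",
     ["inception director", "weather in paris"]),
    ("where was christopher nolan born",
     ["nolan birthplace", "nolan filmography"]),
]
hindsight = "final answer london birthplace"  # assumed form of hindsight

signals = step_level_signal(steps, hindsight)
for i, s in enumerate(signals, start=1):
    print(f"step {i}: KL(teacher || student) = {s:.4f}")
```

In this toy trajectory the hindsight changes nothing at the first step and shifts the teacher toward the birthplace query at the second, so only the second step receives a nonzero signal. A single outcome reward would instead assign the same credit to both steps.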
