ARTFEED — Contemporary Art Intelligence

A* Post-Training Boosts LLM Reasoning Efficiency

ai-technology · 2026-05-26

A recent study posted on arXiv (2605.24597) proposes improving deductive reasoning in large language models (LLMs) with A* search. The researchers frame natural language inference as a search problem: the final answer is a valid proof, so every intermediate step must also be correct. They investigate two approaches: supervised fine-tuning on A* execution traces, and reinforcement learning with A*-informed process reward models. Experiments on Llama-3.2 models (1B–3B parameters) show large gains, taking accuracy from near zero to above that of DeepSeek-V3.2, a considerably larger model. The results also reveal a trade-off between simple correctness rewards and reasoning efficiency.
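To make the search framing concrete, here is a minimal sketch that treats deduction as A* over sets of derived facts. Each state is the set of facts known so far, each action applies one if-then rule, and every step costs 1. Expanded states are recorded as a trace, loosely analogous to the A* execution traces the study fine-tunes on. The rule format, the `astar_proof` function, and the deliberately weak heuristic (0 once the goal is derived, else 1) are illustrative assumptions, not the paper's method; the article does not describe the study's heuristic or trace format.

```python
import heapq
import itertools
from typing import FrozenSet, List, Optional, Tuple

Rule = Tuple[FrozenSet[str], str]  # hypothetical format: (premises, conclusion)


def astar_proof(
    facts: FrozenSet[str], rules: List[Rule], goal: str
) -> Tuple[Optional[List[Rule]], List[FrozenSet[str]]]:
    """A* over sets of derived facts, where each rule application costs 1.

    Returns (proof, trace): `proof` is the list of rules applied (None if the
    goal is unreachable) and `trace` lists states in the order A* expanded them.
    """

    def h(state: FrozenSet[str]) -> int:
        # Admissible (and consistent) heuristic: one more step is needed unless the goal is known.
        return 0 if goal in state else 1

    tie = itertools.count()  # tie-breaker so heapq never has to compare frozensets
    frontier = [(h(facts), next(tie), 0, facts, [])]
    best_g = {facts: 0}  # cheapest known cost to reach each state
    trace: List[FrozenSet[str]] = []
    while frontier:
        _, _, g, state, proof = heapq.heappop(frontier)
        if g > best_g[state]:
            continue  # stale entry: a cheaper path to this state was found later
        trace.append(state)
        if goal in state:
            return proof, trace  # goal test on expansion keeps the result optimal
        for rule in rules:
            premises, conclusion = rule
            if premises <= state and conclusion not in state:
                nxt = state | {conclusion}
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), next(tie), ng, nxt, proof + [rule]))
    return None, trace


facts = frozenset({"socrates_is_human"})
rules = [
    (frozenset({"socrates_is_human"}), "socrates_is_mortal"),
    (frozenset({"socrates_is_mortal"}), "socrates_will_die"),
    (frozenset({"socrates_is_human"}), "socrates_is_animal"),
]
proof, trace = astar_proof(facts, rules, "socrates_will_die")
print([conclusion for _, conclusion in proof])
print(len(trace), "states expanded")
```

A more informative admissible heuristic would let A* skip irrelevant branches such as the `socrates_is_animal` detour; the near-trivial one here is chosen only because it is obviously admissible.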

Key facts

  • Paper arXiv:2605.24597 proposes A* post-training for LLM reasoning.
  • Frames natural language inference as a search problem for valid proofs.
  • Uses supervised fine-tuning on A* execution traces.
  • Also uses reinforcement learning with A*-informed process reward models.
  • Llama-3.2 models (1B–3B) improved from near-zero accuracy.
  • Outperformed DeepSeek-V3.2, a much larger model.
  • Trade-off between correctness rewards and efficiency identified.
  • With an admissible heuristic, A* search is guaranteed to find an optimal (least-cost) path to the goal.
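The tension between correctness rewards and efficiency can be illustrated with a toy comparison. In this hypothetical setup, an outcome-only reward gives 1.0 to any valid proof that ends with the goal derived, while a step-level reward, computed from the exact remaining proof length (the quantity an A* heuristic estimates), gives +1 for a step on some shortest proof and -1 for a detour. The function names, rule format, and reward values are assumptions for illustration only; the article does not specify how the study's A*-informed process reward models are built.

```python
import math
from collections import deque
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], str]  # hypothetical format: (premises, conclusion)


def dist_to_goal(state: FrozenSet[str], rules: List[Rule], goal: str) -> float:
    """Fewest rule applications needed to derive `goal` from `state` (BFS; inf if unreachable)."""
    seen = {state}
    queue = deque([(state, 0)])
    while queue:
        s, d = queue.popleft()
        if goal in s:
            return d
        for premises, conclusion in rules:
            if premises <= s and conclusion not in s:
                nxt = s | {conclusion}
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return math.inf


def outcome_reward(facts: FrozenSet[str], proof: List[Rule], goal: str) -> float:
    """Hypothetical outcome-only reward: 1.0 if a valid proof ends with the goal derived."""
    state = set(facts)
    for premises, conclusion in proof:
        if not premises <= state:
            return 0.0  # invalid step: a premise has not been derived yet
        state.add(conclusion)
    return 1.0 if goal in state else 0.0


def step_rewards(facts: FrozenSet[str], proof: List[Rule], rules: List[Rule], goal: str) -> List[float]:
    """Hypothetical step-level reward: +1 if a step shortens the optimal remaining proof by one, else -1."""
    state = frozenset(facts)
    rewards = []
    for premises, conclusion in proof:
        if not premises <= state:
            raise ValueError(f"invalid step deriving {conclusion!r}")
        before = dist_to_goal(state, rules, goal)
        state = state | {conclusion}
        after = dist_to_goal(state, rules, goal)
        # Guard against inf - 1 == inf when the goal is unreachable.
        on_shortest_path = math.isfinite(before) and after == before - 1
        rewards.append(1.0 if on_shortest_path else -1.0)
    return rewards


facts = frozenset({"socrates_is_human"})
rules = [
    (frozenset({"socrates_is_human"}), "socrates_is_mortal"),
    (frozenset({"socrates_is_mortal"}), "socrates_will_die"),
    (frozenset({"socrates_is_human"}), "socrates_is_animal"),
]
goal = "socrates_will_die"
direct = [rules[0], rules[1]]
detour = [rules[2], rules[0], rules[1]]

print(outcome_reward(facts, direct, goal), outcome_reward(facts, detour, goal))
print(step_rewards(facts, direct, rules, goal), step_rewards(facts, detour, rules, goal))
```

Both proofs earn the same outcome reward, but the detour proof's step rewards sum to 1 instead of 2, so only the step-level signal prefers the shorter proof.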

Entities

Institutions

  • arXiv
  • DeepSeek
