LLMs Achieve Retrosynthesis via Atom-Anchored Reasoning
A new framework enables general-purpose large language models (LLMs) to perform single-step retrosynthesis without task-specific training. The method anchors chain-of-thought reasoning to the molecular structure by giving each atom a unique identifier. In a first, zero-shot step, the LLM identifies the relevant fragments and their chemical labels; an optional second, few-shot step uses class examples to predict the transformation. The approach is reported to overcome the earlier underperformance of LLMs in retrosynthesis and is validated on academic benchmarks and on expert-validated drug discovery molecules. By relying on the reasoning capabilities of LLMs rather than on large labeled datasets, the work addresses the scarcity of labeled chemical data.
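As a rough illustration of atom anchoring, the sketch below tags each atom in a SMILES string with a unique numeric identifier, so that a reasoning step can point at "atom 3" instead of a loose substring. This is not the framework's own labeling code: the regular expression, the function name label_atoms, and the bracketed ":n" output (borrowed from SMILES atom-map notation) are assumptions made for this example, and the tokenizer only covers bracket atoms and the common organic-subset atoms.

```python
import re

# One atom token: a bracket atom such as [NH4+], a two-letter halogen
# (Br, Cl), or a one-letter organic-subset / aromatic atom.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def label_atoms(smiles: str) -> tuple[str, dict[int, str]]:
    """Tag every atom with a unique ID (hypothetical helper, not from the paper).

    Returns the annotated string and an id -> original atom token lookup.
    """
    atoms: dict[int, str] = {}

    def tag(match: re.Match) -> str:
        token = match.group(0)
        atom_id = len(atoms) + 1  # IDs start at 1 and follow string order
        atoms[atom_id] = token
        if token.startswith("["):
            # Existing bracket atom: append the ID before the closing bracket.
            return f"{token[:-1]}:{atom_id}]"
        # Bare atom: wrap it in brackets so the ID is unambiguous next to
        # ring-closure digits. Assumption: this is a display label only.
        return f"[{token}:{atom_id}]"

    annotated = ATOM_PATTERN.sub(tag, smiles)
    return annotated, atoms


annotated, atoms = label_atoms("CC(=O)O")  # acetic acid
print(annotated)  # [C:1][C:2](=[O:3])[O:4]
print(atoms[3])   # O
```

One caveat on the design: in strict SMILES, wrapping a bare atom in brackets removes its implicit hydrogens, so the annotated string is meant as a labeling aid for prompts rather than as a chemically equivalent molecule. A cheminformatics toolkit would be the more robust choice for real use.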
Key facts
- Framework uses unique atomic identifiers to anchor reasoning.
- Operates without task-specific model training.
- Two-step process: zero-shot identification of fragments and chemical labels, followed by optional few-shot prediction of the transformation.
- Applied to single-step retrosynthesis, a task where LLMs previously underperformed.
- Tested on academic benchmarks and expert-validated drug discovery molecules.
- Addresses scarcity of labeled data in chemistry.
- Published on arXiv with ID 2510.16590v2.
- arXiv announcement type: replace-cross (a replacement version of a cross-listed paper).
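The two-step process listed above can be sketched as a small pipeline around a pluggable LLM call. Everything here is hypothetical: the prompt wording, the function names, and the assumption that the LLM is a plain text-in, text-out callable are illustrative choices, not the prompts or interfaces used in the paper.

```python
from typing import Callable, Optional, Sequence

LLM = Callable[[str], str]  # assumed interface: prompt text in, answer text out


def build_fragment_prompt(labeled_molecule: str) -> str:
    """Step 1 (zero-shot): ask for relevant atom IDs and a chemical label."""
    return (
        f"Product with unique atom IDs: {labeled_molecule}\n"
        "Reason step by step, citing atom IDs. List the atoms that form the "
        "relevant fragments and give a chemical label for the disconnection."
    )


def build_transform_prompt(
    labeled_molecule: str, fragment_answer: str, class_examples: Sequence[str]
) -> str:
    """Step 2 (few-shot): add class examples and ask for the transformation."""
    examples = "\n".join(f"Example: {ex}" for ex in class_examples)
    return (
        f"Product with unique atom IDs: {labeled_molecule}\n"
        f"Identified fragments and label: {fragment_answer}\n"
        f"{examples}\n"
        "Using the examples, predict the transformation that yields the product."
    )


def retrosynthesis_step(
    labeled_molecule: str,
    llm: LLM,
    class_examples: Optional[Sequence[str]] = None,
) -> str:
    """Run step 1, then step 2 only when class examples are supplied."""
    fragment_answer = llm(build_fragment_prompt(labeled_molecule))
    if not class_examples:
        return fragment_answer  # the few-shot step is optional
    return llm(
        build_transform_prompt(labeled_molecule, fragment_answer, class_examples)
    )
```

Keeping the prompt builders separate from the LLM call makes each step easy to inspect, and a stub callable can stand in for the model during testing.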
Entities
Institutions
- arXiv