LLM-Based Symbolic Regression Enhanced by Programmatic Context Augmentation
A new framework for symbolic regression (SR) combines large language models (LLMs) with programmatic context augmentation to improve the discovery of mathematical expressions from data. Traditional SR methods, typically based on genetic algorithms, face scalability and expressivity limitations. Recent LLM-based evolutionary search approaches show promise but rely solely on scalar feedback such as mean squared error, discarding the richer information available in the dataset itself. The proposed method instead lets the model interact with the dataset through code, actively analyzing the data and extracting contextual features that guide the search. This addresses a key limitation of existing LLM-based SR by providing feedback beyond a single error metric. The work is detailed in arXiv preprint 2605.03101.
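To make the idea of "programmatic context augmentation" concrete, here is a minimal sketch of what extracting contextual features from a dataset via code might look like. The function name `dataset_context` and the specific features chosen (per-variable ranges and correlations with the target) are illustrative assumptions, not the paper's actual implementation; the point is that the resulting text summary can be injected into an LLM prompt alongside the usual error metric.

```python
import numpy as np

def dataset_context(X: np.ndarray, y: np.ndarray) -> str:
    """Hypothetical helper: summarize simple dataset features as prompt text.

    Computes, per input variable, its value range and Pearson correlation
    with the target, plus basic target statistics. A real system could run
    arbitrary analysis code; this is only a sketch.
    """
    lines = []
    for j in range(X.shape[1]):
        col = X[:, j]
        corr = np.corrcoef(col, y)[0, 1]
        lines.append(
            f"x{j}: range [{col.min():.3g}, {col.max():.3g}], "
            f"corr with y = {corr:.2f}"
        )
    lines.append(f"y: mean {y.mean():.3g}, std {y.std():.3g}")
    return "\n".join(lines)

# Toy data where the true law is y = 2*x0 + x1**2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 2 * X[:, 0] + X[:, 1] ** 2
print(dataset_context(X, y))
```

A summary like this tells the model, for example, that `x0` correlates strongly and linearly with `y` while `x1` does not, information that a lone MSE score cannot convey.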
Key facts
- Symbolic regression discovers mathematical expressions from data
- Traditional methods use genetic algorithms with scalability limits
- LLM-based evolutionary search is a recent approach
- Existing LLM methods rely only on scalar metrics like mean squared error
- New framework adds programmatic context augmentation
- Method enables code-based dataset interactions
- Framework performs active data analysis
- Preprint available on arXiv: 2605.03101
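For contrast, the scalar-only feedback that the key facts attribute to existing LLM-based methods can be sketched in a few lines. The function `mse_fitness` below is an assumed illustration of that baseline, not code from the paper: each candidate expression is reduced to a single number, which is all the search loop ever sees.

```python
import numpy as np

def mse_fitness(expr_fn, X, y):
    """Score a candidate expression by mean squared error alone,
    the scalar feedback typical of prior LLM-based evolutionary search."""
    pred = expr_fn(X)
    return float(np.mean((pred - y) ** 2))

# Toy target: y = 3*x + 1
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3 * X[:, 0] + 1

exact = lambda X: 3 * X[:, 0] + 1   # recovers the target exactly
rough = lambda X: X[:, 0]           # a poor candidate

print(mse_fitness(exact, X, y))  # → 0.0
print(mse_fitness(rough, X, y))  # strictly positive
```

Two candidates with the same MSE are indistinguishable under this signal, which is precisely the limitation the proposed code-based dataset interaction is meant to overcome.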