LLM-Based Symbolic Regression Enhanced by Programmatic Context Augmentation
A new framework for symbolic regression (SR) combines large language models (LLMs) with programmatic context augmentation to improve the discovery of mathematical expressions from data. Traditional SR methods, typically based on genetic algorithms, face scalability and expressivity limitations. Recent LLM-based evolutionary search approaches show promise but rely solely on scalar feedback such as mean squared error, discarding the richer information available in the dataset itself. The proposed method instead lets the model interact with the dataset through code, actively analyzing the data and extracting contextual features that guide the search. This addresses a key limitation of existing LLM-based SR by providing feedback beyond a single error metric. The work is detailed in arXiv preprint 2605.03101.
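To make the idea of "programmatic context augmentation" concrete, here is a minimal sketch of what extracting contextual features from a dataset via code might look like. The function name `dataset_context` and the specific features chosen (per-variable ranges and correlations with the target) are illustrative assumptions, not the paper's actual implementation; the point is that the resulting text summary can be injected into an LLM prompt alongside the usual error metric.

```python
import numpy as np

def dataset_context(X: np.ndarray, y: np.ndarray) -> str:
    """Hypothetical helper: summarize simple dataset features as prompt text.

    Computes, per input variable, its value range and Pearson correlation
    with the target, plus basic target statistics. A real system could run
    arbitrary analysis code; this is only a sketch.
    """
    lines = []
    for j in range(X.shape[1]):
        col = X[:, j]
        corr = np.corrcoef(col, y)[0, 1]
        lines.append(
            f"x{j}: range [{col.min():.3g}, {col.max():.3g}], "
            f"corr with y = {corr:.2f}"
        )
    lines.append(f"y: mean {y.mean():.3g}, std {y.std():.3g}")
    return "\n".join(lines)

# Toy data where the true law is y = 2*x0 + x1**2
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 2 * X[:, 0] + X[:, 1] ** 2
print(dataset_context(X, y))
```

A summary like this tells the model, for example, that `x0` correlates strongly and linearly with `y` while `x1` does not, information that a lone MSE score cannot convey.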
Key facts
- Symbolic regression discovers mathematical expressions from data
- Traditional methods use genetic algorithms with scalability limits
- LLM-based evolutionary search is a recent approach
- Existing LLM methods rely only on scalar metrics like mean squared error
- New framework adds programmatic context augmentation
- Method enables code-based dataset interactions
- Framework performs active data analysis
- Preprint available on arXiv: 2605.03101
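For contrast, the scalar-only feedback that the key facts attribute to existing LLM-based methods can be sketched in a few lines. The function `mse_fitness` below is an assumed illustration of that baseline, not code from the paper: each candidate expression is reduced to a single number, which is all the search loop ever sees.

```python
import numpy as np

def mse_fitness(expr_fn, X, y):
    """Score a candidate expression by mean squared error alone,
    the scalar feedback typical of prior LLM-based evolutionary search."""
    pred = expr_fn(X)
    return float(np.mean((pred - y) ** 2))

# Toy target: y = 3*x + 1
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3 * X[:, 0] + 1

exact = lambda X: 3 * X[:, 0] + 1   # recovers the target exactly
rough = lambda X: X[:, 0]           # a poor candidate

print(mse_fitness(exact, X, y))  # → 0.0
print(mse_fitness(rough, X, y))  # strictly positive
```

Two candidates with the same MSE are indistinguishable under this signal, which is precisely the limitation the proposed code-based dataset interaction is meant to overcome.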