CEDAR: LLM Agent Automates Data Science via Context Engineering

other · 2026-04-24

CEDAR is an application that automates data science (DS) tasks using an agentic setup built on large language models (LLMs). It relies on context engineering to address task complexity, data size, computational limits, and context restrictions. The system structures the initial prompt with DS-specific input fields, then generates the solution as an enumerated sequence of interleaved plan and code blocks, produced by separate LLM agents. Function calls keep the data local and inject only aggregate statistics into prompts. This design improves fault tolerance and context management.
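The idea of keeping data local and passing only aggregate statistics to the model can be illustrated with a minimal sketch. This is not CEDAR's implementation: the function names (summarize_column, build_prompt), the prompt layout, and the sample table are all hypothetical, chosen only to show that raw rows never need to appear in the prompt text.

```python
import statistics


def summarize_column(values):
    """Return aggregate statistics for one numeric column (hypothetical helper)."""
    return {
        "count": len(values),
        "mean": round(statistics.fmean(values), 2),
        "min": min(values),
        "max": max(values),
    }


def build_prompt(task, table):
    """Build a prompt that holds the task and per-column aggregates only.

    The raw values in `table` stay in local memory; only the summary
    produced by summarize_column is written into the prompt text.
    """
    lines = [f"Task: {task}", "Column statistics:"]
    for name, values in table.items():
        stats = summarize_column(values)
        rendered = ", ".join(f"{key}={value}" for key, value in stats.items())
        lines.append(f"- {name}: {rendered}")
    return "\n".join(lines)


# Hypothetical local dataset, used only for illustration.
local_table = {
    "age": [23, 35, 41, 29],
    "income": [42000.0, 58000.0, 61000.0, 39000.0],
}

prompt = build_prompt("Predict income from age", local_table)
print(prompt)
```

The prompt grows with the number of columns rather than the number of rows, which is one plausible way such a design helps with both data size and context limits.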

Key facts

CEDAR automates data science tasks with an agentic LLM setup.
It uses context engineering to overcome task complexity, data size, and context restrictions.
Initial prompts are structured with DS-specific input fields.
The solution is an enumerated sequence of interleaved plan and code blocks.
Separate LLM agents generate plan and code blocks.
Data stays local; only aggregate statistics are injected into prompts.
The system improves fault tolerance and context handling.
The paper is on arXiv with ID 2601.06606.
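
The enumerated sequence of interleaved plan and code blocks can be pictured as a simple data structure. The sketch below is an assumption-laden illustration, not the paper's code: the Block class, the build_solution loop, the per-block numbering, and the two stub agents (which stand in for real LLM calls) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Block:
    index: int    # position in the enumerated sequence (numbering scheme is assumed)
    kind: str     # "plan" or "code"
    content: str


def build_solution(
    steps: int,
    plan_agent: Callable[[List[Block]], str],
    code_agent: Callable[[str], str],
) -> List[Block]:
    """Alternate between a plan agent and a code agent for a fixed number of steps."""
    blocks: List[Block] = []
    for _ in range(steps):
        # The plan agent sees the blocks produced so far.
        plan = plan_agent(blocks)
        blocks.append(Block(len(blocks) + 1, "plan", plan))
        # The code agent turns the latest plan into code.
        code = code_agent(plan)
        blocks.append(Block(len(blocks) + 1, "code", code))
    return blocks


# Stub agents standing in for LLM calls (hypothetical).
def stub_plan_agent(history: List[Block]) -> str:
    step = len(history) // 2 + 1
    return f"Step {step}: describe what to do next"


def stub_code_agent(plan: str) -> str:
    return f"# code implementing: {plan}"


solution = build_solution(2, stub_plan_agent, stub_code_agent)
for block in solution:
    print(block.index, block.kind, block.content)
```

Keeping each step as a separate numbered block means a failing code block could in principle be regenerated on its own, which is one way such a structure might support fault tolerance; the source does not describe the actual recovery mechanism.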
