CausaLab: Interactive Causal Discovery Platform for AI Scientists
CausaLab is a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike earlier evaluations, it assesses both whether an agent uses causal evidence to solve the task and whether its hypothesis about the underlying causal mechanism is correct. In each episode, the agent works in a synthetic lab: it analyzes prior measurement data, manipulates a crystal, and predicts the resonance frequency of a held-out reactor crystal governed by the same mechanism. The hidden data-generating process is a randomly sampled structural causal model (SCM), so the agent must recover both the causal graph and the structural equations rather than rely on prior knowledge. CausaLab includes a domain-specific language that records the agent's evolving SCM hypothesis, so trajectories can be analyzed and compared against the ground truth. Experiments show a persistent gap between current LLM agents and optimal performance, which underscores how difficult causal reasoning remains for these agents.
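To make the episode setup concrete, the following is a minimal sketch of a randomly sampled SCM with observational and interventional sampling. It is not CausaLab's implementation: the linear-Gaussian form, the function names `sample_linear_scm` and `simulate`, and all parameter values are assumptions chosen for illustration. Manipulating a variable is modeled here as a hard intervention that overrides the variable's structural equation.

```python
import random

def sample_linear_scm(n_vars, edge_prob, rng):
    """Sample a random DAG with linear edge weights (hypothetical helper).

    Variable i may only depend on variables j < i, which guarantees acyclicity.
    Returns a dict mapping (parent, child) -> weight.
    """
    weights = {}
    for child in range(n_vars):
        for parent in range(child):
            if rng.random() < edge_prob:
                weights[(parent, child)] = rng.uniform(-2.0, 2.0)
    return weights

def simulate(weights, n_vars, rng, do=None, noise_std=0.1):
    """Draw one sample in topological order.

    `do` maps a variable index to a forced value; an intervened variable
    ignores its parents and its noise term.
    """
    do = do or {}
    values = []
    for child in range(n_vars):
        if child in do:
            values.append(do[child])
            continue
        total = rng.gauss(0.0, noise_std)
        for parent in range(child):
            total += weights.get((parent, child), 0.0) * values[parent]
        values.append(total)
    return values

rng = random.Random(0)
scm = sample_linear_scm(n_vars=4, edge_prob=0.5, rng=rng)

# Passive measurement data versus data gathered while forcing variable 1 to 3.0.
observational = [simulate(scm, 4, rng) for _ in range(100)]
interventional = [simulate(scm, 4, rng, do={1: 3.0}) for _ in range(100)]
```

Comparing the two datasets is what lets an agent separate cause from correlation: downstream variables shift under the intervention, while upstream variables do not.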
Key facts
- CausaLab is a scalable environment for evaluating interactive causal discovery by LLM agents.
- It evaluates both how agents use causal evidence in problem-solving and whether their causal-mechanism hypotheses are correct.
- Each episode involves predicting the resonance frequency of a held-out reactor crystal.
- The hidden data-generating process is a randomly sampled structural causal model (SCM).
- CausaLab includes a domain-specific language for recording SCM hypotheses.
- Experiments show a persistent gap between LLM agents and optimal performance.
- The platform is designed for AI scientists to test causal reasoning capabilities.
- The paper is available on arXiv with ID 2605.26029.
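As an illustration of comparing a recorded hypothesis trajectory against the ground truth, the sketch below scores each hypothesized graph with the structural Hamming distance (SHD). SHD is a common graph-comparison metric substituted here as an assumption; this summary does not state which metrics CausaLab uses. The edge-set representation and the variable names are likewise hypothetical and do not reflect the platform's domain-specific language.

```python
def structural_hamming_distance(true_edges, hypothesized_edges):
    """Count node pairs whose edge status differs between two directed graphs.

    Edges are (cause, effect) tuples. A missing, extra, or reversed edge
    each counts as one error.
    """
    true_edges = set(true_edges)
    hypothesized_edges = set(hypothesized_edges)
    # Collect every unordered node pair that has an edge in either graph.
    pairs = {tuple(sorted(edge)) for edge in true_edges | hypothesized_edges}
    distance = 0
    for a, b in pairs:
        true_state = ((a, b) in true_edges, (b, a) in true_edges)
        hyp_state = ((a, b) in hypothesized_edges, (b, a) in hypothesized_edges)
        if true_state != hyp_state:
            distance += 1
    return distance

# Hypothetical ground truth and a three-step hypothesis trajectory.
ground_truth = {("temperature", "frequency"), ("pressure", "frequency")}
trajectory = [
    {("frequency", "temperature")},                            # reversed edge, one missing
    {("temperature", "frequency")},                            # one edge still missing
    {("temperature", "frequency"), ("pressure", "frequency")}, # matches ground truth
]

distances = [structural_hamming_distance(ground_truth, h) for h in trajectory]
print(distances)  # [2, 1, 0]
```

A distance that falls to zero over the trajectory indicates the hypothesized graph converged to the true structure. Checking the structural equations themselves would need a separate comparison of the fitted functions.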
Entities
Institutions
- arXiv