Causal Concept Graphs Enable Stepwise Reasoning in LLMs
Researchers have introduced Causal Concept Graphs (CCG), a method for modeling causal relationships among the concepts a large language model uses during multi-step reasoning. CCG combines task-conditioned sparse autoencoders, which discover sparse, interpretable latent features, with DAGMA-style differentiable structure learning, which organizes those features into a directed acyclic graph whose edges denote the learned causal connections. To assess whether graph-guided interventions actually change model behavior, the team proposes the Causal Fidelity Score (CFS). In experiments on ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium, CCG achieved a CFS of 5.654±0.625, surpassing ROME-style tracing (3.382±0.233), SAE-only ranking (2.479±0.196), and a random baseline (1.032±0.034); all differences were significant at p<0.0001 after Bonferroni correction. The learned graphs were sparse, with an edge density of 5-6%.
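The DAGMA-style structure learning mentioned above rests on a log-determinant acyclicity characterization. The sketch below (a minimal illustration, not the paper's code) shows that penalty, which is zero exactly when the weighted adjacency matrix describes a DAG (within the function's domain) and positive when the graph contains a cycle:

```python
import numpy as np

def dagma_h(W, s=1.0):
    """DAGMA log-det acyclicity penalty: h(W) = -logdet(sI - W*W) + d*log(s).
    Equals 0 iff W is a weighted adjacency matrix of a DAG (within the domain)."""
    d = W.shape[0]
    A = W * W  # elementwise square removes edge-weight signs
    sign, logdet = np.linalg.slogdet(s * np.eye(d) - A)
    assert sign > 0, "W is outside the valid (M-matrix) domain"
    return -logdet + d * np.log(s)

# Acyclic 3-node graph: 0 -> 1 -> 2
W_dag = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.7],
                  [0.0, 0.0, 0.0]])

# Adding a back-edge 2 -> 0 creates a cycle
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5

print(dagma_h(W_dag))  # ≈ 0 for a DAG
print(dagma_h(W_cyc))  # > 0 once a cycle appears
```

Because h and its gradient are differentiable in W, the penalty can be folded into an ordinary gradient-based objective, which is what makes the structure learning end-to-end trainable.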
Key facts
- Causal Concept Graphs (CCG) model causal dependencies between concepts in LLMs.
- CCG uses task-conditioned sparse autoencoders and DAGMA-style structure learning.
- Causal Fidelity Score (CFS) evaluates intervention effects.
- Tested on ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium.
- Five seeds with n=15 paired runs.
- CCG achieves CFS=5.654±0.625.
- Outperforms ROME-style tracing (3.382±0.233), SAE-only ranking (2.479±0.196), and random baseline (1.032±0.034).
- p<0.0001 after Bonferroni correction; graphs are 5-6% edge density.
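The summary does not define how CFS is computed. One plausible reading, consistent with the random baseline scoring near 1, is that CFS normalizes the mean intervention effect of graph-guided concept ablations by that of size-matched random ablations. The sketch below follows that assumption; the function name and all data are illustrative, not the paper's:

```python
import numpy as np

def causal_fidelity_score(method_effects, random_effects, eps=1e-8):
    """Hypothetical CFS: mean effect of graph-guided interventions,
    normalized by the mean effect of size-matched random interventions.
    A score of ~1 means no better than random; higher is better."""
    return float(np.mean(method_effects) / (np.mean(random_effects) + eps))

# Synthetic per-run intervention effects (illustrative only):
rng = np.random.default_rng(0)
ccg  = rng.normal(5.6, 0.6, size=15)   # graph-guided ablation effects, n=15 runs
rand = rng.normal(1.0, 0.1, size=15)   # matched random-ablation effects

print(causal_fidelity_score(ccg, rand))  # well above 1 for an informative graph
```

Under this reading, the reported baseline value of 1.032±0.034 is exactly what a normalized score predicts for random interventions.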