Causal Concept Graphs Enable Stepwise Reasoning in LLMs
Researchers have introduced Causal Concept Graphs (CCG), a method for modeling causal relationships among the concepts a large language model uses during multi-step reasoning. CCG combines task-conditioned sparse autoencoders, which discover sparse, interpretable latent features, with DAGMA-style differentiable structure learning, which organizes those features into a directed acyclic graph whose edges denote the learned causal connections. To assess whether graph-guided interventions actually change model behavior, the team proposes the Causal Fidelity Score (CFS). In experiments on ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium, CCG achieved a CFS of 5.654±0.625, surpassing ROME-style tracing (3.382±0.233), SAE-only ranking (2.479±0.196), and a random baseline (1.032±0.034); all differences were significant at p<0.0001 after Bonferroni correction. The learned graphs were sparse, with an edge density of 5-6%.
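The DAGMA-style structure learning mentioned above rests on a log-determinant acyclicity characterization. The sketch below (a minimal illustration, not the paper's code) shows that penalty, which is zero exactly when the weighted adjacency matrix describes a DAG (within the function's domain) and positive when the graph contains a cycle:

```python
import numpy as np

def dagma_h(W, s=1.0):
    """DAGMA log-det acyclicity penalty: h(W) = -logdet(sI - W*W) + d*log(s).
    Equals 0 iff W is a weighted adjacency matrix of a DAG (within the domain)."""
    d = W.shape[0]
    A = W * W  # elementwise square removes edge-weight signs
    sign, logdet = np.linalg.slogdet(s * np.eye(d) - A)
    assert sign > 0, "W is outside the valid (M-matrix) domain"
    return -logdet + d * np.log(s)

# Acyclic 3-node graph: 0 -> 1 -> 2
W_dag = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.7],
                  [0.0, 0.0, 0.0]])

# Adding a back-edge 2 -> 0 creates a cycle
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5

print(dagma_h(W_dag))  # ≈ 0 for a DAG
print(dagma_h(W_cyc))  # > 0 once a cycle appears
```

Because h and its gradient are differentiable in W, the penalty can be folded into an ordinary gradient-based objective, which is what makes the structure learning end-to-end trainable.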
Key facts
- Causal Concept Graphs (CCG) model causal dependencies between concepts in LLMs.
- CCG uses task-conditioned sparse autoencoders and DAGMA-style structure learning.
- Causal Fidelity Score (CFS) evaluates intervention effects.
- Tested on ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium.
- Five seeds with n=15 paired runs.
- CCG achieves CFS=5.654±0.625.
- Outperforms ROME-style tracing (3.382±0.233), SAE-only ranking (2.479±0.196), and random baseline (1.032±0.034).
- p<0.0001 after Bonferroni correction; graphs are 5-6% edge density.
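The summary does not define how CFS is computed. One plausible reading, consistent with the random baseline scoring near 1, is that CFS normalizes the mean intervention effect of graph-guided concept ablations by that of size-matched random ablations. The sketch below follows that assumption; the function name and all data are illustrative, not the paper's:

```python
import numpy as np

def causal_fidelity_score(method_effects, random_effects, eps=1e-8):
    """Hypothetical CFS: mean effect of graph-guided interventions,
    normalized by the mean effect of size-matched random interventions.
    A score of ~1 means no better than random; higher is better."""
    return float(np.mean(method_effects) / (np.mean(random_effects) + eps))

# Synthetic per-run intervention effects (illustrative only):
rng = np.random.default_rng(0)
ccg  = rng.normal(5.6, 0.6, size=15)   # graph-guided ablation effects, n=15 runs
rand = rng.normal(1.0, 0.1, size=15)   # matched random-ablation effects

print(causal_fidelity_score(ccg, rand))  # well above 1 for an informative graph
```

Under this reading, the reported baseline value of 1.032±0.034 is exactly what a normalized score predicts for random interventions.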