FM-CGM: Zero-Shot Causal Generative Modeling with Foundation Models

ai-technology · 2026-05-25

Researchers have introduced FM-CGM, a modular framework that uses pretrained foundation models for end-to-end visual causal reasoning without any additional training. The system pairs a large reasoning model for causal inference with a text-to-image diffusion model for generation, enabling zero-shot causal discovery, intervention, and counterfactual generation. A key contribution is Causal Semantic Guidance (CSG), a cross-attention-based mechanism that ensures semantic interventions propagate correctly. The work targets a gap in existing approaches, which often lack a unified framework for combining the zero-shot reasoning abilities of pretrained foundation models with causal generative modeling. Such modeling is essential for building reliable and transparent AI systems capable of counterfactual reasoning. The paper is available on arXiv under the identifier 2605.23861.

Key facts

FM-CGM is a modular framework for end-to-end visual causal reasoning using pretrained foundation models.
It formalizes the causal pipeline through three core components: concept extractor, concept manipulator, and counterfactual generator.
The approach enables zero-shot causal discovery, intervention, and counterfactual generation.
It leverages a large reasoning model for causal inference and a text-to-image diffusion model for generation.
Causal Semantic Guidance (CSG) is a cross-attention-based mechanism that ensures semantic interventions propagate correctly.
The paper is published on arXiv with identifier 2605.23861.
Causal generative modeling is essential for developing reliable and transparent AI systems capable of counterfactual reasoning.
Existing approaches often lack a unified framework to leverage zero-shot reasoning capabilities of pretrained foundation models.
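The three components named above (concept extractor, concept manipulator, counterfactual generator) can be illustrated with a toy sketch. This is not the paper's implementation: every name here (`CausalEdge`, `extract_concepts`, `manipulate_concepts`, `generate_counterfactual`) is hypothetical, a hand-written edge list stands in for the reasoning model's inferred causal graph, and a text prompt stands in for the diffusion model's output. It only shows the general shape of an intervention propagating through a concept graph.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Concepts = Dict[str, str]


@dataclass
class CausalEdge:
    """Hypothetical stand-in for one cause -> effect link a reasoning model might infer."""
    cause: str
    effect: str
    rule: Callable[[str], str]  # maps a cause value to the resulting effect value


def extract_concepts(scene: Concepts) -> Concepts:
    """Concept extractor stand-in: here the scene is already annotated, so just copy it."""
    return dict(scene)


def manipulate_concepts(
    concepts: Concepts, edges: List[CausalEdge], intervention: Concepts
) -> Concepts:
    """Concept manipulator stand-in: apply do(X = x) and propagate to downstream effects.

    Assumes `edges` is listed in topological order (causes before their effects).
    """
    updated = dict(concepts)
    updated.update(intervention)
    for edge in edges:
        if edge.effect in intervention:
            continue  # an intervened concept is cut off from its causes
        if updated[edge.cause] != concepts[edge.cause]:
            updated[edge.effect] = edge.rule(updated[edge.cause])
    return updated


def generate_counterfactual(concepts: Concepts) -> str:
    """Counterfactual generator stand-in: build a prompt instead of calling a diffusion model."""
    parts = ", ".join(f"{name}: {value}" for name, value in sorted(concepts.items()))
    return "a photo with " + parts


# Toy scene and toy causal graph (both assumptions made for illustration).
factual = {"weather": "rainy", "ground": "wet", "sky": "overcast"}
edges = [
    CausalEdge("weather", "ground", lambda w: "wet" if w == "rainy" else "dry"),
    CausalEdge("weather", "sky", lambda w: "overcast" if w == "rainy" else "clear"),
]

observed = extract_concepts(factual)
sunny = manipulate_concepts(observed, edges, {"weather": "sunny"})
dry_ground = manipulate_concepts(observed, edges, {"ground": "dry"})

print(generate_counterfactual(sunny))
print(generate_counterfactual(dry_ground))
```

In this toy graph, intervening on the cause (`weather`) changes both of its effects, while intervening on an effect (`ground`) leaves the cause and the other effect untouched. That asymmetry is what separates an intervention from a plain attribute edit.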
