MechaRule: Grounding LLM Rule Extraction in Neural Circuits

ai-technology · 2026-05-07

Researchers propose MechaRule, a pipeline that extracts symbolic rules from large language models and grounds them in specific neurons. The method identifies 'agonist' neurons, meaning neurons whose neutralization disrupts rule-related behaviors. It relies on the observation that the effects of sparse agonists are approximately monotone and saturating, which enables efficient localization without hand-crafted hypotheses. The approach bridges global rule extraction and mechanistic interpretability.

Key facts

MechaRule is a pipeline for rule extraction from LLMs grounded in neural circuits.
It identifies a sparse set of neurons, called agonists, whose neutralization disrupts rule-related behaviors.
The method is based on empirical observations of monotone and saturating agonist effects.
It avoids hand-crafted hypotheses and expensive neuron-level interventions.
The approach combines global rule extraction with mechanistic interpretability.
The research is published on arXiv with ID 2605.03058.
The paper is categorized under explainable AI (XAI).
The method uses contrastive hierarchical ablation for neuron localization.
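
To illustrate why monotone, saturating effects make localization cheap, here is a minimal Python sketch of a hierarchical (group-then-split) ablation search on a toy model. It is not the paper's implementation. The names make_toy_effect and localize, the exponential saturation curve, the gain and threshold values, and the zero-effect control condition are all assumptions made for illustration. The idea shown is only this: if ablating a whole group of neurons produces no contrastive effect, monotonicity implies no subset of it will, so the group can be discarded after a single test.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple


def make_toy_effect(agonists: Iterable[int], gain: float = 1.5) -> Callable[[Sequence[int]], float]:
    """Build a toy contrastive effect function (hypothetical stand-in for a real model).

    The returned function gives the behavior drop on rule-related inputs minus
    the drop on control inputs when a set of neurons is neutralized.
    """
    agonist_set = frozenset(agonists)

    def effect(ablated: Sequence[int]) -> float:
        hits = len(agonist_set.intersection(ablated))
        rule_drop = 1.0 - math.exp(-gain * hits)  # monotone in hits, saturates at 1
        control_drop = 0.0  # assumption: ablation does not affect control inputs
        return rule_drop - control_drop

    return effect


def localize(
    neurons: Sequence[int],
    effect: Callable[[Sequence[int]], float],
    threshold: float = 0.1,
) -> Tuple[List[int], int]:
    """Find neurons with a contrastive effect by recursive halving.

    Returns the neurons found and the number of ablation tests performed.
    """
    found: List[int] = []
    calls = 0
    stack: List[List[int]] = [list(neurons)]
    while stack:
        group = stack.pop()
        calls += 1
        if effect(group) < threshold:
            continue  # by monotonicity, no subset of this group can matter
        if len(group) == 1:
            found.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[mid:])
        stack.append(group[:mid])
    return sorted(found), calls


# Toy usage: 1024 neurons, three hidden agonists (values chosen arbitrarily).
toy_effect = make_toy_effect({3, 500, 777})
found_neurons, num_tests = localize(range(1024), toy_effect)
print(found_neurons, num_tests)
```

In this toy setting the number of tests grows roughly with the number of agonists times the logarithm of the neuron count, rather than with the neuron count itself. A real model would need an actual ablation hook and a measured control condition in place of the toy effect function.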
