ARTFEED — Contemporary Art Intelligence

MechaRule: Grounding LLM Rule Extraction in Neural Circuits

ai-technology · 2026-05-07

Researchers propose MechaRule, a pipeline that extracts symbolic rules from large language models and grounds them in specific neurons. The method identifies 'agonist' neurons: neurons for which neutralizing the activation disrupts rule-related behaviors. It relies on the empirical observation that the effects of sparse agonists are approximately monotone and saturating, which allows efficient localization without hand-crafted hypotheses. The approach bridges global rule extraction and mechanistic interpretability.
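
One possible reading of "approximately monotone and saturating" can be written down as follows. This is a sketch under assumptions, not the paper's notation: Δ(S) is taken to be the drop in rule-consistent behavior when the neuron set S is neutralized, ε is a small tolerance, and "saturating" is interpreted here as diminishing returns.

```latex
\begin{align*}
\text{monotone:}   &\quad S \subseteq T \;\Rightarrow\; \Delta(S) \le \Delta(T) + \varepsilon \\
\text{saturating:} &\quad \Delta(T \cup \{n\}) - \Delta(T) \;\le\; \Delta(S \cup \{n\}) - \Delta(S) + \varepsilon
                    \quad \text{for } S \subseteq T,\; n \notin T
\end{align*}
```

Under this reading, a group of neurons whose joint neutralization has almost no effect cannot contain a subset with a large effect, so the whole group can be ruled out with a single intervention.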

Key facts

  • MechaRule is a pipeline for rule extraction from LLMs grounded in neural circuits.
  • It identifies a sparse set of neurons, called agonists, whose neutralization disrupts rule-related behaviors.
  • The method is based on the empirical observation that agonist effects are approximately monotone and saturating.
  • It avoids hand-crafted hypotheses and expensive neuron-level interventions.
  • The approach combines global rule extraction with mechanistic interpretability.
  • The research is published on arXiv with ID 2605.03058.
  • The paper is categorized under explainable AI (XAI).
  • The method uses contrastive hierarchical ablation for neuron localization.
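
The localization idea in the list above can be illustrated with a toy sketch. It is not the paper's implementation: `make_toy_score`, `localize`, the threshold, and the exponential effect curve are all hypothetical choices made for illustration. The score is contrastive in the sense that it subtracts the behavior drop on control prompts from the drop on rule prompts, and the search is hierarchical in that it ablates whole groups and only splits groups that show an effect.

```python
import math
from typing import Callable, Iterable, List, Set, Tuple


def make_toy_score(agonists: Set[int]) -> Callable[[List[int]], float]:
    """Hypothetical stand-in for a real ablation experiment on a model."""

    def score(ablated: List[int]) -> float:
        s = set(ablated)
        # Generic damage from ablating many neurons hits both prompt sets.
        generic = 0.001 * len(s)
        # Rule-specific effect: monotone and saturating in the agonist count.
        specific = 1.0 - math.exp(-len(s & agonists))
        rule_drop = generic + specific
        control_drop = generic
        # Contrastive score: what is lost on rule prompts beyond control prompts.
        return rule_drop - control_drop

    return score


def localize(
    neurons: Iterable[int],
    score: Callable[[List[int]], float],
    threshold: float = 0.05,
) -> Tuple[List[int], int]:
    """Return (agonists found, number of ablation calls) via group bisection."""
    found: List[int] = []
    calls = 0
    stack = [list(neurons)]
    while stack:
        group = stack.pop()
        calls += 1
        if score(group) < threshold:
            # Assumed monotonicity: no subset of this group can matter.
            continue
        if len(group) == 1:
            found.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[:mid])
        stack.append(group[mid:])
    return sorted(found), calls


if __name__ == "__main__":
    found, calls = localize(range(1024), make_toy_score({3, 500, 777}))
    print("agonists:", found)
    print("ablation calls:", calls, "vs 1024 single-neuron ablations")
```

The pruning step is only sound if the monotonicity assumption holds; when agonists are sparse, most groups are discarded early, which is why far fewer interventions are needed than one per neuron.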

Entities

Institutions

  • arXiv
