ARTFEED — Contemporary Art Intelligence

Comparative Study Evaluates Explainability Techniques for Large Language Models

ai-technology · 2026-04-20

The study is a comparative analysis of three explainability methods for large language models, focused on practical evaluation rather than proposing new techniques. It examines Integrated Gradients, Attention Rollout, and SHAP on a DistilBERT model fine-tuned for SST-2 sentiment classification. According to the findings, gradient-based attribution produced the most stable and intuitive explanations; attention-based approaches were computationally efficient but less aligned with the features actually driving predictions; and model-agnostic techniques offered flexibility at the cost of higher computation and greater variability in results. The authors stress that transparency in LLM decision processes matters for building trust, for debugging, and for real-world deployment. The work is documented in arXiv preprint 2604.15371v1 (announced as a cross-listing), and the evaluation used a single, consistent, reproducible experimental setup throughout.
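
For orientation, the sketch below shows what the gradient-based setup the study describes might look like in practice: Integrated Gradients attributions over a DistilBERT model fine-tuned on SST-2, computed with Hugging Face Transformers and Captum. This is a minimal illustration, not the paper's code; the checkpoint name and the example sentence are assumptions.

    # Minimal sketch (not the paper's code): Integrated Gradients on a
    # DistilBERT SST-2 classifier. Checkpoint and sentence are assumed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    from captum.attr import LayerIntegratedGradients

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def forward_func(input_ids, attention_mask):
        # Return the positive-class logit so attributions explain that score.
        return model(input_ids=input_ids, attention_mask=attention_mask).logits[:, 1]

    text = "The film is a quiet, devastating triumph."
    enc = tokenizer(text, return_tensors="pt")
    input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]

    # Baseline: same length, all non-special tokens replaced by [PAD].
    baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
    baseline_ids[0, 0] = tokenizer.cls_token_id
    baseline_ids[0, -1] = tokenizer.sep_token_id

    # Attribute through the embedding layer, integrating over 50 steps.
    lig = LayerIntegratedGradients(forward_func, model.distilbert.embeddings)
    attributions, delta = lig.attribute(
        inputs=input_ids,
        baselines=baseline_ids,
        additional_forward_args=(attention_mask,),
        n_steps=50,
        return_convergence_delta=True,
    )

    # Sum over the embedding dimension to get one relevance score per token.
    token_scores = attributions.sum(dim=-1).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(input_ids.squeeze(0))
    for tok, score in zip(tokens, token_scores.tolist()):
        print(f"{tok:>12s}  {score:+.4f}")

The per-token scores printed at the end are the kind of output whose stability and intuitiveness the study compares across methods; the convergence delta gives a rough check on how well the integral was approximated.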

Key facts

  • Study compares three explainability techniques for LLMs
  • Methods evaluated: Integrated Gradients, Attention Rollout, SHAP
  • Used fine-tuned DistilBERT model for SST-2 sentiment classification
  • Gradient-based attribution provided the most stable and intuitive explanations
  • Attention-based methods were computationally efficient but less aligned with prediction-relevant features (see the rollout sketch after this list)
  • Model-agnostic approaches offered flexibility with higher computational cost
  • Focus was on practical evaluation rather than proposing new methods
  • Research addresses transparency challenges for LLM trust and deployment
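
As referenced in the list above, the attention-based approach can be sketched as attention rollout: head-averaged attention matrices are combined across layers to estimate how much each input token contributes to the [CLS] representation. Again, this is an illustrative sketch under assumed details (checkpoint name, example sentence), not the preprint's implementation.

    # Minimal attention-rollout sketch (Abnar & Zuidema, 2020) on the same
    # assumed DistilBERT SST-2 checkpoint.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, output_attentions=True
    )
    model.eval()

    text = "The film is a quiet, devastating triumph."
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)

    # Rollout: average heads, add the identity for residual connections,
    # row-normalize, then multiply the layer matrices from bottom to top.
    rollout = None
    for layer_attn in out.attentions:            # each: (batch, heads, seq, seq)
        attn = layer_attn.mean(dim=1)[0]         # average over heads -> (seq, seq)
        attn = attn + torch.eye(attn.size(0))    # account for residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)
        rollout = attn if rollout is None else attn @ rollout

    # The [CLS] row gives a per-token relevance estimate.
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0))
    for tok, score in zip(tokens, rollout[0].tolist()):
        print(f"{tok:>12s}  {score:.4f}")

Because rollout needs only one forward pass and no gradients, it is cheap to compute, which matches the study's observation that attention-based explanations trade faithfulness to the prediction for computational efficiency.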
