Differentiable Graph Partitioning Interprets Protein Language Models

other · 2026-05-13

Researchers propose SoftBlobGIN, a framework that projects ESM-2 protein language model representations onto protein contact graphs for interpretable structural analysis. The method uses a Graph Isomorphism Network (GIN) with differentiable Gumbel-softmax pooling to learn coarse functional substructures. On enzyme classification, it achieves 92.8% accuracy and 0.898 macro-F1. Unlike purely post hoc analysis of dense latent spaces, SoftBlobGIN produces directly auditable structural explanations: applying GNNExplainer to the model recovers biologically meaningful active-site residues and catalytic clusters.
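To make the pooling step more concrete, the following is a minimal sketch of Gumbel-softmax soft assignment, in which each residue node is softly assigned to one of K groups and node features are summed per group. It is not the authors' implementation: the function names (gumbel_softmax, soft_pool), the temperature tau, the toy features, and the assignment logits are all hypothetical, and in a real model the logits would be produced by a learned layer.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1);
        # U is clamped away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def soft_pool(node_features, assignment_logits, tau=1.0, rng=random):
    """Pool N node vectors into K group vectors using soft assignments."""
    assignments = [gumbel_softmax(row, tau, rng) for row in assignment_logits]
    num_groups = len(assignment_logits[0])
    dim = len(node_features[0])
    pooled = [[0.0] * dim for _ in range(num_groups)]
    for feats, weights in zip(node_features, assignments):
        for c, w in enumerate(weights):
            for d, x in enumerate(feats):
                pooled[c][d] += w * x  # weighted sum of node features per group
    return assignments, pooled

# Toy example: 3 nodes, 2 feature dimensions, 2 groups (all values hypothetical).
rng = random.Random(0)
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
assignments, pooled = soft_pool(features, logits, tau=0.5, rng=rng)
```

Because every assignment row sums to 1, the pooled group vectors together preserve the total feature mass of the nodes. Lowering tau pushes each row toward a hard one-hot assignment while keeping the operation differentiable with respect to the logits.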

Key facts

SoftBlobGIN is a plug-and-play framework for ESM-2 representations.
It projects representations onto protein contact graphs.
It uses a Graph Isomorphism Network (GIN) with differentiable Gumbel-softmax pooling.
It achieves 92.8% accuracy and 0.898 macro-F1 on enzyme classification.
It produces directly auditable structural explanations.
GNNExplainer recovers active-site residues and functional clusters.
The framework is structure-aware and learns coarse functional substructures.
It addresses the limited interpretability of dense latent spaces in protein language models.
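The graph side of the pipeline listed above can be sketched as follows: build a contact graph from residue coordinates, then run one GIN update, h_v' = MLP((1 + eps) * h_v + sum of neighbor features). This is an illustration under stated assumptions, not the paper's code. The 8 Å distance cutoff is a common convention that the source does not confirm, the 2-dimensional vectors stand in for per-residue ESM-2 embeddings, and an elementwise ReLU stands in for the learned MLP.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Adjacency lists linking residues whose coordinates lie within cutoff."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def gin_layer(features, neighbors, eps=0.0, mlp=None):
    """One GIN update: mlp((1 + eps) * h_v + sum of neighbor features)."""
    if mlp is None:
        # Placeholder for a learned MLP (assumption): elementwise ReLU.
        mlp = lambda vec: [max(0.0, x) for x in vec]
    updated = []
    for v, h_v in enumerate(features):
        agg = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            for d, x in enumerate(features[u]):
                agg[d] += x  # sum aggregation over contacting residues
        updated.append(mlp(agg))
    return updated

# Toy chain of four residues; the last one is far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
graph = contact_graph(coords)
hidden = gin_layer(embeddings, graph)
```

Sum aggregation is what distinguishes GIN from mean- or max-based layers: it preserves how many neighbors contributed, not only their average. The isolated fourth residue has no contacts, so its update depends only on its own embedding.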

Sources

arXiv cs.AI — 2026-05-13