Differentiable Graph Partitioning Interprets Protein Language Models
Researchers propose SoftBlobGIN, a framework that projects ESM-2 protein language model representations onto contact graphs for interpretable structural analysis. The method uses a Graph Isomorphism Network with differentiable Gumbel-softmax pooling to learn functional substructures. On enzyme classification tasks, it achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis, SoftBlobGIN produces directly auditable explanations, with GNNExplainer recovering biologically meaningful active-site residues and catalytic clusters.
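To make the pooling step concrete, here is a minimal NumPy sketch of Gumbel-softmax soft assignment, in which each residue is assigned (softly) to one of K coarse clusters. It is a generic illustration of the technique, not the authors' implementation: the function names `gumbel_softmax` and `soft_pool`, the temperature `tau`, and the way pooled features and adjacency are formed are all assumptions. In a trained model the assignment logits would come from a learned layer rather than being passed in directly.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed (soft) sample from a categorical distribution per row of `logits`."""
    rng = np.random.default_rng() if rng is None else rng
    # Add Gumbel(0, 1) noise, then apply a temperature-scaled softmax.
    y = (logits + rng.gumbel(size=logits.shape)) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def soft_pool(node_feats, adj, assign_logits, tau=1.0, rng=None):
    """Pool N residues into K soft clusters (hypothetical helper).

    node_feats:    (N, D) per-residue features
    adj:           (N, N) contact-graph adjacency
    assign_logits: (N, K) unnormalized cluster scores per residue
    """
    s = gumbel_softmax(assign_logits, tau, rng)  # (N, K), each row sums to 1
    pooled_feats = s.T @ node_feats              # (K, D) cluster features
    pooled_adj = s.T @ adj @ s                   # (K, K) cluster connectivity
    return s, pooled_feats, pooled_adj
```

Lower values of `tau` push each row of the assignment matrix toward a one-hot vector, so every residue lands in essentially one cluster, while the operation stays differentiable with respect to the logits when written in an autodiff framework.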
Key facts
- SoftBlobGIN is a plug-and-play framework for ESM-2 representations.
- It projects representations onto protein contact graphs.
- It uses a Graph Isomorphism Network (GIN) with differentiable Gumbel-softmax pooling.
- It achieves 92.8% accuracy and 0.898 macro-F1 on enzyme classification.
- It produces directly auditable structural explanations.
- GNNExplainer recovers active-site residues and functional clusters.
- The framework is structure-aware and learns coarse functional substructures.
- It addresses the interpretability of dense latent spaces in protein language models.
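The graph-side steps listed above can be sketched as follows: build a residue contact graph from 3D coordinates, then run one GIN-style neighborhood aggregation over per-residue features. This is an illustrative sketch under stated assumptions, not the paper's code. The 8 Å cutoff, the use of one coordinate per residue (for example the C-alpha atom), and the names `contact_graph` and `gin_aggregate` are hypothetical. The learned MLP that a full GIN layer applies after aggregation is omitted.

```python
import numpy as np

def contact_graph(coords, cutoff=8.0):
    """Binary adjacency: residues i and j are in contact if closer than `cutoff`.

    coords: (N, 3) array with one 3D point per residue.
    The 8.0 default is an assumed threshold, not a value taken from the source.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (N, N) pairwise distances
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)            # no self-loops
    return adj

def gin_aggregate(h, adj, eps=0.0):
    """GIN-style sum aggregation: (1 + eps) * h_i + sum of neighbor features.

    h: (N, D) per-residue features, e.g. language-model embeddings placed on the nodes.
    A full GIN layer would pass this result through an MLP, omitted here.
    """
    return (1.0 + eps) * h + adj @ h
```

For example, calling `contact_graph` on an `(N, 3)` coordinate array and passing the result to `gin_aggregate` together with an `(N, D)` embedding matrix yields features in which each residue has absorbed information from its spatial neighbors.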