InteractBind: Large-Scale Dataset for Protein-Ligand Binding Site Localization
Researchers introduced InteractBind, a large-scale dataset of approximately 100,000 protein-ligand pairs, designed to evaluate whether models can localize binding sites and identify non-covalent interactions. Existing benchmarks focus on binary binding prediction and affinity regression, and do not assess spatial understanding of molecular recognition. InteractBind adds fine-grained binding-site localization tasks built on protein-residue and ligand-atom interaction maps that cover six major non-covalent interaction types. The dataset also provides binding affinity data. By testing localization directly rather than binding likelihood alone, the benchmark targets a gap in how protein-ligand models are evaluated for computational drug discovery and molecular design.
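To make the idea of a protein-residue and ligand-atom interaction map concrete, here is a minimal Python sketch. It is not InteractBind's actual schema: the class name `InteractionMap`, its fields, and the toy indices are all hypothetical, and the six interaction types are represented only as indices 0 to 5 because this summary does not name them.

```python
from dataclasses import dataclass, field

# The summary states six interaction types but does not name them,
# so they are modeled here as anonymous indices 0..5 (an assumption).
N_INTERACTION_TYPES = 6


@dataclass
class InteractionMap:
    """Hypothetical sparse residue-by-atom interaction map for one pair."""

    n_residues: int
    n_ligand_atoms: int
    # (residue_index, atom_index) -> set of interaction-type indices
    contacts: dict = field(default_factory=dict)

    def add(self, residue: int, atom: int, interaction_type: int) -> None:
        """Record one non-covalent interaction between a residue and an atom."""
        if not 0 <= residue < self.n_residues:
            raise IndexError("residue index out of range")
        if not 0 <= atom < self.n_ligand_atoms:
            raise IndexError("ligand atom index out of range")
        if not 0 <= interaction_type < N_INTERACTION_TYPES:
            raise ValueError("unknown interaction type")
        self.contacts.setdefault((residue, atom), set()).add(interaction_type)

    def binding_site_residues(self) -> list:
        """Residues that take part in at least one interaction."""
        return sorted({residue for (residue, _atom) in self.contacts})

    def interacting_atoms(self) -> list:
        """Ligand atoms that take part in at least one interaction."""
        return sorted({atom for (_residue, atom) in self.contacts})


# Toy pair: 5 residues, 3 ligand atoms, three recorded interactions.
example = InteractionMap(n_residues=5, n_ligand_atoms=3)
example.add(1, 0, 2)
example.add(1, 2, 4)
example.add(3, 2, 2)
print(example.binding_site_residues())  # [1, 3]
```

A sparse mapping is used because most residue-atom pairs in a complex do not interact; a dense residue-by-atom-by-type array would carry the same information.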
Key facts
- InteractBind comprises approximately 100,000 protein-ligand pairs.
- The dataset includes binding-site localization as a core fine-grained task.
- Interaction maps cover six major types of non-covalent interactions.
- Existing benchmarks focus on binary binding prediction and affinity regression, not spatial localization.
- The work aims to assess whether models learn binding sites or just binding likelihood.
- InteractBind is introduced as a benchmark for fine-grained evaluation.
- The dataset supports computational drug discovery and molecular design.
- The paper is available on arXiv under ID 2605.24045.
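The difference between learning binding likelihood and learning binding sites can be illustrated with a small scoring sketch. The summary does not state which metrics InteractBind uses, so the residue-level precision, recall, and F1 below are an assumption chosen for illustration, and the function name `localization_scores` and all residue indices are hypothetical.

```python
def localization_scores(predicted, actual):
    """Residue-level precision, recall, and F1 for a predicted binding site."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground-truth binding-site residues for one complex.
true_site = {12, 45, 46, 78}

# Both hypothetical models output "binds" for this pair, so a binary
# benchmark scores them identically. Only their localization differs.
model_a_site = {12, 45, 46, 78}   # points at the real pocket
model_b_site = {3, 101, 150, 200}  # points somewhere else entirely

scores_a = localization_scores(model_a_site, true_site)
scores_b = localization_scores(model_b_site, true_site)
print(scores_a)  # (1.0, 1.0, 1.0)
print(scores_b)  # (0.0, 0.0, 0.0)
```

Under a binary binding label the two models are indistinguishable; a localization score of this kind separates them.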
Entities
Institutions
- arXiv