PWRules Framework Applies Protein Words to Predict Small Molecule Binding with Interpretability
The PWRules framework improves the interpretability of protein-small molecule binding predictions by identifying privileged small molecule fragments and defining complementary pairing rules between those fragments and protein words (semantic sequence units). Using binding affinity data, the framework ranks word-fragment rules with the PWScore function to prioritize active compounds. On benchmark datasets, PWScore performs competitively with the physics-based model Glide and the deep learning model PSICHIC. The framework also applies to protein targets outside the training dataset, such as the SARS-CoV-2 main protease. PWScore captures complementary interaction information, which reduces reliance on opaque deep learning models in drug discovery while incorporating principles and heuristics of protein-ligand interactions. The work was announced on arXiv under identifier 2604.16550v1.
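To make the idea of ranking compounds by word-fragment rules concrete, here is a minimal Python sketch. It is not the PWScore function itself, whose formula is not given in this summary. It assumes, purely for illustration, that each rule is a weighted (protein word, fragment) pair and that a compound's score is the sum of the weights of the rules it satisfies. The names `RULES`, `toy_score`, and `rank_compounds`, along with every word, fragment, and weight, are hypothetical.

```python
from itertools import product

# Hypothetical rule table: (protein word, fragment) -> weight.
# In the framework, rules are derived from binding affinity data;
# the entries below are invented for illustration only.
RULES = {
    ("GSGA", "amide"): 1.5,
    ("HCLV", "nitrile"): 2.0,
    ("LVFA", "phenyl"): 0.75,
}


def toy_score(protein_words, fragments, rules=RULES):
    """Sum the weights of all rules whose word and fragment are both present."""
    pairs = product(set(protein_words), set(fragments))
    return sum(rules.get(pair, 0.0) for pair in pairs)


def rank_compounds(protein_words, compounds, rules=RULES):
    """Return (name, score) pairs sorted from highest to lowest toy score."""
    scored = [
        (name, toy_score(protein_words, frags, rules))
        for name, frags in compounds.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical target described by two protein words, and three toy compounds
# described by the fragments they contain.
target_words = ["HCLV", "LVFA"]
compounds = {
    "cmpd_A": ["nitrile", "phenyl"],
    "cmpd_B": ["amide"],
    "cmpd_C": ["phenyl"],
}

ranking = rank_compounds(target_words, compounds)
for name, score in ranking:
    print(f"{name}: {score}")
```

Because every contribution to a score traces back to a named word-fragment pair, a ranking of this kind can be inspected rule by rule, which is the sort of interpretability the framework aims for.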
Key facts
- PWRules framework improves interpretability of protein-small molecule binding predictions
- Identifies privileged small molecule fragments using binding affinity data
- Defines complementary pairing rules between fragments and protein words (semantic sequence units)
- PWScore function ranks word-fragment rules to prioritize active compounds
- Achieves competitive performance comparable to physics-based model Glide and deep learning model PSICHIC
- Shows broad applicability for protein targets outside training dataset, e.g., SARS-CoV-2 main protease
- Captures complementary interaction information
- Research announced on arXiv as a cross-listing under identifier 2604.16550v1
Entities
Institutions
- arXiv