C-MORAL: RL Framework for Multi-Objective Molecular Optimization with LLMs
Researchers have introduced C-MORAL, a reinforcement learning (RL) post-training framework for controllable multi-objective molecular optimization with large language models. The framework combines group-based relative optimization, property score alignment across diverse objectives, and continuous non-linear reward aggregation, which stabilizes training under competing drug-design constraints. On the C-MuMOInstruct benchmark, C-MORAL achieves a leading Success Optimized Rate (SOR) of 48.9% on in-domain (IND) tasks and 39.5% on out-of-domain (OOD) tasks while preserving scaffold similarity. The results demonstrate that RL post-training can align molecular language models with diverse and competing molecular design goals. Code and data are publicly released.
Key facts
- C-MORAL is a reinforcement learning post-training framework for controllable multi-objective molecular optimization.
- It combines group-based relative optimization, property score alignment, and continuous non-linear reward aggregation.
- Evaluated on the C-MuMOInstruct benchmark.
- Achieves a Success Optimized Rate (SOR) of 48.9% on in-domain (IND) tasks and 39.5% on out-of-domain (OOD) tasks.
- Preserves scaffold similarity while optimizing multiple properties.
- Addresses alignment of LLMs with selective and competing drug-design constraints.
- Published as arXiv:2604.23061.
- Code and data are released.
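The three ingredients above can be illustrated with a minimal sketch. The specific functional forms here (min-max score alignment, a geometric-mean aggregator, GRPO-style group normalization) are assumptions for illustration, not the paper's exact formulation:

```python
import math

def aligned_score(value, lo, hi):
    # Illustrative stand-in for property score alignment: map a raw
    # property value into a common [0, 1] scale so that heterogeneous
    # objectives (e.g., logP, QED) become comparable.
    return max(0.0, min(1.0, (value - lo) / (hi - lo)))

def aggregate_reward(scores):
    # Continuous non-linear aggregation: a geometric mean, so failing
    # any single objective pulls the reward toward zero instead of
    # being averaged away (unlike a simple arithmetic mean).
    product = 1.0
    for s in scores:
        product *= max(s, 1e-8)  # floor avoids a hard zero collapse
    return product ** (1.0 / len(scores))

def group_relative_advantages(rewards):
    # Group-based relative optimization (GRPO-style): normalize each
    # sampled molecule's reward against the mean and std of its
    # sampling group to get per-sample advantages.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

For example, a group of two candidate molecules, one strong on both objectives and one weak on the first, yields a positive advantage for the former and a negative one for the latter, which is the signal the policy update would use.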