SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
A new framework called SPARK (Self-Play with Asymmetric Reward from Knowledge Graphs) aims to extend self-play reinforcement learning to scientific literature. Self-play has been successful in domains such as mathematics and coding, where problem generation and reward computation rely on explicit rules. Scientific literature is harder: relationships among multimodal elements across documents are rarely explicit, which makes it difficult to generate questions automatically and to obtain reliable reward signals. SPARK addresses this by automatically constructing a unified knowledge graph from multi-document scientific literature. The knowledge graph serves as the structural basis for self-play: paths over multimodal nodes generate relational reasoning questions, and structured facts in the graph provide verifiable reward computation. The paper is available on arXiv under ID 2605.05546.
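To make the idea of generating questions from graph paths concrete, here is a minimal sketch in Python. It is not the paper's implementation: the triples, the node and relation names, and the functions `build_graph`, `sample_path`, and `path_to_question` are all hypothetical, and the question template is an assumption chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the paper.
TRIPLES = [
    ("Figure 2", "reports", "Method A"),
    ("Method A", "evaluated_on", "Dataset X"),
    ("Dataset X", "introduced_in", "Paper P"),
]


def build_graph(triples):
    """Index triples as an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def sample_path(graph, start, hops, rng):
    """Walk up to `hops` edges from `start`, picking a random edge at each step."""
    path, node = [], start
    for _ in range(hops):
        edges = graph.get(node)  # .get avoids creating empty entries
        if not edges:
            break  # dead end: return the shorter path
        relation, nxt = rng.choice(edges)
        path.append((node, relation, nxt))
        node = nxt
    return path


def path_to_question(path):
    """Turn a non-empty path into a (question, answer) pair.

    The answer is the final node on the path; the template is an assumption.
    """
    start = path[0][0]
    relations = " -> ".join(rel for _, rel, _ in path)
    question = (
        f"Starting from '{start}', which entity is reached by following: {relations}?"
    )
    return question, path[-1][2]
```

For example, `path_to_question(sample_path(build_graph(TRIPLES), "Figure 2", 2, random.Random(0)))` yields a question that chains `reports -> evaluated_on` with the answer `Dataset X`, since each node in this toy graph has a single outgoing edge.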
Key facts
- SPARK stands for Self-Play with Asymmetric Reward from Knowledge Graphs.
- It is a framework for self-play reinforcement learning in scientific literature.
- Self-play has shown strong performance in mathematics and coding.
- In scientific literature, relationships among multimodal elements across documents are rarely explicit.
- SPARK automatically constructs a unified knowledge graph from multi-document scientific literature.
- Knowledge graph paths generate relational reasoning questions.
- Structured facts in the knowledge graph provide verifiable reward computation.
- The paper is on arXiv with ID 2605.05546.
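The point about verifiable reward can be illustrated with a small Python sketch: if a question corresponds to a chain of relations in the graph, the gold answer can be recomputed from the stored facts and compared with a model's prediction. This is an assumption-laden illustration rather than SPARK's reward function. The triples and the helpers `normalize`, `follow`, and `graph_reward` are hypothetical, the exact-match binary score is a simplification, and the sketch does not attempt to model the "asymmetric" part of the reward, which this summary does not describe.

```python
# Hypothetical toy triples (head, relation, tail); not taken from the paper.
TRIPLES = [
    ("Figure 2", "reports", "Method A"),
    ("Method A", "evaluated_on", "Dataset X"),
    ("Dataset X", "introduced_in", "Paper P"),
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(text.lower().split())


def follow(triples, start, relations):
    """Return the set of nodes reached from `start` by following `relations` in order."""
    frontier = {start}
    for relation in relations:
        frontier = {t for h, r, t in triples if h in frontier and r == relation}
    return frontier


def graph_reward(triples, start, relations, predicted):
    """Binary reward: 1.0 if the prediction matches a node the graph supports, else 0.0."""
    gold = {normalize(node) for node in follow(triples, start, relations)}
    return 1.0 if normalize(predicted) in gold else 0.0


if __name__ == "__main__":
    print(graph_reward(TRIPLES, "Figure 2", ["reports", "evaluated_on"], "dataset x"))
    print(graph_reward(TRIPLES, "Figure 2", ["reports", "evaluated_on"], "Paper P"))
```

Because the gold answer is derived from the graph rather than from a learned judge, the same check gives the same score every time, which is what makes the reward verifiable in this sketch.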
Entities
Institutions
- arXiv