SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs

other · 2026-05-09

A new framework called SPARK (Self-Play with Asymmetric Reward from Knowledge Graphs) aims to extend self-play reinforcement learning to scientific literature. Self-play has been successful in domains like mathematics and coding, where problem generation and reward computation rely on explicit rules. Scientific literature is harder: relationships among multimodal elements across documents are rarely stated explicitly, which makes it difficult to generate questions automatically and to compute reliable reward signals. SPARK addresses this by automatically constructing a unified knowledge graph from multi-document scientific literature. The knowledge graph serves as the structural basis for self-play: paths over multimodal nodes generate relational reasoning questions, and structured facts in the graph provide verifiable reward computation. The paper is available on arXiv under ID 2605.05546.
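To make the idea of "paths generate questions, graph facts verify answers" concrete, here is a minimal Python sketch. It is not SPARK's implementation: the triples, the relation names, the question template, and the exact-match reward are all hypothetical choices made for illustration, and the summary above does not describe how the paper builds its graph or defines its asymmetric reward.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from the paper.
TRIPLES = [
    ("Figure 2", "illustrates", "Method A"),
    ("Method A", "evaluated_on", "Dataset X"),
    ("Dataset X", "introduced_in", "Paper P"),
]


def build_graph(triples):
    """Index triples as an adjacency list: head -> [(relation, tail), ...]."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def sample_path(graph, start, hops, rng):
    """Random walk of exactly `hops` edges from `start`; None if it dead-ends."""
    path = []
    node = start
    for _ in range(hops):
        edges = graph.get(node)
        if not edges:
            return None
        relation, next_node = rng.choice(edges)
        path.append((node, relation, next_node))
        node = next_node
    return path


def path_to_question(path):
    """Turn a path into a (question, gold_answer) pair using a fixed template."""
    start = path[0][0]
    relations = " -> ".join(relation for _, relation, _ in path)
    question = (
        f"Starting from '{start}', which entity is reached by following: "
        f"{relations}?"
    )
    gold_answer = path[-1][2]
    return question, gold_answer


def reward(predicted, gold):
    """Assumed reward: 1.0 on a case-insensitive exact match with the graph fact."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    path = sample_path(graph, "Figure 2", hops=3, rng=random.Random(0))
    question, gold = path_to_question(path)
    print(question)
    print("reward for correct answer:", reward(gold, gold))
```

Because the gold answer is read directly from the endpoint of the sampled path, the reward can be checked mechanically without a human grader, which is the property the framework relies on. A real system would need fuzzier answer matching and a far richer graph than this toy chain.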

Key facts

SPARK stands for Self-Play with Asymmetric Reward from Knowledge Graphs.
It is a framework for self-play reinforcement learning in scientific literature.
Self-play has shown strong performance in mathematics and coding.
Scientific literature lacks explicit relationships among multi-modal elements.
SPARK automatically constructs a unified knowledge graph from multi-document scientific literature.
Knowledge graph paths generate relational reasoning questions.
Structured facts in the knowledge graph provide verifiable reward computation.
The paper is on arXiv with ID 2605.05546.
