EvoTD: Evolutionary Task Discovery for LLM Reasoning
Evolutionary Task Discovery (EvoTD) is a framework that addresses a shortcoming of current post-training paradigms for Large Language Models (LLMs), such as Reinforcement Learning from Verifiable Rewards (RLVR): reasoning gains are limited by the diversity and complexity of the training data. Existing data synthesis techniques often suffer homogeneity collapse because their mutation or exploration is unstructured. EvoTD instead treats data synthesis as a directed search over a dual-axis manifold of Algorithmic Skills and Complexity Attributes. It uses two structured evolutionary operators: a Crossover operator that synthesizes novel skill compositions to increase diversity, and a Parametric Mutation operator that scales structural constraints. The goal is to systematically expand the reasoning frontier of LLMs. The paper is available on arXiv under the identifier 2605.11666.
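The two operators can be illustrated with a minimal Python sketch. This is not the paper's implementation: the Task representation, the function names (make_task, crossover, parametric_mutation, evolve), the attribute names (input_size, depth), and the specific rules (union of parent skills, element-wise maximum of complexity attributes, multiplicative scaling of one attribute) are all assumptions chosen only to make the dual-axis idea concrete.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor: one point on the two axes."""
    skills: frozenset   # axis 1: algorithmic skills
    complexity: tuple   # axis 2: sorted (attribute, value) pairs


def make_task(skills, **complexity):
    return Task(frozenset(skills), tuple(sorted(complexity.items())))


def crossover(a, b):
    """Assumed rule: the child combines the skills of both parents and
    inherits the larger value of each complexity attribute."""
    pa, pb = dict(a.complexity), dict(b.complexity)
    merged = {k: max(pa.get(k, 0), pb.get(k, 0)) for k in pa.keys() | pb.keys()}
    return Task(a.skills | b.skills, tuple(sorted(merged.items())))


def parametric_mutation(task, attribute, factor):
    """Assumed rule: scale one complexity attribute, leave skills unchanged."""
    params = dict(task.complexity)
    if attribute not in params:
        raise KeyError(attribute)
    params[attribute] = math.ceil(params[attribute] * factor)
    return Task(task.skills, tuple(sorted(params.items())))


def evolve(seed_tasks, generations, rng):
    """Toy search loop: the archive is a set, so exact duplicates are never
    stored twice (a crude stand-in for avoiding homogeneity)."""
    archive = set(seed_tasks)
    for _ in range(generations):
        # Sort for a stable sampling order before picking two parents.
        ordered = sorted(archive, key=lambda t: (sorted(t.skills), t.complexity))
        parent_a, parent_b = rng.sample(ordered, 2)
        child = crossover(parent_a, parent_b)
        attribute = rng.choice([name for name, _ in child.complexity])
        archive.add(parametric_mutation(child, attribute, factor=2))
    return archive


if __name__ == "__main__":
    seeds = [
        make_task(["sorting"], input_size=10),
        make_task(["graph_search"], input_size=4, depth=2),
    ]
    for task in evolve(seeds, generations=5, rng=random.Random(0)):
        print(sorted(task.skills), dict(task.complexity))
```

In this sketch, crossover moves along the skill axis by producing combinations that neither parent had, while parametric mutation moves along the complexity axis without changing which skills are required. A real system would also need to turn each descriptor into a concrete, verifiable problem, which is out of scope here.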
Key facts
- EvoTD is a framework for data synthesis to improve LLM reasoning.
- It addresses homogeneity collapse in existing synthesis methods.
- The framework uses a dual-axis manifold of Algorithmic Skills and Complexity Attributes.
- A Crossover operator synthesizes novel skill compositions.
- A Parametric Mutation operator scales structural constraints.
- The paper is on arXiv with ID 2605.11666.
- Current post-training paradigms include RLVR.
- The goal is to systematically expand the reasoning frontier.
Entities
Institutions
- arXiv