M2A Merges Mathematical and Agentic Reasoning in LLMs
A new paradigm called M2A synergizes mathematical and agentic reasoning in large language models through model merging. Mathematical reasoning relies on intrinsic logic to solve closed-world problems in a single response, while agentic reasoning requires multi-turn interaction with external environments. This misalignment prevents the two from benefiting each other and makes behavior unstable under multi-task learning. Instead of joint training, M2A operates directly in parameter space, identifying the feature subspace critical for agent behavior so that merging combines both reasoning types without overfitting to superficial reasoning patterns.
Key facts
- M2A is a novel paradigm for synergizing mathematical and agentic reasoning.
- Mathematical reasoning uses intrinsic logic for closed-world problems in a single response.
- Agentic reasoning requires multi-turn interaction with external environments.
- The misalignment between the two reasoning types prevents effective mutual benefit.
- Multi-task learning yields unstable reasoning behavior and limited performance gains.
- M2A operates directly in parameter space.
- It identifies the feature subspace critical for agent behavior.
- Model merging avoids overfitting to superficial reasoning patterns.
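The paper's exact merging procedure isn't spelled out here, but the idea of merging in parameter space while protecting an agent-critical subspace can be sketched with task-vector arithmetic. The sketch below is illustrative only: the function name, the coefficients, and the magnitude-based mask (a crude stand-in for "identifying the feature subspace critical for agent behavior") are all assumptions, not M2A's actual algorithm.

```python
import numpy as np

def merge_in_subspace(base, math_model, agent_model,
                      alpha=0.5, beta=0.5, top_frac=0.3):
    """Illustrative parameter-space merge of two fine-tuned models.

    Each model is a dict mapping parameter names to numpy arrays.
    The math task vector is applied everywhere; the agent task vector
    is applied only on its largest-magnitude coordinates, a toy proxy
    for an 'agent-critical' feature subspace.
    """
    merged = {}
    for name, w_base in base.items():
        tv_math = math_model[name] - w_base    # math task vector
        tv_agent = agent_model[name] - w_base  # agent task vector
        # Keep only the top fraction of agent-vector coordinates by magnitude.
        k = max(1, int(top_frac * tv_agent.size))
        thresh = np.sort(np.abs(tv_agent).ravel())[-k]
        mask = (np.abs(tv_agent) >= thresh).astype(tv_agent.dtype)
        merged[name] = w_base + alpha * tv_math + beta * mask * tv_agent
    return merged
```

The key point the sketch conveys is that merging edits weights, not training data: the base model never sees both tasks jointly, which is how this family of methods sidesteps the unstable multi-task behavior described above.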
Entities
Institutions
- arXiv