TriEx: A Game-Based Framework for Explaining Multi-Agent LLM Reasoning
The paper presents TriEx, a tri-view explainability framework for multi-agent LLMs operating in interactive, partially observable environments. It instruments sequential decision-making with three aligned artifacts: first-person self-reasoning bound to an action, second-person belief states about opponents that are updated over time, and third-person oracle audits grounded in environment-derived reference signals. This design turns explanations from free-form narratives into evidence-anchored objects that can be compared across time and perspectives. Using imperfect-information strategic games as a controlled testbed, TriEx supports analysis of explanation fidelity, belief evolution, and evaluator consistency, and it reveals systematic mismatches between what agents say and what they do.
Key facts
- TriEx is a tri-view explainability framework for multi-agent LLMs.
- It instruments sequential decision making with three aligned artifacts.
- First-person self-reasoning is bound to an action.
- Second-person belief states about opponents are updated over time.
- Third-person oracle audits are grounded in environment-derived reference signals.
- Explanations become evidence-anchored objects comparable across time and perspectives.
- Imperfect-information strategic games are used as a controlled testbed.
- The framework reveals systematic mismatches between what agents say and what they do.
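One way to picture the three aligned artifacts is as a single per-turn record. The sketch below is illustrative only and is not taken from the paper: the class names (`SelfReasoning`, `BeliefState`, `OracleAudit`, `StepRecord`), the helper functions, and the poker-style action labels are all hypothetical, and the exact-match comparisons stand in for whatever fidelity measures the authors actually use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SelfReasoning:
    """First-person view: the agent's stated rationale, bound to one action."""
    stated_intent: str  # what the agent says it will do (hypothetical field)
    action: str         # what the agent actually did


@dataclass
class BeliefState:
    """Second-person view: the agent's beliefs about an opponent at this turn."""
    opponent_probs: Dict[str, float]


@dataclass
class OracleAudit:
    """Third-person view: comparison against an environment-derived reference."""
    reference_action: str
    matches_reference: bool


@dataclass
class StepRecord:
    """All three views for one turn, so they can be compared across time."""
    turn: int
    reasoning: SelfReasoning
    belief: BeliefState
    audit: OracleAudit


def audit_step(turn: int, reasoning: SelfReasoning, belief: BeliefState,
               reference_action: str) -> StepRecord:
    # Assumption: the oracle audit is a simple equality check on the action.
    audit = OracleAudit(reference_action, reasoning.action == reference_action)
    return StepRecord(turn, reasoning, belief, audit)


def say_do_mismatches(records: List[StepRecord]) -> List[int]:
    # Turns where the stated intent differs from the action actually taken.
    return [r.turn for r in records
            if r.reasoning.stated_intent != r.reasoning.action]


records = [
    audit_step(1, SelfReasoning("fold", "fold"),
               BeliefState({"strong": 0.7, "weak": 0.3}), "fold"),
    audit_step(2, SelfReasoning("fold", "raise"),
               BeliefState({"strong": 0.4, "weak": 0.6}), "call"),
]
print(say_do_mismatches(records))  # [2]
```

Keeping the stated intent, the belief snapshot, and the audit result in one record per turn is what makes cross-time and cross-perspective comparison a simple list traversal in this sketch.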