RegimeRouter: A Lightweight Binary Router for Two-Hop QA
A theoretical model of retrieval for two-hop question answering (QA) divides queries into two regimes: Q-dominant, where the hop-2 entity is mentioned directly in the question, and B-dominant, where it appears only in the bridge passage. Three theorems support this distinction: (T1) per-query AUC is a monotone function of the cosine separation margin (R² ≥ 0.90 for six of eight type-encoder pairs); (T2) the regime is characterized by two surface-text predicates, with P1 crucial for routing and P2 identifying the B-dominant regime, across three datasets and encoders; (T3) the bridge advantage depends on the relation-bearing sentence rather than the entity name alone, and removing that sentence lowers performance by 8.6–14.1 percentage points (p < 0.001).
Building on this theory, the authors introduce RegimeRouter, a lightweight binary router that chooses between question-only and question-plus-relation-sentence retrieval using five text features derived from the predicate definitions. The router is trained on 2Wiki.
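As a rough illustration of what a binary router over a handful of text features might look like, the sketch below trains a tiny logistic-regression classifier in plain Python. Everything specific here is an assumption made for illustration: the summary does not list the five features, does not define P1 or P2, and does not name the classifier, so the feature definitions, the function names (`features`, `train`, `route`), the label strings, and the choice of logistic regression are hypothetical and are not taken from the paper.

```python
import math
import re

QUESTION_ONLY = "question_only"
QUESTION_PLUS_RELATION = "question+relation"


def tokens(text):
    """Split text into alphanumeric tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def features(question, relation_sentence):
    """Return a bias term plus five HYPOTHETICAL surface-text features."""
    q = tokens(question)
    r = tokens(relation_sentence)
    q_lower = {t.lower() for t in q}
    r_lower = {t.lower() for t in r}
    # Capitalized tokens, skipping the sentence-initial word.
    q_caps = [t for t in q[1:] if t[0].isupper()]
    r_new_caps = [t for t in r[1:] if t[0].isupper() and t.lower() not in q_lower]
    overlap = len(q_lower & r_lower) / max(len(q_lower), 1)
    nested = re.search(r"\bof the\b|\bwhose\b", question.lower()) is not None
    return [
        1.0,                     # bias
        float(len(q_caps)),      # entity-like tokens in the question
        float(len(r_new_caps)),  # entity-like tokens only in the relation sentence
        overlap,                 # share of question vocabulary found in the sentence
        float(len(q)),           # question length in tokens
        1.0 if nested else 0.0,  # crude cue for a nested (two-hop) phrasing
    ]


def score(x, w):
    """Logistic score; the logit is clamped to avoid overflow in exp."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    Each example is (question, relation_sentence, label), where label 1
    means the relation sentence should be appended to the query.
    """
    w = [0.0] * 6
    for _ in range(epochs):
        for question, relation_sentence, label in examples:
            x = features(question, relation_sentence)
            p = score(x, w)
            for i, xi in enumerate(x):
                w[i] += lr * (label - p) * xi
    return w


def route(question, relation_sentence, w):
    """Pick the retrieval mode for one query."""
    p = score(features(question, relation_sentence), w)
    return QUESTION_PLUS_RELATION if p >= 0.5 else QUESTION_ONLY
```

In use, `train` would be called on labeled (question, relation sentence, label) triples and the resulting weights passed to `route` for each incoming query. The point of the sketch is only that the routing decision is a single thresholded score over cheap text features, which is what makes such a router lightweight.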
Key facts
- Two-hop QA retrieval splits into Q-dominant and B-dominant regimes.
- Three theorems (T1–T3) formalize the regime split; the T3 ablation result is significant at p < 0.001.
- Per-query AUC is a monotone function of cosine separation margin (R² ≥ 0.90 for six of eight type-encoder pairs).
- Regime is characterized by two surface-text predicates (P1 and P2).
- Bridge advantage requires the relation-bearing sentence, not entity name alone.
- Removing the relation-bearing sentence causes an 8.6–14.1 pp performance drop (p < 0.001).
- RegimeRouter is a lightweight binary router using five text features.
- RegimeRouter selects between question-only and question-plus-relation-sentence retrieval.
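The two quantities related by T1 can be made concrete with a short sketch. The summary does not define the cosine separation margin, so the definition used here (mean cosine similarity of the query to relevant passages minus its mean cosine similarity to distractors) is an assumption, as are the function names. Per-query AUC is computed in the standard pairwise way, as the fraction of (relevant, distractor) pairs ranked correctly, with ties counted as half.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def per_query_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def separation_margin(pos_scores, neg_scores):
    """ASSUMED definition: mean positive cosine minus mean negative cosine."""
    mean_pos = sum(pos_scores) / len(pos_scores)
    mean_neg = sum(neg_scores) / len(neg_scores)
    return mean_pos - mean_neg


def query_stats(query_vec, relevant_vecs, distractor_vecs):
    """Return (margin, AUC) for one query from its passage embeddings."""
    pos = [cosine(query_vec, v) for v in relevant_vecs]
    neg = [cosine(query_vec, v) for v in distractor_vecs]
    return separation_margin(pos, neg), per_query_auc(pos, neg)
```

Collecting the `(margin, AUC)` pair for every query of a given type and encoder gives the points against which a monotone relationship, and an R² value like the one reported, could be checked.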
Entities
Institutions
- arXiv