Adaptive Test-Time Compute Allocation with Evolving Demonstrations
A new framework jointly adapts compute allocation and the generation distribution at test time. A warm-up phase identifies easy queries and builds an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates computation on the queries that remain unresolved and reshapes their generation distributions through evolving in-context demonstrations, conditioning each new generation on successful responses to semantically related queries. In experiments on math, coding, and reasoning benchmarks, the framework consistently outperforms existing baselines while using substantially less compute.
Key facts
- The framework jointly adapts compute allocation and the generation distribution at test time.
- A warm-up phase identifies easy queries and assembles an initial pool of question-response pairs from the test set.
- An adaptive phase concentrates computation on unresolved queries.
- Generation distributions are reshaped through evolving in-context demonstrations.
- Each generation is conditioned on successful responses to semantically related queries.
- Experiments cover math, coding, and reasoning benchmarks.
- The framework outperforms existing baselines while consuming substantially less compute.
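The two phases listed above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Every name in it (`adaptive_solve`, `nearest_demos`, `sample`, `is_success`, `warmup_samples`, `total_budget`, `k`) is hypothetical. The summary does not say how a response is judged successful or how semantic relatedness is measured, so the sketch assumes a caller-supplied success check and uses bag-of-words cosine similarity as a stand-in for a real embedding model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Demo = Tuple[str, str]                       # (question, successful response)
Sampler = Callable[[str, List[Demo]], str]   # hypothetical: one model call
Checker = Callable[[str, str], bool]         # hypothetical: success criterion


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_demos(query: str, pool: Dict[str, str], k: int) -> List[Demo]:
    """Return the k pooled pairs whose questions are most similar to `query`.

    Assumption: bag-of-words cosine stands in for semantic similarity.
    """
    qv = Counter(query.lower().split())
    ranked = sorted(
        pool.items(),
        key=lambda pair: cosine(qv, Counter(pair[0].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def adaptive_solve(
    queries: List[str],
    sample: Sampler,
    is_success: Checker,
    warmup_samples: int = 1,
    total_budget: int = 20,
    k: int = 2,
) -> Tuple[Dict[str, str], int]:
    """Return (pool of solved question-response pairs, samples used)."""
    pool: Dict[str, str] = {}
    used = 0

    # Warm-up: a small, uniform number of samples per query, no demonstrations.
    # Queries solved here are treated as easy and seed the demonstration pool.
    for q in queries:
        for _ in range(warmup_samples):
            if used >= total_budget:
                break
            used += 1
            response = sample(q, [])
            if is_success(q, response):
                pool[q] = response
                break

    # Adaptive: spend the remaining budget only on unresolved queries,
    # conditioning each sample on demonstrations drawn from the growing pool.
    unresolved = [q for q in queries if q not in pool]
    while unresolved and used < total_budget:
        for q in list(unresolved):
            if used >= total_budget:
                break
            used += 1
            response = sample(q, nearest_demos(q, pool, k))
            if is_success(q, response):
                pool[q] = response   # newly solved queries become demonstrations
                unresolved.remove(q)
    return pool, used


# Toy stand-ins for a model and a success check, for illustration only:
# "easy" queries succeed unaided, others succeed once any demonstration exists.
def toy_sample(query: str, demos: List[Demo]) -> str:
    return "ok" if query.startswith("easy") or demos else "fail"


def toy_is_success(query: str, response: str) -> bool:
    return response == "ok"
```

In this sketch, solved queries leave the work list immediately, so later samples go only to the hard cases, and each newly solved query enlarges the pool that later prompts draw from. A real system would replace `toy_sample` with a language model call and `nearest_demos` with embedding-based retrieval.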