HARNESS-LM: Training Framework for Efficient Sponsored Search Retrieval
HARNESS-LM (HLM) is a three-phase training framework that transfers the capabilities of large-scale retrievers into compact, cost-efficient models for sponsored search. Large retrieval models built on Small Language Models (SLMs), such as Qwen3-Embedding-4B/8B, set strong benchmarks but are impractical for high-throughput, latency-sensitive environments. HLM aims to make such capabilities deployable by balancing retrieval quality against production latency. First, a high-performance reference (teacher) retriever is trained by fine-tuning a billion-parameter-scale SLM. Second, the student's query representations are aligned with the teacher's via an L2 objective, distilling knowledge into a student encoder with fewer than 600 million parameters. Third, a final contrastive refinement stage optimizes the student for retrieval performance. The paper also includes a comprehensive empirical study of key design choices.
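The second phase (L2 alignment of query representations) can be illustrated with a minimal sketch. The summary does not give the exact form of the objective, so the code below assumes a mean squared L2 distance over a batch, fixed teacher embeddings, and equal embedding dimensions for student and teacher (otherwise a projection layer would be needed). The function name `l2_alignment_loss` is hypothetical and not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def l2_alignment_loss(student_queries: Sequence[Vector],
                      teacher_queries: Sequence[Vector]) -> float:
    """Mean squared L2 distance between paired student and teacher query embeddings.

    Assumption: both encoders emit vectors of the same dimension and the
    teacher embeddings are treated as fixed targets.
    """
    if len(student_queries) != len(teacher_queries) or not student_queries:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for s, t in zip(student_queries, teacher_queries):
        if len(s) != len(t):
            raise ValueError("student and teacher dimensions must match")
        # Squared Euclidean distance for one query pair.
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    # Average over the batch.
    return total / len(student_queries)


# Toy batch of two queries with 2-dimensional embeddings.
student = [[1.0, 0.0], [0.0, 0.0]]
teacher = [[0.0, 0.0], [0.0, 2.0]]
loss = l2_alignment_loss(student, teacher)
```

In an actual training loop this quantity would be minimized with respect to the student's parameters only. The pure-Python version here is meant to show the arithmetic, not to be efficient.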
Key facts
- HARNESS-LM (HLM) is a three-phase training framework.
- It transfers capabilities of large-scale retrievers into compact models.
- Phase 1: train a teacher retriever by fine-tuning a billion-parameter-scale SLM.
- Phase 2: align query representations via an L2 objective to distill knowledge into a student encoder with fewer than 600M parameters.
- Phase 3: apply contrastive refinement to optimize student retrieval performance.
- Large SLM-based retrievers like Qwen3-Embedding-4B/8B set strong benchmarks but are impractical for high-throughput, latency-sensitive production environments.
- The paper presents a comprehensive empirical study of key design choices.
- The framework addresses the trade-off between retrieval quality and production latency.
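The contrastive refinement in Phase 3 is not specified in detail here. A common choice for retrieval training is an in-batch softmax (InfoNCE-style) loss, and the sketch below assumes that form, with cosine similarity and a temperature of 0.05. The loss form, the temperature, and the names `in_batch_contrastive_loss` and `candidates` are all illustrative assumptions rather than details from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / norm


def in_batch_contrastive_loss(queries: Sequence[Vector],
                              candidates: Sequence[Vector],
                              temperature: float = 0.05) -> float:
    """In-batch contrastive loss: candidates[i] is the positive for queries[i],
    and every other candidate in the batch acts as a negative.

    Assumption: this InfoNCE-style form stands in for the unspecified
    contrastive objective used in Phase 3.
    """
    if len(queries) != len(candidates) or not queries:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, c) / temperature for c in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive candidate.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: matched pairs should score a lower loss than mismatched pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
matched = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
mismatched = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each query embedding moves closer to its paired candidate and away from the other candidates in the batch, which is the behavior a retrieval-oriented refinement stage is meant to encourage.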
Entities
—