HARNESS-LM: Training Framework for Efficient Sponsored Search Retrieval

other · 2026-05-25

HARNESS-LM (HLM) is a three-phase training framework that transfers the capabilities of large-scale retrievers into compact, cost-efficient models for sponsored search. Large retrieval models built on Small Language Models (SLMs), such as Qwen3-Embedding-4B/8B, set strong benchmarks but are impractical for high-throughput, latency-sensitive environments; HLM aims to make that retrieval quality deployable within production latency limits. In the first phase, a high-performance reference (teacher) retriever is trained by fine-tuning a billion-parameter-scale SLM. In the second, the teacher's knowledge is distilled into a student encoder with under 600 million parameters by aligning query representations under an L2 objective. In the third, a contrastive refinement stage optimizes the student for retrieval performance. The paper also includes a comprehensive empirical study of key design choices.
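
The second-phase alignment can be illustrated with a minimal sketch. The summary states only that query representations are aligned via an L2 objective; the exact form below (mean squared Euclidean distance over a batch), the function name `l2_alignment_loss`, and the assumption that student and teacher embeddings share a dimension are illustrative choices, not details confirmed by the paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l2_alignment_loss(student_queries: List[Vector],
                      teacher_queries: List[Vector]) -> float:
    """Mean squared L2 distance between paired query embeddings.

    Assumption: row i of each batch embeds the same query, and both
    encoders emit vectors of the same dimension (a projection layer
    would be needed otherwise).
    """
    if len(student_queries) != len(teacher_queries):
        raise ValueError("batches must contain the same number of queries")
    if not student_queries:
        raise ValueError("batch must not be empty")

    total = 0.0
    for s, t in zip(student_queries, teacher_queries):
        if len(s) != len(t):
            raise ValueError("student and teacher dimensions must match")
        # Squared Euclidean distance for one query pair.
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(student_queries)


# Toy batch: the teacher embeddings are treated as fixed targets.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.5, 0.0], [0.0, 1.0]]
loss = l2_alignment_loss(student, teacher)
```

In a real training loop the teacher embeddings would be computed without gradients and only the student would be updated to reduce this loss.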

Key facts

HARNESS-LM (HLM) is a three-phase training framework.
It transfers capabilities of large-scale retrievers into compact models.
Phase 1: train a teacher retriever by fine-tuning a billion-parameter SLM.
Phase 2: align query representations via an L2 objective to distill the teacher into a sub-600M-parameter student.
Phase 3: apply contrastive refinement to optimize student retrieval performance.
Large SLM-based retrievers like Qwen3-Embedding-4B/8B set strong benchmarks but are impractical for high-throughput, latency-sensitive production environments.
The paper presents a comprehensive empirical study of key design choices.
The framework addresses the trade-off between retrieval quality and production latency.
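
As a rough illustration of the Phase 3 contrastive refinement, the sketch below computes an InfoNCE-style loss with in-batch negatives. The summary does not say which contrastive loss the paper uses, so the loss form, the cosine-similarity scoring, the `temperature` value, and the name `in_batch_contrastive_loss` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def in_batch_contrastive_loss(queries: List[Vector],
                              candidates: List[Vector],
                              temperature: float = 0.05) -> float:
    """InfoNCE-style loss: candidates[i] is the positive for queries[i];
    every other candidate in the batch serves as a negative."""
    if len(queries) != len(candidates) or not queries:
        raise ValueError("need equally sized, non-empty batches")

    q = [normalize(v) for v in queries]
    c = [normalize(v) for v in candidates]

    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity to every candidate.
        logits = [dot(qi, cj) / temperature for cj in c]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the matching candidate.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy check: matched pairs should score a lower loss than swapped pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = in_batch_contrastive_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = in_batch_contrastive_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

In-batch negatives are a common choice because they reuse embeddings already computed for the batch; whether HLM uses them, or mined hard negatives, is not stated in this summary.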

Entities

—

Sources

arXiv cs.AI — 2026-05-25