Patent Embedding Models Benchmark: 22 Models Tested Across Retrieval, Classification, and Clustering

other · 2026-05-26

A new study benchmarks 22 patent embedding models, ranging from encoders with 22 million parameters to instruction-tuned large language models (LLMs) with 12 billion parameters. The benchmark uses a corpus of 113,148 WIPO assistive-technology patents with 46,069 citation-graph retrieval queries, plus the public DAPFAM dataset for external validation. The evaluation covers citation-based retrieval, hybrid sparse-dense fusion, multi-label classification across datasets, unsupervised clustering, and an analysis of proprietary DWPI content. Results show that the effect of fine-tuning depends on the task: single-landscape tuning can improve in-domain scores but hurt retrieval on external data, which calls into question the assumption that more domain data always helps.

Key facts

22 embedding models benchmarked
Models range from 22M-parameter encoders to 12B-parameter instruction-tuned LLMs
Tasks: retrieval, classification, clustering
113,148 WIPO assistive-technology patents used
46,069 citation-graph retrieval queries
Public DAPFAM dataset for external validation
Framework covers citation-based retrieval, hybrid sparse-dense fusion, multi-label classification, unsupervised clustering, six text-section views, domain-adaptive fine-tuning, jurisdiction analysis, and proprietary DWPI content
Fine-tuning effects are task-dependent: single-landscape tuning can improve in-domain scores but hurt external retrieval
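
As a rough illustration of two items in the list above, hybrid sparse-dense fusion and citation-based retrieval scoring, the sketch below fuses a lexical ranking and an embedding ranking and then measures how many cited patents land in the top results. The summary does not say which fusion method or metric the study uses; reciprocal rank fusion and recall@k are assumed stand-ins here, and the function names, the constant k=60, and the patent IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, not a value from the study.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant (cited) documents found in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


# Hypothetical patent IDs, for illustration only.
sparse_ranking = ["P3", "P1", "P7", "P2"]  # e.g. from a lexical (BM25-style) ranker
dense_ranking = ["P1", "P2", "P3", "P9"]   # e.g. from embedding cosine similarity
cited = {"P1", "P2"}                       # patents cited by the query patent

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
print(recall_at_k(fused, cited, k=2))
```

Rank-based fusion of this kind needs no score calibration between the sparse and dense systems, which is one reason it is a common baseline; a weighted sum of normalized scores would be an equally plausible reading of "hybrid sparse-dense fusion".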
