Patent Embedding Models Benchmark: 22 Models Tested Across Retrieval, Classification, and Clustering
A new study benchmarks 22 patent embedding models, ranging from encoders with as few as 22 million parameters to instruction-tuned large language models (LLMs) with up to 12 billion parameters. It uses a collection of 113,148 WIPO assistive-technology patents and 46,069 citation-graph retrieval queries, with the public DAPFAM dataset serving as external validation. The evaluation covers citation-based retrieval, hybrid sparse-dense fusion, multi-label classification across datasets, unsupervised clustering, and analysis of proprietary DWPI content. Results show that the effect of fine-tuning depends on the task: single-landscape tuning can improve in-domain scores but hurt retrieval on external data, which calls into question whether adding more domain data always helps.
Key facts
- 22 embedding models benchmarked
- Models range from 22M-parameter encoders to 12B instruction-tuned LLMs
- Tasks: retrieval, classification, clustering
- 113,148 WIPO assistive-technology patents used
- 46,069 citation-graph retrieval queries
- Public DAPFAM dataset for external validation
- Framework covers citation-based retrieval, hybrid sparse-dense fusion, multi-label classification, unsupervised clustering, six text-section views, domain-adaptive fine-tuning, jurisdiction analysis, and proprietary DWPI content
- Fine-tuning effects are task-dependent: single-landscape tuning can improve in-domain scores but hurt external retrieval
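To make "hybrid sparse-dense fusion" and citation-based scoring concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's method: the summary does not say which fusion rule or metric the benchmark uses, so reciprocal rank fusion (RRF) and recall@k are stand-ins chosen here, and the patent IDs, the two input rankings, and the function names are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: RRF is used here only as a common, simple fusion rule;
    the benchmark's actual fusion method is not given in the summary.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant (e.g. cited) documents found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Hypothetical rankings for one query patent.
sparse_ranking = ["P3", "P1", "P7", "P2"]  # e.g. from a lexical retriever
dense_ranking = ["P1", "P4", "P3", "P9"]   # e.g. from an embedding model

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])

# Hypothetical ground truth: patents linked to the query in the citation graph.
cited = {"P1", "P4"}
score = recall_at_k(fused, cited, k=3)
```

Documents ranked well by both retrievers (here P1 and P3) rise to the top of the fused list, which is the usual motivation for combining sparse and dense signals.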
Entities
Institutions
- WIPO
- Clarivate