Patent Retrieval Benchmark and Embedding Model: Sophia-Bench and QaECTER
To address the lack of diverse benchmarks for patent search, a new benchmark named Sophia-bench and a 344M-parameter embedding model called QaECTER have been released. Sophia-bench comprises 10,000 queries and 75,000 corpus documents collected over a decade, covering eight IPC technology sections and twelve filing jurisdictions. It evaluates retrieval effectiveness across 12 query types, including structured patent fields and AI-generated summaries, using citation-based ground truth supplemented by a domain-relevance metric called InScope. QaECTER is trained on patent citations with the aim of improving embedding quality. The stated goals are to foster innovation, improve patent examination, and inform IP strategy decisions.
Key facts
- Sophia-bench contains 10,000 queries and 75,000 corpus documents.
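- The benchmark spans ten years, eight IPC technology sections, and twelve filing jurisdictions.
- Retrieval is tested with 12 different query types, including structured patent fields and AI-generated summaries.
- Ground truth is citation-based, supplemented by the InScope domain-relevance metric.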
- QaECTER is a 344M-parameter embedding model.
- The model is trained on patent citations.
- The work addresses the lack of diverse benchmarks in patent retrieval.
- It aims to foster innovation, improve patent examination, and inform IP strategy.
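To make the idea of citation-based ground truth concrete, the following is a minimal sketch of how a retrieval run might be scored when the documents cited by a query patent are treated as its relevant set. Everything here is an assumption for illustration: the toy vectors, the document IDs, the helper names (`cosine`, `rank_corpus`, `recall_at_k`), and the choice of recall@k as the measure. The source does not state which ranking metrics Sophia-bench reports, how InScope is computed, or how QaECTER produces embeddings, so none of those are modeled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_corpus(query_vec, corpus_vecs):
    """Return corpus document IDs sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in corpus_vecs.items()]
    # Sort by score (high to low); break ties by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def recall_at_k(ranked_ids, cited_ids, k):
    """Fraction of cited (relevant) documents found in the top-k results."""
    if not cited_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in cited_ids)
    return hits / len(cited_ids)


# Hypothetical toy embeddings standing in for an embedding model's output.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

# Hypothetical citation-based ground truth: the query patent cites these documents.
cited = {"doc_b", "doc_c"}

ranking = rank_corpus(query, corpus)
score = recall_at_k(ranking, cited, k=2)
print(ranking, score)
```

In this toy setup only one of the two cited documents appears in the top two results, so recall@2 is 0.5. A benchmark at the scale described above would typically replace the brute-force loop with a vector index and average the score over all queries.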