Semantic Recall: New Metric for Vector Search Quality
A new metric called Semantic Recall has been introduced to evaluate approximate nearest neighbor search algorithms. It counts only the semantically relevant objects that exact nearest neighbor search can retrieve. Unlike traditional recall, it does not penalize an algorithm for missing semantically irrelevant objects, even when those objects are among the exact nearest neighbors. The metric is especially useful for queries with few relevant results among their nearest neighbors, a common scenario in embedding datasets. A proxy metric, Tolerant Recall, approximates Semantic Recall when the relevant objects cannot be identified. Empirical results show that these metrics indicate retrieval quality more effectively than traditional recall.
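The difference from traditional recall can be illustrated with a minimal sketch. It is based only on the description above, not on the paper's formal definition. It assumes that Semantic Recall is the fraction of relevant exact neighbors that the approximate search also returns. The function names, the integer object IDs, and the choice to return None when no exact neighbor is relevant are all hypothetical.

```python
from typing import Optional, Sequence, Set


def traditional_recall(approx: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact nearest neighbors that the approximate search returned."""
    exact_set = set(exact)
    return len(exact_set & set(approx)) / len(exact_set)


def semantic_recall(
    approx: Sequence[int], exact: Sequence[int], relevant: Set[int]
) -> Optional[float]:
    """Assumed form: recall measured only over exact neighbors that are relevant.

    Returns None when no exact neighbor is relevant (a choice made for this
    sketch; the source does not say how that case is handled).
    """
    target = set(exact) & relevant  # relevant objects retrievable by exact search
    if not target:
        return None
    return len(target & set(approx)) / len(target)


# Toy query: only 2 of the 5 exact neighbors are semantically relevant.
exact = [1, 2, 3, 4, 5]
relevant = {1, 2}
approx = [1, 2, 6, 7, 8]  # finds both relevant neighbors, misses the irrelevant ones

print(traditional_recall(approx, exact))         # 0.4
print(semantic_recall(approx, exact, relevant))  # 1.0
```

In this toy query the approximate search returns every relevant object, yet traditional recall reports 0.4 because three irrelevant neighbors were missed. Under the assumed definition, Semantic Recall reports 1.0.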
Key facts
- Semantic Recall is a novel metric for assessing approximate nearest neighbor search algorithms.
- It considers only semantically relevant objects retrievable via exact nearest neighbor search.
- Unlike traditional recall, it does not penalize an algorithm for failing to retrieve semantically irrelevant objects.
- The metric is useful for queries with few relevant results among nearest neighbors.
- This scenario is common within embedding datasets.
- Tolerant Recall is a proxy metric that approximates Semantic Recall when relevant objects cannot be identified.
- Empirical results show the new metrics indicate retrieval quality more effectively than traditional recall.
- The paper is available on arXiv with ID 2604.20417.
Entities
Institutions
- arXiv