Research Reveals Dense Retrieval's Compositional Sensitivity Limits
A recent study posted to arXiv (ID: 2604.16351v1) investigates where dense retrieval systems fail on compositional text edits. These systems compress each text into a single vector embedding and rank candidates by cosine similarity, which serves recall well but struggles with identity matching. Building on Kang et al. (2025), the authors show that minimal compositional edits can flip a text's meaning while still scoring high similarity against the original. Their experiments further show that adding structure-targeted negative examples during training degrades zero-shot retrieval on the NanoBEIR benchmark, cutting mean nDCG@10 by 8-9% on small backbones and by up to 40% on medium ones. Finally, the study evaluates verification methods: MaxSim remains effective for reranking but fails to reject structural near-misses, whereas a small Transformer over similarity maps separates them reliably.
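The failure mode at the heart of the study can be illustrated with a toy numpy sketch. The vectors below are hand-made stand-ins, not outputs of any real encoder: a negated edit sits almost on top of the original in embedding space, so cosine ranking cannot tell them apart.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": the negated text differs only slightly in vector space,
# even though the negation flips its meaning entirely.
query    = np.array([0.90, 0.10, 0.30])
original = np.array([0.88, 0.12, 0.31])   # e.g. "the drug improves survival"
negated  = np.array([0.85, 0.15, 0.33])   # e.g. "the drug does not improve survival"

print(round(cosine(query, original), 3))
print(round(cosine(query, negated), 3))   # nearly as high as the true match
```

Both similarities land above 0.99 here, which is the point: a ranker thresholding on cosine similarity would retrieve the negated near-miss just as readily as the correct document.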
Key facts
- Dense retrieval compresses texts into single embeddings ranked by cosine similarity.
- Minimal compositional edits like negation or role swaps can flip meaning while retaining high similarity.
- The study tests retrieval-composition tension across four dual-encoder backbones.
- Adding structure-targeted negatives reduces zero-shot NanoBEIR retrieval performance.
- Mean nDCG@10 drops 8-9% on small backbones and up to 40% on medium ones.
- Pooled-space separation only partially improves with targeted training.
- MaxSim excels at reranking but fails to reject structural near-misses.
- A small Transformer over similarity maps reliably separates near-misses end-to-end.
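The structure-targeted training the study evaluates amounts to adding edited texts (negations, role swaps) as hard negatives in a standard contrastive objective. A hedged numpy sketch of one such loss term follows; the function name, temperature, and synthetic vectors are illustrative assumptions, not details from the paper.

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.05):
    """Contrastive (InfoNCE) loss for one query: pull the positive close,
    push all negatives -- including structure-targeted edits -- away."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(query, positive)] + [cos(query, n) for n in negatives])
    logits = sims / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))             # positive sits at index 0

rng = np.random.default_rng(0)
q = rng.normal(size=8)
pos = q + 0.05 * rng.normal(size=8)       # paraphrase: very close to the query
hard_neg = q + 0.10 * rng.normal(size=8)  # structure-targeted edit: also close
rand_neg = rng.normal(size=8)             # unrelated document

loss = info_nce(q, pos, [hard_neg, rand_neg])
```

Because the structure-targeted negative sits almost as close to the query as the positive does, it dominates the loss; the study's nDCG@10 drops suggest that optimizing against such negatives distorts the embedding space in ways that hurt ordinary zero-shot retrieval.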
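MaxSim, the late-interaction score in the verification comparison, can be sketched in a few lines (the token embeddings here are random placeholders). Note that the intermediate token-by-token similarity map is the same kind of object the paper's small Transformer verifier consumes; MaxSim collapses it with a per-row max, which is plausibly why it discards the structural pattern a near-miss leaves behind.

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style MaxSim: for each query token, take the similarity of its
    best-matching document token, then sum over query tokens."""
    # Normalize rows so dot products are cosine similarities.
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sim = q @ d.T                      # (n_query_tokens, n_doc_tokens) similarity map
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(1)
query = rng.normal(size=(4, 16))       # 4 query tokens, 16-dim embeddings
doc = rng.normal(size=(10, 16))        # 10 document tokens
score = maxsim(query, doc)
```

With every per-token maximum bounded by 1, the score of a 4-token query is at most 4, attained when the document contains an exact match for each query token.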
Entities
Researchers
- Kang et al.
Institutions
- arXiv