Geospatial Web Search Far Exceeds Traditional GIS Labels
A recent analysis of 1.01 million Bing searches from the MS MARCO dataset found that 18.0% (181,827 queries) are geospatial. That is nearly three times the 6.17% that the dataset's original annotations labeled as Location. The researchers used dense sentence embeddings, a SetFit classifier, and density-based clustering to group the queries into 88 categories. Most of these queries are practical and transactional. Costs and prices alone account for 15.3% of geospatial queries, almost double the share of the entire physical geography category. Much of this demand falls outside the conventional GIS framework, including costs, opening hours, contact details, weather, and travel suggestions.
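To show what "density-based clustering" means in practice, here is a minimal sketch of classic DBSCAN over cosine distance. The summary does not say which density-based algorithm the study used, so DBSCAN is a stand-in here and not necessarily the authors' method. The three-dimensional vectors are hypothetical toy values standing in for real sentence embeddings, which have hundreds of dimensions. The `eps` and `min_pts` settings are also illustrative assumptions.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN: returns one cluster id per point, or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing
        cluster += 1
    return labels


# Hypothetical toy "embeddings": three cost-like queries, three
# opening-hours-like queries, and one unrelated outlier.
points = [
    (1.0, 0.1, 0.0), (0.95, 0.15, 0.0), (0.9, 0.05, 0.05),
    (0.0, 1.0, 0.1), (0.1, 0.95, 0.0), (0.05, 0.9, 0.1),
    (0.0, 0.0, 1.0),
]
labels = dbscan(points, eps=0.05, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Density-based methods do not need the number of clusters in advance, and they can leave isolated points unassigned. Both properties suit an exploratory survey in which the final count of categories (88 in this study) emerges from the data.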
Key facts
- 181,827 of 1.01 million Bing queries are geospatial (18.0%)
- Original annotations labeled only 6.17% as Location
- 88 query categories identified via clustering
- Costs and prices make up 15.3% of geospatial queries
- Physical geography theme is roughly half the size of the cost and price theme
- Dense sentence embeddings and a SetFit classifier used
- MS MARCO corpus analyzed without prior filtering
- Study highlights gap between geospatial web search and traditional GIS
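The headline ratios can be cross-checked with a few lines of arithmetic. This sketch assumes a corpus size of exactly 1,010,000, because the text gives only the rounded figure of 1.01 million. The cost-query count it computes is an estimate derived from the 15.3% share and is not a number reported by the study.

```python
# Figures as stated in the summary; TOTAL_QUERIES is an assumption,
# since the source only gives the rounded "1.01 million".
TOTAL_QUERIES = 1_010_000
GEO_QUERIES = 181_827
LOCATION_SHARE = 0.0617     # share originally labeled Location
COST_SHARE_OF_GEO = 0.153   # costs and prices, as a share of geospatial queries

geo_share = GEO_QUERIES / TOTAL_QUERIES
ratio_vs_location = geo_share / LOCATION_SHARE
# Estimated, not reported: geospatial count times the cost share.
approx_cost_queries = round(GEO_QUERIES * COST_SHARE_OF_GEO)

print(f"geospatial share: {geo_share:.1%}")
print(f"ratio to Location label: {ratio_vs_location:.2f}x")
print(f"approx. cost/price queries: {approx_cost_queries:,}")
```

With these inputs the geospatial share comes out at 18.0%, and the ratio to the Location label is about 2.92. That is consistent with "nearly three times".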
Entities
Institutions
- Microsoft Bing
- MS MARCO