ReTabAD Benchmark Restores Semantic Context for Tabular Anomaly Detection
ReTabAD is a new benchmark for tabular anomaly detection (AD) that restores the textual semantics most existing datasets omit. It provides 20 curated tabular datasets, each enriched with structured textual metadata such as feature descriptions and domain knowledge, which context-aware AD depends on. It also includes implementations of state-of-the-art AD algorithms spanning classical methods, deep learning techniques, and LLM-based approaches. In addition, ReTabAD introduces a zero-shot LLM framework that uses semantic context without task-specific training, giving future research a baseline to build on. The work is described in a paper available on arXiv (2510.02060).
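To make the idea of "semantic context" concrete, here is a minimal sketch of how one tabular record could be combined with feature descriptions and domain knowledge into a prompt for a zero-shot LLM detector. It is an illustration only, not ReTabAD's actual prompt format or API: the function name `build_zero_shot_prompt`, its parameters, the example columns, and the wording of the instruction are all assumptions, and the call to a language model is deliberately left out.

```python
from typing import Mapping


def build_zero_shot_prompt(
    row: Mapping[str, object],
    feature_descriptions: Mapping[str, str],
    domain_context: str,
) -> str:
    """Serialize one tabular record plus its textual metadata into a prompt.

    Hypothetical helper for illustration; not part of the ReTabAD codebase.
    """
    lines = [f"Domain context: {domain_context}", "", "Record:"]
    for name, value in row.items():
        # Fall back to a placeholder when a column has no description.
        description = feature_descriptions.get(name, "no description available")
        lines.append(f"- {name} = {value} ({description})")
    lines.append("")
    lines.append(
        "Rate how anomalous this record is on a scale from 0 (normal) "
        "to 1 (anomalous). Answer with a single number."
    )
    return "\n".join(lines)


# Made-up example data, chosen only to show the output shape.
example_prompt = build_zero_shot_prompt(
    row={"heart_rate": 212, "age": 34},
    feature_descriptions={"heart_rate": "resting heart rate in beats per minute"},
    domain_context="Routine patient check-up measurements.",
)
print(example_prompt)
```

With the descriptions attached, a model can in principle reason that a resting heart rate of 212 is implausible, which the raw number alone does not convey. No task-specific training is involved in this sketch; the prompt text is the only input.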
Key facts
- ReTabAD is a benchmark for context-aware tabular anomaly detection.
- It provides 20 curated tabular datasets with structured textual metadata.
- It includes implementations of classical, deep learning, and LLM-based AD algorithms.
- It introduces a zero-shot LLM framework that uses semantic context without task-specific training.
- The paper is available on arXiv with ID 2510.02060.
- Existing benchmarks lack semantic context such as feature descriptions and domain knowledge.
- ReTabAD aims to let models draw on domain knowledge when detecting anomalies.
- The benchmark is designed to restore textual semantics to tabular AD research.
Entities
Institutions
- arXiv