AD-Copilot: A Vision-Language Model for Industrial Anomaly Detection via Visual In-Context Comparison
AD-Copilot is an interactive Multimodal Large Language Model (MLLM) for industrial anomaly detection (IAD) built around visual in-context comparison. General-purpose MLLMs underperform on IAD for two reasons: they are trained on general web data, which differs substantially from industrial imagery, and they encode each image independently, so images can only be compared in language space, which limits sensitivity to the subtle visual differences that IAD depends on. To close the data gap, a novel data curation pipeline mines inspection knowledge from sparsely labeled industrial images and generates precise samples for captioning, visual question answering (VQA), and defect localization. The result is Chat-AD, a large-scale multimodal dataset rich in semantic signals for IAD. To close the encoding gap, AD-Copilot incorporates a Comparison Encoder that uses cross-attention to compare images directly in visual space, with the aim of improving detection accuracy. The work is described in arXiv preprint arXiv:2603.13779v2, announced as a replace-cross type (that is, a revision of a cross-listed submission), and targets industrial inspection settings such as manufacturing and quality control.
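To make the idea of comparing images in visual space more concrete, here is a minimal sketch of single-head scaled dot-product cross-attention between patch features of a test image and a reference image. It is an illustration only, not the paper's Comparison Encoder: the function names, the absence of learned projections, the toy 2-dimensional features, and the L2-distance score are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query_patches, reference_patches):
    """Single-head scaled dot-product cross-attention (no learned projections).

    Each query patch (from the test image) attends over all reference patches
    and receives the attention-weighted average of the reference features.
    """
    dim = len(query_patches[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in query_patches:
        # Similarity of this test patch to every reference patch.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in reference_patches]
        weights = softmax(scores)
        # Weighted mix of reference features (reference acts as key and value).
        mixed = [sum(w * v[d] for w, v in zip(weights, reference_patches))
                 for d in range(dim)]
        attended.append(mixed)
    return attended

def comparison_scores(query_patches, reference_patches):
    """Hypothetical per-patch score: L2 distance between each test patch
    and the reference features it attended to."""
    attended = cross_attend(query_patches, reference_patches)
    return [math.sqrt(sum((qi - ai) ** 2 for qi, ai in zip(q, a)))
            for q, a in zip(query_patches, attended)]

# Toy features: two reference patches and two test patches.
reference = [[1.0, 0.0], [0.0, 1.0]]
test = [[1.0, 0.0], [3.0, 3.0]]  # the second patch resembles no reference patch
scores = comparison_scores(test, reference)
print(scores)
```

In this toy setup the second test patch, which matches neither reference patch, receives a larger score than the first. A real encoder would operate on learned, high-dimensional patch embeddings and pass the attended features on to the language model rather than reducing them to a single distance.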
Key facts
- AD-Copilot is an interactive Multimodal Large Language Model (MLLM) for industrial anomaly detection (IAD).
- Traditional MLLMs underperform in IAD due to training on general web data and independent image encoding.
- A novel data curation pipeline creates Chat-AD, a large-scale multimodal dataset from sparsely labeled industrial images.
- Chat-AD includes samples for captioning, visual question answering (VQA), and defect localization.
- AD-Copilot uses a Comparison Encoder with cross-attention for visual in-context comparison.
- The model targets the subtle visual differences that IAD depends on, which independently encoded images tend to miss.
- Details are in arXiv preprint arXiv:2603.13779v2, announced as a replace-cross type.
- The focus is on improving anomaly detection in industrial settings like manufacturing.
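The three sample types listed above (captioning, VQA, and defect localization) can be pictured as records derived from a single sparse label. The sketch below is a hypothetical illustration of that idea, not the Chat-AD schema or the authors' pipeline: the class names, field names, prompt wording, and file names are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SparseLabel:
    """Hypothetical sparse annotation for one industrial image."""
    image_path: str
    is_defective: bool
    defect_type: Optional[str] = None
    bbox: Optional[Tuple[int, int, int, int]] = None  # (x_min, y_min, x_max, y_max)

@dataclass
class Sample:
    """Hypothetical training record for one task."""
    task: str  # "caption", "vqa", or "localization"
    image_path: str
    prompt: str
    answer: str

def build_samples(label: SparseLabel) -> List[Sample]:
    """Expand one sparse label into one record per applicable task."""
    if not label.is_defective:
        description = "The part shows no visible defect."
    elif label.defect_type:
        description = f"The part shows a {label.defect_type} defect."
    else:
        description = "The part shows a defect."

    samples = [
        Sample("caption", label.image_path,
               "Describe the inspection result.", description),
        Sample("vqa", label.image_path,
               "Is there a defect in this image?",
               "Yes" if label.is_defective else "No"),
    ]
    # A localization record is only possible when a box was annotated.
    if label.is_defective and label.bbox is not None:
        samples.append(Sample("localization", label.image_path,
                              "Locate the defect.", str(list(label.bbox))))
    return samples

# Illustrative labels with made-up file names.
scratched = SparseLabel("part_001.png", True, "scratch", (10, 20, 50, 60))
normal = SparseLabel("part_002.png", False)
for sample in build_samples(scratched) + build_samples(normal):
    print(sample.task, "|", sample.answer)
```

Here a defective image with a bounding box yields all three record types, while a normal image yields only a caption and a VQA record.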