Semantic Correlation Descriptors Identify Training Datasets
A new white-box fingerprinting method called semantic correlation descriptors (SCDs) can identify which dataset a model was trained on by analyzing the spurious correlations it internalizes. The researchers argue that datasets leave unique traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset but not causal for the underlying task. This approach moves beyond existing dataset-level membership inference methods that rely on confidence scores, losses, margins, generated samples, or query responses. In controlled leave-one-dataset-out diagnostics, the authors report that SCDs perfectly separate matching from non-matching dataset pairs.
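To make "perfectly separate" concrete, here is a minimal sketch of such a check. It does not reproduce the paper's method: the descriptor vectors, the names `model_desc`, `dataset_desc`, and `trained_on`, and the use of cosine similarity as the comparison score are all assumptions made for illustration. Separation is perfect when every matching model-dataset pair scores higher than every non-matching pair.

```python
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def perfectly_separated(model_desc, dataset_desc, trained_on):
    """Return True if every matching pair outscores every non-matching pair."""
    matching, non_matching = [], []
    for m, d in product(model_desc, dataset_desc):
        score = cosine(model_desc[m], dataset_desc[d])
        if trained_on[m] == d:
            matching.append(score)
        else:
            non_matching.append(score)
    return min(matching) > max(non_matching)


# Hypothetical toy descriptors (not taken from the paper).
model_desc = {"model_a": [0.9, 0.1, 0.0], "model_b": [0.1, 0.8, 0.2]}
dataset_desc = {"dataset_a": [1.0, 0.0, 0.1], "dataset_b": [0.0, 1.0, 0.1]}
trained_on = {"model_a": "dataset_a", "model_b": "dataset_b"}

print(perfectly_separated(model_desc, dataset_desc, trained_on))
```

In this toy setup the gap between the two score groups is wide. A real diagnostic would also hold out each dataset in turn, which the sketch omits.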
Key facts
- SCDs capture the semantic correlation structure learned by a model.
- The method identifies dataset-specific traces left by spurious correlations.
- Moves beyond existing approaches that rely on behavioral or distributional evidence.
- Matching and non-matching dataset pairs are perfectly separated in controlled leave-one-dataset-out diagnostics.
- As a white-box approach, it requires access to model internals.
- Published on arXiv with ID 2605.30462.
- Focuses on dataset-level membership inference.
- Traces are incidental regularities, not causal task features.
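
As an illustration of what "incidental regularities, not causal task features" can look like, the sketch below builds two hypothetical toy datasets that share the same labels. In one, a nuisance feature (here called `background`) happens to track the label. In the other, it does not. The feature name, the data, and the use of Pearson correlation are assumptions chosen for illustration and are not taken from the paper.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


# Same task labels in both hypothetical datasets.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Dataset A: the nuisance feature mostly co-occurs with the positive label.
background_a = [0, 0, 0, 1, 1, 1, 1, 1]

# Dataset B: the same feature is balanced across both labels.
background_b = [0, 1, 0, 1, 0, 1, 0, 1]

corr_a = pearson(background_a, labels)
corr_b = pearson(background_b, labels)
print(f"dataset A: {corr_a:.2f}, dataset B: {corr_b:.2f}")
```

A model trained on dataset A could pick up the `background` shortcut, while one trained on dataset B has no such regularity to learn. That difference is the kind of dataset-specific trace the summary describes.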
Entities
Institutions
- arXiv