Semantic Correlation Descriptors Identify Training Datasets

ai-technology · 2026-06-01

A new white-box fingerprinting method, semantic correlation descriptors (SCDs), can identify which dataset a model was trained on by analyzing the spurious correlations the model internalizes. The researchers argue that each dataset leaves unique traces in a model's learned semantic correlation structure: incidental regularities that are predictive within the dataset but not causal for the underlying task. The approach moves beyond existing dataset-level membership inference methods, which rely on confidence scores, losses, margins, generated samples, or query responses. In controlled leave-one-dataset-out diagnostics, SCDs perfectly separate matching from non-matching dataset pairs.

Key facts

SCDs capture the semantic correlation structure learned by a model.
The method identifies dataset-specific traces left by spurious correlations.
It moves beyond existing approaches that rely on behavioral or distributional evidence.
SCDs achieve perfect separation in leave-one-dataset-out diagnostics.
The white-box approach requires access to the model.
The work is published on arXiv with ID 2605.30462.
The focus is dataset-level membership inference.
The traces are incidental regularities, not causal task features.
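
To make "perfect separation" concrete, here is a minimal Python sketch. It does not reproduce how SCDs are computed or the paper's diagnostic protocol. It assumes, purely for illustration, that each model and each dataset has already been reduced to a descriptor vector and that pairs are scored by cosine similarity. The function names, the toy vectors, and the choice of cosine similarity are all hypothetical. Under those assumptions, separation is perfect when the lowest-scoring matching pair still scores above the highest-scoring non-matching pair.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def separation_report(model_desc, dataset_desc, trained_on):
    """Score every (model, dataset) pair and test for perfect separation.

    model_desc:   model name -> descriptor vector (hypothetical stand-in)
    dataset_desc: dataset name -> reference descriptor (hypothetical stand-in)
    trained_on:   model name -> name of the dataset it was trained on
    """
    matching, non_matching = [], []
    for model, vec in model_desc.items():
        for dataset, ref in dataset_desc.items():
            score = cosine(vec, ref)
            if trained_on[model] == dataset:
                matching.append(score)
            else:
                non_matching.append(score)
    # Perfect separation: every matching pair outscores every non-matching pair.
    gap = min(matching) - max(non_matching)
    return {"gap": gap, "perfect": gap > 0}

# Toy, hand-written descriptors (illustrative values, not real measurements).
dataset_desc = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
model_desc = {
    "model_a": [0.9, 0.1, 0.0],
    "model_b": [0.1, 0.8, 0.2],
    "model_c": [0.0, 0.2, 0.9],
}
trained_on = {"model_a": "A", "model_b": "B", "model_c": "C"}

report = separation_report(model_desc, dataset_desc, trained_on)
print(report)
```

A positive gap means a single threshold placed anywhere inside it classifies every pair correctly. If the labels in `trained_on` are wrong, the gap turns negative and no such threshold exists.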
