TSFMAudit: Detecting Data Contamination in Time Series Foundation Models
TSFMAudit is a method for detecting pretraining-data contamination in time series foundation models (TSFMs). It infers contamination from how efficiently a model adapts during fine-tuning: on a dataset that was included in pretraining, the loss drops faster while the backbone weights move less. The method is evaluated on 6 TSFMs and 187 datasets, using documented training source evidence as supervision. TSFMAudit is presented as the first approach to pretraining contamination auditing designed specifically for TSFMs.
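The adaptation-efficiency signal described above can be condensed into one score per dataset. The sketch below is a minimal illustration under stated assumptions, not the paper's formula. The functions `relative_loss_drop`, `relative_backbone_shift`, and `adaptation_efficiency` are hypothetical names. Backbone weights are assumed to be available as flat lists of floats before and after the fine-tuning probe. The score is assumed to be relative loss reduction divided by relative backbone displacement, so higher values suggest contamination.

```python
import math
from typing import Sequence


def relative_loss_drop(losses: Sequence[float]) -> float:
    """Fraction of the initial probe loss removed by the end of fine-tuning."""
    if len(losses) < 2 or losses[0] <= 0:
        raise ValueError("need at least two losses and a positive initial loss")
    return (losses[0] - losses[-1]) / losses[0]


def relative_backbone_shift(before: Sequence[float], after: Sequence[float]) -> float:
    """L2 distance between flattened backbone weights, divided by the initial norm."""
    if len(before) != len(after):
        raise ValueError("parameter vectors must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    norm = math.sqrt(sum(b * b for b in before))
    return diff / norm if norm > 0 else diff


def adaptation_efficiency(
    losses: Sequence[float],
    before: Sequence[float],
    after: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Hypothetical score: loss reduction per unit of backbone movement.

    Higher values are read as stronger evidence that the dataset was seen
    during pretraining; eps avoids division by zero when weights do not move.
    """
    return relative_loss_drop(losses) / (relative_backbone_shift(before, after) + eps)


# Toy usage with made-up numbers (not results from the paper).
initial_weights = [1.0, 1.0, 1.0, 1.0]
seen_score = adaptation_efficiency([1.0, 0.4, 0.2], initial_weights, [1.01, 1.0, 1.0, 1.0])
unseen_score = adaptation_efficiency([1.0, 0.8, 0.6], initial_weights, [1.1, 1.0, 1.0, 1.0])
```

With these toy numbers, the dataset whose loss falls from 1.0 to 0.2 while the weights barely move scores far higher than the one whose loss only reaches 0.6 after a larger weight change. Any decision threshold on such a score would need calibration; none is assumed here.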
Key facts
- TSFMAudit is the first method for pretraining contamination auditing in TSFMs.
- It detects contamination from the adaptation dynamics of a fine-tuning probe.
- Contaminated datasets exhibit faster loss reduction and smaller backbone movement during fine-tuning.
- Evaluated on 6 TSFMs and 187 datasets.
- Evaluation uses documented evidence of each model's training sources as supervision.
- Time series signals are continuous and heterogeneous, complicating auditing.
- The work addresses concerns about overly optimistic performance estimates due to contamination.
- The method is formalized as a probe-based auditing approach.
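Because the evaluation has documented training source evidence as supervision, per-dataset contamination scores can be checked against binary labels (1 = documented as a training source, 0 = not). One common way to do this, assumed here rather than taken from the source, is AUROC computed as a pairwise rank statistic. The `auroc` helper below is hypothetical, uses only the standard library, and is a minimal sketch rather than the paper's evaluation protocol.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a randomly chosen contaminated dataset (label 1)
    scores higher than a randomly chosen clean one (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one contaminated and one clean dataset")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy usage: hypothetical scores for four datasets and their assumed labels.
toy_auroc = auroc([160.0, 8.0, 95.0, 12.0], [1, 0, 1, 0])
```

An AUROC of 1.0 means every contaminated dataset outscores every clean one; 0.5 is chance level. The quadratic pairwise loop is simple to verify and is fast enough for a few hundred datasets.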
Entities
Institutions
- arXiv