Neural Networks' OOD Generalization Depends on Feature Engineering Bias
A recent paper on arXiv (2605.07483) examines why deep neural networks fail to extrapolate out-of-distribution (OOD) even though they learn the key features of their in-distribution (ID) training data. The authors argue that OOD extrapolation is non-identifiable from a single training window: infinitely many data-generating processes (DGPs) are observationally equivalent on the training data yet diverge outside it, and no criterion computed in-distribution can break the tie. Which DGP a model implicitly assumes, and therefore how it generalizes OOD, is instead governed by the structural commitments of the feature map, label map, and model class, none of which affect ID performance. OOD success occurs when architecture, pretraining, augmentation, input formats, or domain knowledge inject the missing commitment. By separating feature learning from DGP identifiability, the study casts feature engineering as an identifiability bias for OOD generalization.
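To make the non-identifiability point concrete, here is a minimal sketch (a toy construction, not taken from the paper): two hypothetical DGPs that agree exactly on a training window [0, 1] but diverge outside it, so no quantity computed on ID data alone can distinguish them.

```python
# Toy illustration of non-identifiability from a single training window
# (assumed example, not the paper's setup): two DGPs that coincide on the
# ID window [0, 1] but diverge in the OOD region beyond it.
import numpy as np

def dgp_linear(x):
    # DGP 1: plain linear relationship everywhere.
    return 2.0 * x

def dgp_bent(x):
    # DGP 2: identical to DGP 1 inside [0, 1], different slope beyond it.
    return np.where(x <= 1.0, 2.0 * x, 2.0 + 5.0 * (x - 1.0))

x_id = np.linspace(0.0, 1.0, 200)    # in-distribution training window
x_ood = np.linspace(1.0, 3.0, 200)   # out-of-distribution region

# On the training window the two DGPs are observationally equivalent ...
print(np.max(np.abs(dgp_linear(x_id) - dgp_bent(x_id))))    # 0.0

# ... but outside it they diverge, so a model fit only on x_id cannot decide
# between them without an additional structural commitment.
print(np.max(np.abs(dgp_linear(x_ood) - dgp_bent(x_ood))))  # grows with x
```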
Key facts
- arXiv paper 2605.07483
- Deep neural networks can fail to extrapolate OOD even after learning key features from ID training data
- OOD extrapolation is non-identifiable from a single training window
- Infinitely many DGPs are observationally equivalent on training data but diverge outside
- No in-distribution criterion alone breaks the tie
- Structural commitment (feature map, label map, model class) governs OOD generalization
- Success requires implicitly injecting the missing commitment via architecture, pretraining, augmentation, input formats, or domain knowledge
- Feature engineering acts as identifiability bias for OOD generalization (illustrated in the sketch after this list)
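The sketch below (again an assumed toy setting, not the paper's experiments) illustrates feature engineering as an identifiability bias: the same linear model class is fit on the same ID window with two different feature maps, and only the map encoding the right structural commitment (periodicity) extrapolates OOD, even though both fit the ID data well.

```python
# Assumed toy example: a periodic DGP fit by linear regression under two
# feature maps. The generic polynomial map carries no commitment to
# periodicity; the engineered sin/cos map does, and only it generalizes OOD.
import numpy as np

rng = np.random.default_rng(0)

def dgp(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0.0, 2.0, size=400)   # ID training window
y_train = dgp(x_train)
x_ood = np.linspace(2.0, 4.0, 200)          # OOD extrapolation region

def poly_features(x, degree=9):
    # Generic feature map: no commitment to a periodic DGP.
    return np.vstack([x ** k for k in range(degree + 1)]).T

def periodic_features(x):
    # Engineered feature map: commits to a periodic DGP.
    return np.vstack([np.ones_like(x),
                      np.sin(2 * np.pi * x),
                      np.cos(2 * np.pi * x)]).T

for name, phi in [("polynomial", poly_features), ("periodic", periodic_features)]:
    # Same model class (linear least squares), different structural commitment.
    w, *_ = np.linalg.lstsq(phi(x_train), y_train, rcond=None)
    id_mse = np.mean((phi(x_train) @ w - y_train) ** 2)
    ood_mse = np.mean((phi(x_ood) @ w - dgp(x_ood)) ** 2)
    print(f"{name:10s}  ID MSE={id_mse:.2e}  OOD MSE={ood_mse:.2e}")
```

Both feature maps achieve low ID error, so ID performance alone cannot reveal which one has assumed the right DGP; the difference only appears outside the training window.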
Entities
Institutions
- arXiv