Neural Networks' OOD Generalization Depends on Feature Engineering Bias
A recent paper on arXiv (2605.07483) examines why deep neural networks fail to extrapolate out-of-distribution (OOD) even though they learn the key features of their in-distribution (ID) training data. The authors argue that OOD extrapolation is non-identifiable from a single training window: infinitely many data-generating processes (DGPs) are observationally equivalent on the training data yet diverge outside it, and no criterion computed in-distribution can break the tie. Which DGP a model implicitly assumes, and therefore how it generalizes OOD, is instead governed by the structural commitments of the feature map, label map, and model class, none of which affect ID performance. OOD success occurs when architecture, pretraining, augmentation, input formats, or domain knowledge inject the missing commitment. By separating feature learning from DGP identifiability, the study casts feature engineering as an identifiability bias for OOD generalization.
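To make the non-identifiability point concrete, here is a minimal sketch (a toy construction, not taken from the paper): two hypothetical DGPs that agree exactly on a training window [0, 1] but diverge outside it, so no quantity computed on ID data alone can distinguish them.

```python
# Toy illustration of non-identifiability from a single training window
# (assumed example, not the paper's setup): two DGPs that coincide on the
# ID window [0, 1] but diverge in the OOD region beyond it.
import numpy as np

def dgp_linear(x):
    # DGP 1: plain linear relationship everywhere.
    return 2.0 * x

def dgp_bent(x):
    # DGP 2: identical to DGP 1 inside [0, 1], different slope beyond it.
    return np.where(x <= 1.0, 2.0 * x, 2.0 + 5.0 * (x - 1.0))

x_id = np.linspace(0.0, 1.0, 200)    # in-distribution training window
x_ood = np.linspace(1.0, 3.0, 200)   # out-of-distribution region

# On the training window the two DGPs are observationally equivalent ...
print(np.max(np.abs(dgp_linear(x_id) - dgp_bent(x_id))))    # 0.0

# ... but outside it they diverge, so a model fit only on x_id cannot decide
# between them without an additional structural commitment.
print(np.max(np.abs(dgp_linear(x_ood) - dgp_bent(x_ood))))  # grows with x
```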
Key facts
- arXiv paper 2605.07483
- Deep neural networks can fail to extrapolate OOD even after learning key features from ID training data
- OOD extrapolation is non-identifiable from a single training window
- Infinitely many DGPs are observationally equivalent on training data but diverge outside
- No in-distribution criterion alone breaks the tie
- Structural commitment (feature map, label map, model class) governs OOD generalization
- Success requires implicitly injecting the missing commitment via architecture, pretraining, augmentation, input formats, or domain knowledge
- Feature engineering acts as identifiability bias for OOD generalization (illustrated in the sketch after this list)
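The sketch below (again an assumed toy setting, not the paper's experiments) illustrates feature engineering as an identifiability bias: the same linear model class is fit on the same ID window with two different feature maps, and only the map encoding the right structural commitment (periodicity) extrapolates OOD, even though both fit the ID data well.

```python
# Assumed toy example: a periodic DGP fit by linear regression under two
# feature maps. The generic polynomial map carries no commitment to
# periodicity; the engineered sin/cos map does, and only it generalizes OOD.
import numpy as np

rng = np.random.default_rng(0)

def dgp(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0.0, 2.0, size=400)   # ID training window
y_train = dgp(x_train)
x_ood = np.linspace(2.0, 4.0, 200)          # OOD extrapolation region

def poly_features(x, degree=9):
    # Generic feature map: no commitment to a periodic DGP.
    return np.vstack([x ** k for k in range(degree + 1)]).T

def periodic_features(x):
    # Engineered feature map: commits to a periodic DGP.
    return np.vstack([np.ones_like(x),
                      np.sin(2 * np.pi * x),
                      np.cos(2 * np.pi * x)]).T

for name, phi in [("polynomial", poly_features), ("periodic", periodic_features)]:
    # Same model class (linear least squares), different structural commitment.
    w, *_ = np.linalg.lstsq(phi(x_train), y_train, rcond=None)
    id_mse = np.mean((phi(x_train) @ w - y_train) ** 2)
    ood_mse = np.mean((phi(x_ood) @ w - dgp(x_ood)) ** 2)
    print(f"{name:10s}  ID MSE={id_mse:.2e}  OOD MSE={ood_mse:.2e}")
```

Both feature maps achieve low ID error, so ID performance alone cannot reveal which one has assumed the right DGP; the difference only appears outside the training window.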
Entities
Institutions
- arXiv