Logit Shift as Proxy for Continual Learning Model Selection

other · 2026-05-28

A new theoretical framework proposes a lightweight selector that captures logit shift trends for model selection in continual learning (CL). Because computing logit shift directly is expensive for deep pre-trained neural networks, the approach decouples it into an architecture dependency and a data dependency. Existing analyses assume uniform hidden-layer widths, ignoring the structural heterogeneity of real-world architectures. The study establishes a theoretical relationship between heterogeneous architecture and logit shift on prior tasks, enabling efficient model selection without full logit shift computation.

Key facts

Continual Learning (CL) is a practical paradigm for deep pre-trained neural networks.
Logit shift serves as a natural proxy for model selection in CL scenarios.
Obtaining logit shift is computationally expensive.
Existing theoretical analyses assume uniform hidden layer widths.
Real-world architectures have variable width and depth (structural heterogeneity).
The study decouples logit shift into architecture dependency and data dependency.
The framework aims to establish a theoretical relationship between architecture and logit shift.
The approach is described as a lightweight selector for capturing logit shift trends.
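
For orientation, here is a minimal sketch of the direct computation that the lightweight selector is meant to avoid. The summary gives no formula, so this assumes logit shift is the mean L2 distance between a model's logits on prior-task samples before and after it is updated on a new task, and that a smaller shift is preferable. The function names, the candidate names, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One row of class logits per prior-task sample.
Logits = Sequence[Sequence[float]]


def logit_shift(before: Logits, after: Logits) -> float:
    """Mean L2 distance between per-sample logits before and after an update.

    Assumption: this is one plausible definition of logit shift; the source
    does not give a formula.
    """
    if not before or len(before) != len(after):
        raise ValueError("before/after need the same non-zero number of samples")
    total = 0.0
    for row_before, row_after in zip(before, after):
        if len(row_before) != len(row_after):
            raise ValueError("each sample needs the same number of classes")
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(row_after, row_before)))
    return total / len(before)


def rank_by_logit_shift(
    candidates: Dict[str, Tuple[Logits, Logits]],
) -> List[Tuple[str, float]]:
    """Rank candidates by ascending shift (assumed: smaller means less drift)."""
    scored = [(name, logit_shift(b, a)) for name, (b, a) in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Toy prior-task logits for two hypothetical candidate architectures.
before = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "uniform_width": (before, [[4.0, 4.0], [0.0, 1.0]]),
    "variable_width": (before, [[1.0, 0.0], [0.0, 2.0]]),
}
ranking = rank_by_logit_shift(candidates)
print(ranking)
```

This brute-force version needs every candidate's logits on prior-task data both before and after the update, which is the cost the summary describes as prohibitive. The framework's stated goal is to recover the same trend from architecture and data dependencies without running it in full.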

Sources

arXiv cs.AI — 2026-05-28