Fourier Analysis Reveals Generalization in Transformers

ai-technology · 2026-05-22

A recent study examines how transformers generalize on boolean domains by analyzing the Fourier spectra of the target functions. Where earlier research relied on Rademacher complexity, this study derives its generalization bounds from PAC-Bayes theory. It finds that when a target's spectrum is sparse and concentrated on low-degree components, a low-sharpness construction exists that generalizes well. The construction exhibits flat minima that can implement any boolean function whose sparsity does not exceed the context length. Applying a PAC-Bayes bound to an idealized low-sharpness learner then yields a non-vacuous generalization bound. Empirical evaluations and mechanistic interpretability analyses support the relevance of the theoretical construction to transformers trained in practice.
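
To make the spectral terms concrete, the following minimal Python sketch computes the Fourier expansion of a small boolean function by brute force and reports its sparsity and degree. It is illustrative only and is not taken from the study: the function names (`fourier_spectrum`, `sparsity`, `max_degree`) and both example targets are hypothetical, and it assumes the usual ±1 encoding, in which the coefficient on a subset S is the average of f(x) times the product of the coordinates of x indexed by S.

```python
from itertools import combinations, product
from math import prod


def fourier_spectrum(f, n):
    """Return {subset: coefficient} for f defined on {-1, +1}^n.

    Brute-force enumeration over all 2^n inputs and all 2^n subsets,
    so this is only practical for very small n.
    """
    inputs = list(product((-1, 1), repeat=n))
    spectrum = {}
    for degree in range(n + 1):
        for subset in combinations(range(n), degree):
            # Average of f(x) * chi_S(x); prod over an empty subset is 1.
            total = sum(f(x) * prod(x[i] for i in subset) for x in inputs)
            spectrum[subset] = total / len(inputs)
    return spectrum


def sparsity(spectrum, tol=1e-12):
    """Number of nonzero Fourier coefficients."""
    return sum(1 for c in spectrum.values() if abs(c) > tol)


def max_degree(spectrum, tol=1e-12):
    """Largest subset size carrying a nonzero coefficient."""
    return max((len(s) for s, c in spectrum.items() if abs(c) > tol), default=0)


# Hypothetical example targets on 4 input bits.
def parity_of_first_two(x):
    return x[0] * x[1]


def majority_of_first_three(x):
    return 1 if sum(x[:3]) > 0 else -1


if __name__ == "__main__":
    for name, f in [("parity", parity_of_first_two),
                    ("majority", majority_of_first_three)]:
        spec = fourier_spectrum(f, 4)
        print(name, "sparsity:", sparsity(spec), "degree:", max_degree(spec))
```

In this sketch the parity of two bits has a single nonzero coefficient (sparsity 1, degree 2), while the majority of three bits has four nonzero coefficients (three of degree 1 and one of degree 3). Both are sparse in the sense the summary uses, although the study's exact definitions may differ.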

Key facts

Study focuses on transformers' generalization behavior on boolean domains.
Uses Fourier spectra of target functions.
Contrasts with prior work by Edelman et al. (2022) and Trauger and Tewari (2024).
Derives generalization bounds via PAC-Bayes theory.
Sparse spectra concentrated on low-degree components enable low-sharpness constructions.
Flat minima can implement any boolean function with sparsity ≤ context length.
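A PAC-Bayes bound applied to an idealized low-sharpness learner yields a non-vacuous generalization bound.
Empirical evaluations and mechanistic interpretability analyses support the construction's relevance to real transformers.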
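
For orientation, the link between flat minima and PAC-Bayes bounds can be sketched with one commonly quoted McAllester-style bound. This is a generic textbook form, not necessarily the variant derived in the study, and every symbol below (L, \hat{L}, P, Q, n, \delta, w, \sigma) is an assumption introduced here for illustration.

```latex
% A standard PAC-Bayes bound for losses in [0, 1].
% With probability at least 1 - \delta over an i.i.d. sample of size n \ge 8,
% simultaneously for all posteriors Q (the prior P is fixed before seeing data):
L(Q) \;\le\; \hat{L}(Q)
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}

% Assumed Gaussian choice: P = \mathcal{N}(0, \sigma^2 I), Q = \mathcal{N}(w, \sigma^2 I)
\mathrm{KL}(Q \,\|\, P) = \frac{\|w\|_2^2}{2\sigma^2}
```

Under these assumptions, a flat (low-sharpness) minimum w keeps the perturbed empirical loss \hat{L}(Q) close to the unperturbed one even for a larger \sigma, and a larger \sigma shrinks the KL term. A bound is called non-vacuous when its right-hand side falls below the trivial value of 1 for a loss bounded in [0, 1].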
