ARTFEED — Contemporary Art Intelligence

Symmetry Transfer in LLMs via Layer-Peeled Optimization

other · 2026-05-14

A new study analyzes whether pretraining large language models by minimizing cross-entropy loss for next-token prediction induces geometric structure in the learned weights and context embeddings. Using a constrained layer-peeled optimization program as a tractable surrogate, the authors prove that symmetries in the target next-token distributions transfer to global minimizers in a group-theoretic sense. Specifically, when the target distributions exhibit cyclic-shift symmetry (e.g., over the days of the week or the months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of the context embeddings reflect the same symmetry. The work provides mathematical foundations for understanding how optimization shapes representations in LLMs.
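
The abstract does not spell out the surrogate program, but one natural formalization, sketched here with assumed notation (H, W, norm bounds C_h, C_w are not taken from the paper), is:

    \min_{H,\,W}\ \frac{1}{K}\sum_{i=1}^{K} \mathrm{CE}\bigl(p_i,\ \mathrm{softmax}(W h_i)\bigr)
    \quad \text{subject to} \quad \|h_i\|_2 \le C_h,\ \ \|w_j\|_2 \le C_w,

where h_i is the (freely optimized) context embedding for context i, w_j the decoder vector for token j, and p_i the target next-token distribution. Under this reading, the symmetry-transfer claim says that if the targets are shift-equivariant, then so is the optimal logit matrix:

    p_{\sigma(i)}(\sigma(j)) = p_i(j)\ \text{for a cyclic shift}\ \sigma
    \;\Longrightarrow\;
    Z^\star_{\sigma(i)\,\sigma(j)} = Z^\star_{ij},
    \qquad Z^\star = H^\star (W^\star)^\top,

i.e., Z^\star is circulant, and the Gram matrix H^\star (H^\star)^\top inherits the same structure.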

Key facts

  • arXiv:2605.12756v1
  • Study uses layer-peeled optimization as surrogate for LLMs
  • Focus on cross-entropy loss for next-token prediction
  • Proves symmetry transfer in group-theoretic sense
  • Cyclic-shift symmetry leads to circulant logit matrix
  • Examples: the seven days of the week, the twelve months of the year
  • Gram matrices of context embeddings also reflect symmetry
  • Nonconvex optimization program analyzed (see the numerical sketch after this list)
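
A minimal numerical sketch of the setup described above: projected gradient descent on a layer-peeled surrogate with shift-symmetric targets, followed by a check of how circulant the learned logit and Gram matrices are. The dimension, base distribution, norm bound, and hyperparameters are illustrative choices, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)

    K, d = 7, 8            # seven cyclic classes ("days"); embedding dim is an assumption
    steps, lr = 5000, 1.0  # demo hyperparameters, not from the paper

    # Shift-symmetric targets: P[i] is a fixed base distribution q rolled by i,
    # so P[i, j] depends only on (j - i) mod K.
    q = np.array([0.40, 0.20, 0.12, 0.10, 0.08, 0.06, 0.04])
    P = np.stack([np.roll(q, i) for i in range(K)])

    def project_rows(M, r=1.0):
        # Project each row onto the Euclidean ball of radius r (the norm constraint).
        n = np.linalg.norm(M, axis=1, keepdims=True)
        return M * np.minimum(1.0, r / np.maximum(n, 1e-12))

    # Layer-peeled surrogate: context embeddings H and decoder W are free variables.
    H = project_rows(rng.standard_normal((K, d)))
    W = project_rows(rng.standard_normal((K, d)))

    for _ in range(steps):
        Z = H @ W.T                                    # logit matrix
        S = np.exp(Z - Z.max(axis=1, keepdims=True))
        S /= S.sum(axis=1, keepdims=True)              # row-wise softmax
        G = (S - P) / K                                # grad of mean cross-entropy w.r.t. Z
        H, W = project_rows(H - lr * (G @ W)), project_rows(W - lr * (G.T @ H))

    # Circulant check: after removing per-row constants (softmax is invariant to
    # them), row i of Z should equal row 0 rolled by i; likewise for the Gram matrix.
    Z = H @ W.T
    Zc = Z - Z.mean(axis=1, keepdims=True)
    gram = H @ H.T
    circ_err = max(np.abs(Zc[i] - np.roll(Zc[0], i)).max() for i in range(K))
    gram_err = max(np.abs(gram[i] - np.roll(gram[0], i)).max() for i in range(K))
    print(f"logit circulant deviation: {circ_err:.3e}")
    print(f"Gram  circulant deviation: {gram_err:.3e}")

Small printed deviations would illustrate the claimed structure, but the theorem concerns global minimizers of the nonconvex program; plain projected gradient descent is not guaranteed to reach one, so this is an informal check rather than a verification of the result.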

Entities

Institutions

  • arXiv

Sources