Theoretical Analysis of LLM Reasoning via Optimal Transport
A recent study published on arXiv (2605.19944) proposes a formal framework for analyzing reasoning in large language models using optimal transport. The method projects discrete paths into a continuous metric space and measures domain shifts with the Wasserstein-1 distance. The findings indicate that position-dependent attention mechanisms, such as Absolute Positional Encoding, do not preserve shift invariance, which results in an Ω(1) Lipschitz constant and expected risk, that is, both are bounded below by a constant. In contrast, shift-invariant mechanisms such as Rotary Embeddings preserve equivariance and keep the error bounded. The authors also map sequential backtracking to a Dyck-k language, which yields a lower bound on circuit depth for TC⁰ Transformers.
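The shift-invariance contrast described above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a single-frequency, 2-D rotary embedding and a 2-D additive sinusoidal encoding, and the names `rope_score`, `ape_score`, and the parameter `theta` are hypothetical. It compares an attention logit before and after shifting both positions by the same offset.

```python
import math

def rotate(vec, pos, theta=0.5):
    """Rotate a 2-D vector by the angle pos * theta (toy single-frequency RoPE)."""
    x, y = vec
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))

def rope_score(q, k, m, n):
    """Attention logit with rotary embeddings at query position m, key position n."""
    return dot(rotate(q, m), rotate(k, n))

def sinusoid(pos, theta=0.5):
    """Toy 2-D absolute positional encoding (assumed form, for illustration only)."""
    return (math.sin(pos * theta), math.cos(pos * theta))

def ape_score(q, k, m, n):
    """Attention logit when the absolute encoding is added to query and key."""
    qm = tuple(a + b for a, b in zip(q, sinusoid(m)))
    kn = tuple(a + b for a, b in zip(k, sinusoid(n)))
    return dot(qm, kn)

# Shift both positions by the same offset and compare the logits.
q, k = (1.0, 0.5), (0.3, -0.7)
m, n, shift = 3, 1, 7
rope_gap = abs(rope_score(q, k, m, n) - rope_score(q, k, m + shift, n + shift))
ape_gap = abs(ape_score(q, k, m, n) - ape_score(q, k, m + shift, n + shift))
print(f"RoPE gap: {rope_gap:.2e}  APE gap: {ape_gap:.2e}")
```

In this toy setting the rotary logit depends only on the offset n - m, so its gap is zero up to floating-point error, while the additive encoding leaves cross terms between content and position that change with the shift.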
Key facts
- Paper arXiv:2605.19944 analyzes LLM reasoning via optimal transport.
- Uses Wasserstein-1 distance to quantify domain shifts.
- Position-dependent attention (e.g., Absolute Positional Encoding) yields Ω(1) Lipschitz constant.
- Shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound error.
- Mapping sequential backtracking to a Dyck-k language establishes a circuit depth lower bound for TC⁰ Transformers.
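For readers unfamiliar with Dyck-k, it is the language of well-nested strings over k bracket types. The sketch below is a generic stack-based recognizer, not the paper's construction. Reading an opening bracket as "start a step" and the matching closing bracket as "backtrack out of it" is an assumption made here for illustration, and `dyck_depth` and `PAIRS` are hypothetical names.

```python
# Three bracket types, i.e. a Dyck-3 alphabet (chosen for illustration).
PAIRS = {"(": ")", "[": "]", "{": "}"}

def dyck_depth(tokens, pairs=PAIRS):
    """Return the maximum nesting depth if tokens form a Dyck word, else None."""
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    max_depth = 0
    for tok in tokens:
        if tok in pairs:
            # Opening bracket: push it and track the deepest nesting seen so far.
            stack.append(tok)
            max_depth = max(max_depth, len(stack))
        elif tok in closers:
            # Closing bracket: it must match the most recent unmatched opener.
            if not stack or stack.pop() != closers[tok]:
                return None
        else:
            # Symbol outside the bracket alphabet.
            return None
    # Every opener must have been closed.
    return max_depth if not stack else None

print(dyck_depth("([]{})"))
print(dyck_depth("([)]"))
```

The recognizer needs a stack whose height can grow with the input, which gives some intuition for why nesting depth is the natural quantity when asking how much sequential computation a fixed-depth model can perform.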
Entities
Institutions
- arXiv