Theoretical Analysis of LLM Reasoning via Optimal Transport

other · 2026-05-20

A recent study posted on arXiv (2605.19944) gives a formal treatment of reasoning in large language models using optimal transport. The approach embeds discrete reasoning paths in a continuous metric space and quantifies domain shift with the Wasserstein-1 distance. The authors find that position-dependent attention, such as attention with Absolute Positional Encoding, is not shift-invariant, which yields an Ω(1) Lipschitz constant in the expected risk. By contrast, shift-invariant mechanisms such as Rotary Embeddings preserve equivariance and keep the error bounded. The authors also map sequential backtracking to a Dyck-k language, which establishes a lower bound on circuit depth for TC⁰ Transformers.
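The Wasserstein-1 distance is the minimum cost of moving probability mass from one distribution onto another. The paper works with reasoning paths embedded in a general metric space. As a simplified illustration only, the sketch below assumes each path has already been reduced to a single real-valued feature. In one dimension, W1 between two equal-size empirical samples reduces to pairing the sorted samples in order. The function name `wasserstein1_1d` and the sample values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def wasserstein1_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 distance between two 1-D empirical distributions with equal sample counts.

    In one dimension the optimal transport plan matches sorted samples in order,
    so W1 equals the mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Hypothetical scalar features of reasoning paths from two domains.
source = [0.0, 1.0, 2.0, 3.0]
shifted = [s + 0.5 for s in source]  # the same distribution translated by 0.5

print(wasserstein1_1d(source, shifted))  # 0.5
```

A pure translation by 0.5 gives a W1 distance of exactly 0.5. This is the sense in which W1 measures how far a domain has shifted, as opposed to merely detecting that the two domains differ.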

Key facts

Paper arXiv:2605.19944 analyzes LLM reasoning via optimal transport.
Uses the Wasserstein-1 distance to quantify domain shifts.
Position-dependent attention (e.g., Absolute Positional Encoding) yields an Ω(1) Lipschitz constant.
Shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound error.
Mapping sequential backtracking to a Dyck-k language establishes a circuit-depth lower bound for TC⁰ Transformers.
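Shift invariance here means that an attention logit depends only on the relative offset between query and key positions. The toy 2-D comparison below is an illustration and not the paper's construction. It shifts both tokens by the same amount and compares the two encodings. Under a rotary embedding the logit is unchanged, because the two rotations combine into a single rotation by the offset. Under an additive absolute encoding the logit changes. The vectors, the rotation rate `theta`, and the sinusoidal `ape` function are assumptions chosen for brevity.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def rope(v: Vec, pos: int, theta: float = 0.1) -> Vec:
    """Toy rotary embedding for a 2-D vector: rotate it by angle pos * theta."""
    c, s = math.cos(pos * theta), math.sin(pos * theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def ape(v: Vec, pos: int) -> Vec:
    """Toy absolute positional encoding: add a sinusoidal vector for the position."""
    return (v[0] + math.sin(pos), v[1] + math.cos(pos))


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


def logit(encode: Callable[[Vec, int], Vec], q: Vec, k: Vec, i: int, j: int) -> float:
    """Unnormalized attention score between a query at position i and a key at position j."""
    return dot(encode(q, i), encode(k, j))


q, k = (1.0, 0.5), (0.3, -0.8)
shift = 7  # move both tokens 7 positions to the right; their offset stays 3

rope_gap = abs(logit(rope, q, k, 2, 5) - logit(rope, q, k, 2 + shift, 5 + shift))
ape_gap = abs(logit(ape, q, k, 2, 5) - logit(ape, q, k, 2 + shift, 5 + shift))
print(f"RoPE change under shift: {rope_gap:.2e}")  # zero up to floating-point rounding
print(f"APE change under shift:  {ape_gap:.2e}")   # clearly nonzero
```

Shifting both positions by any other amount keeps the RoPE gap at rounding level, since the rotary logit depends only on j − i. The APE gap instead varies with the shift. Its cross terms between content and position vectors depend on absolute positions, even though the position-position term cos(i − j) does not.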
