Logical Characterization of Encoder-Decoder Transformers
A novel logical characterization of encoder-decoder transformers, the architecture underlying LLMs and cross-attention applications, has been proposed. The study models these transformers operating on text, with floating-point values and soft (softmax) attention, using a new temporal logic that extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. A further characterization via distributed automata is given, and the results are shown to accommodate architectural variations such as masking. The autoregressive setting is also discussed.
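The paper's exact syntax and semantics are not reproduced in this summary, so the following is a hypothetical sketch of how the two kinds of modality could be evaluated over finite sequences of propositional valuations: a counting global modality that checks whether a formula holds at at least k encoder positions, and a past modality that checks whether a formula held at some earlier-or-current decoder position. All function names and the choice of non-strict past semantics are illustrative assumptions.

```python
from typing import Callable, List, Set

# A valuation is the set of atomic propositions true at a position (assumption).
Valuation = Set[str]
Formula = Callable[[List[Valuation], int], bool]

def atom(p: str) -> Formula:
    """Atomic proposition: true at position i iff p holds there."""
    return lambda seq, i: p in seq[i]

def count_global(phi: Formula, k: int, encoder: List[Valuation]) -> bool:
    """Counting global modality (sketch): phi holds at >= k encoder positions."""
    return sum(phi(encoder, i) for i in range(len(encoder))) >= k

def once_past(phi: Formula, decoder: List[Valuation], j: int) -> bool:
    """Past modality (sketch, non-strict): phi held at some position i <= j."""
    return any(phi(decoder, i) for i in range(j + 1))

encoder = [{"a"}, {"b"}, {"a", "b"}]
decoder = [{"x"}, set(), {"y"}]

print(count_global(atom("a"), 2, encoder))  # "a" holds at positions 0 and 2
print(once_past(atom("x"), decoder, 2))     # "x" held at position 0
```

The split of the two modalities over the two inputs mirrors the summary above: global counting quantification ranges over the whole encoder sequence, while the past modality is evaluated relative to a current decoder position.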
Key facts
- Novel logical characterization of encoder-decoder transformers
- Foundational architecture for LLMs and cross-attention settings
- Study over text with floating-point numbers and soft-attention
- New temporal logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input
- Additional characterization via distributed automata
- Results account for changes in masking and other architectural variations
- Discussion of autoregressive setting
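Since the model is studied over floating-point numbers with soft-attention, it may help to recall what "soft" attention computes: a softmax over real-valued scores followed by a weighted average of values, rather than a hard selection of a single position. This is a generic sketch of that standard operation, not the paper's formal floating-point model.

```python
import math

def soft_attention(scores, values):
    """Softmax-weighted average (sketch): soft-attention over float scores."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * v for w, v in zip(weights, values))

# Higher-scored positions dominate the average but never fully exclude the rest.
print(soft_attention([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))
```

This contrasts with hard (unique or average-hard) attention models often used in prior logical characterizations, where only the maximally scored positions contribute.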
Entities
Institutions
- arXiv