Caracal: Efficient LLM Architecture Using Fourier Transform
Caracal is a novel architecture for Large Language Models that replaces the quadratic-cost attention mechanism with a parameter-efficient Multi-Head Fourier (MHF) module, achieving O(L log L) complexity. It uses the Fast Fourier Transform (FFT) for sequence mixing and introduces a frequency-domain causal masking technique, implemented via asymmetric padding and truncation, to enable autoregressive generation. Unlike models such as Mamba that depend on hardware-specific custom kernels, Caracal relies only on standard library operators, ensuring portability. Evaluations show performance competitive with existing models. The paper is available on arXiv.
Key facts
- Caracal replaces attention with a Multi-Head Fourier (MHF) module.
- Complexity is O(L log L) instead of quadratic.
- Uses Fast Fourier Transform (FFT) for sequence mixing.
- Applies frequency-domain causal masking via asymmetric padding and truncation.
- Does not rely on hardware-specific implementations.
- Uses standard library operators for portability.
- Evaluations show competitive performance.
- Paper available on arXiv (2605.00292).
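The causal masking idea above can be illustrated with a minimal sketch. The exact MHF module is not specified here, so this is one common construction consistent with the description: a learned time-domain kernel is convolved with the input via FFT, zero-padding the sequence to length 2L (the asymmetric padding) and keeping only the first L outputs (the truncation), which makes the O(L log L) mixing causal. The function name and single-head formulation are illustrative assumptions, not the paper's API.

```python
import numpy as np

def causal_fft_mix(x, h):
    """Sketch of causal sequence mixing via FFT in O(L log L).

    x: input sequence of length L, h: learned kernel of length L.
    Zero-padding both to 2L turns the FFT's circular convolution
    into a linear convolution; truncating to the first L outputs
    keeps only the causal part, so y[t] depends on x[0..t] alone.
    (Hypothetical single-head sketch, not the paper's exact module.)
    """
    L = x.shape[-1]
    n = 2 * L  # asymmetric zero-padding: append L zeros on the right
    X = np.fft.rfft(x, n=n)          # frequency-domain representation
    H = np.fft.rfft(h, n=n)          # kernel spectrum
    y = np.fft.irfft(X * H, n=n)     # linear convolution, length 2L
    return y[..., :L]                # truncation: drop the acausal tail
```

Truncating the padded linear convolution is equivalent to a direct causal convolution, `np.convolve(x, h)[:L]`, but costs O(L log L) instead of O(L^2).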