Caracal: Efficient LLM Architecture Using Fourier Transform
Caracal is a novel architecture for Large Language Models that replaces the quadratic-cost attention mechanism with a parameter-efficient Multi-Head Fourier (MHF) module, achieving O(L log L) complexity. It uses the Fast Fourier Transform (FFT) for sequence mixing and introduces a frequency-domain causal masking technique, implemented via asymmetric padding and truncation, to enable autoregressive generation. Unlike models such as Mamba that depend on hardware-specific custom kernels, Caracal relies only on standard library operators, ensuring portability. Evaluations show performance competitive with existing models. The paper is available on arXiv.
Key facts
- Caracal replaces attention with a Multi-Head Fourier (MHF) module.
- Complexity is O(L log L) instead of quadratic.
- Uses Fast Fourier Transform (FFT) for sequence mixing.
- Applies frequency-domain causal masking via asymmetric padding and truncation.
- Does not rely on hardware-specific implementations.
- Uses standard library operators for portability.
- Evaluations show competitive performance.
- Paper available on arXiv (2605.00292).
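The causal masking idea above can be illustrated with a minimal sketch. The exact MHF module is not specified here, so this is one common construction consistent with the description: a learned time-domain kernel is convolved with the input via FFT, zero-padding the sequence to length 2L (the asymmetric padding) and keeping only the first L outputs (the truncation), which makes the O(L log L) mixing causal. The function name and single-head formulation are illustrative assumptions, not the paper's API.

```python
import numpy as np

def causal_fft_mix(x, h):
    """Sketch of causal sequence mixing via FFT in O(L log L).

    x: input sequence of length L, h: learned kernel of length L.
    Zero-padding both to 2L turns the FFT's circular convolution
    into a linear convolution; truncating to the first L outputs
    keeps only the causal part, so y[t] depends on x[0..t] alone.
    (Hypothetical single-head sketch, not the paper's exact module.)
    """
    L = x.shape[-1]
    n = 2 * L  # asymmetric zero-padding: append L zeros on the right
    X = np.fft.rfft(x, n=n)          # frequency-domain representation
    H = np.fft.rfft(h, n=n)          # kernel spectrum
    y = np.fft.irfft(X * H, n=n)     # linear convolution, length 2L
    return y[..., :L]                # truncation: drop the acausal tail
```

Truncating the padded linear convolution is equivalent to a direct causal convolution, `np.convolve(x, h)[:L]`, but costs O(L log L) instead of O(L^2).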