ARTFEED — Contemporary Art Intelligence

Nautile-370M: Hybrid Reasoning Model with Spectral Memory

ai-technology · 2026-04-30

A new small language model, Nautile-370M, has been unveiled: a 371-million-parameter model designed for effective reasoning under tight parameter and inference budgets. It uses a hybrid backbone that alternates blocks of two SeqCond Attention (SCA) layers, a linear-time spectral sequence operator derived from SeqCondenser, with a single transformer layer. The aim is to combine the strengths of structured sequential models, namely long-context efficiency and state-tracking, with the flexible token-to-token routing of attention. Training ran on a single Cloud TPU v4-64 pod slice provided through the Google TPU Research Cloud (TRC) program, followed by reinforcement learning on a single NVIDIA DGX Spark. The authors show that the SCA readout can exactly retrieve any individual token from the prefix summary and can reproduce any output of softmax attention, establishing that SCA is at least as expressive as standard attention.
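
As an illustration of the reported alternation pattern, here is a minimal sketch of one hybrid block in PyTorch. The SCA internals below (a gated cumulative-mean prefix summary) are a stand-in assumption, not the published SeqCond Attention formulation; the class names and dimensions are likewise illustrative.

    import torch
    import torch.nn as nn

    class SCALayer(nn.Module):
        """Stand-in for a SeqCond Attention layer: a linear-time operator
        that updates each token from a running (causal) prefix summary."""
        def __init__(self, d_model: int):
            super().__init__()
            self.in_proj = nn.Linear(d_model, d_model)
            self.gate = nn.Linear(d_model, d_model)
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq, d_model); a cumulative mean serves as a
            # toy prefix summary, computable in O(n) over the sequence.
            v = self.in_proj(x)
            t = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
            prefix = v.cumsum(dim=1) / t              # causal prefix summary
            g = torch.sigmoid(self.gate(x))           # token-wise gate
            return x + self.out_proj(g * prefix)      # residual update

    class HybridBlock(nn.Module):
        """Two SCA layers followed by one transformer layer, mirroring
        the reported 2:1 alternation."""
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            self.sca1 = SCALayer(d_model)
            self.sca2 = SCALayer(d_model)
            self.attn = nn.TransformerEncoderLayer(
                d_model, n_heads, batch_first=True, norm_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h = self.sca2(self.sca1(x))
            mask = nn.Transformer.generate_square_subsequent_mask(
                x.size(1)).to(x.device)               # causal attention mask
            return self.attn(h, src_mask=mask, is_causal=True)

Stacking such blocks keeps most layers at linear cost while retaining periodic full attention for token-to-token routing.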

Key facts

  • Nautile-370M is a 371-million-parameter small language model.
  • It uses a hybrid backbone with two SeqCond Attention layers and one transformer layer.
  • SeqCond Attention is a linear-time spectral sequence operator inspired by SeqCondenser.
  • The model was trained on a single Cloud TPU v4-64 pod slice via Google TPU Research Cloud.
  • Reinforcement learning was performed on a single NVIDIA DGX Spark.
  • The SCA readout can exactly retrieve any individual token from the prefix summary (a toy illustration follows this list).
  • SCA can reproduce any output of softmax attention as a special case.
  • The paper is available on arXiv with ID 2604.24809.
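
The exact-retrieval claim admits a simple toy demonstration: if a prefix summary accumulates tokens against orthonormal position codes, any single token can be read back exactly with a linear readout. The construction below is our own illustration of that general principle, not the SCA readout defined in the paper.

    import torch

    n, d = 8, 16                                  # sequence length, width
    P = torch.linalg.qr(torch.randn(n, n))[0]     # orthonormal position codes
    V = torch.randn(n, d)                         # token values

    # Build the summary causally, one rank-1 update per token.
    S = torch.zeros(n, d)
    for i in range(n):
        S += torch.outer(P[i], V[i])

    # Read out token j exactly: P[j] @ S = sum_i (P[j]·P[i]) V[i] = V[j].
    j = 5
    assert torch.allclose(P[j] @ S, V[j], atol=1e-5)

Note that the state in this toy grows with sequence length; the point is only that a summed prefix state can support exact per-token recovery.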

Entities

Institutions

  • Google TPU Research Cloud (TRC)
  • NVIDIA
