ARTFEED — Contemporary Art Intelligence

Nexusformer Introduces Nonlinear Attention for Scalable Transformer Architecture

ai-technology · 2026-04-22

A new transformer architecture, Nexusformer, has been introduced to address the scaling limitations of conventional models. Because standard transformers rely on linear attention projections that confine feature extraction to fixed-dimensional subspaces, larger variants must be trained from scratch; this restricts both expressivity and the ability to expand capacity incrementally. Nexusformer replaces the linear Q/K/V projections with a Nexus-Rank layer, a three-stage nonlinear mapping with two activation functions that passes through progressively higher-dimensional spaces. This removes the linearity restriction and enables lossless structured growth: new capacity can be added along two axes through zero-initialized blocks that preserve pretrained knowledge. Experiments indicate that Nexusformer matches Tokenformer's perplexity while using up to 41.5% less training compute. The findings were published on arXiv under the identifier arXiv:2604.19147v1.
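The article does not pin down the Nexus-Rank layer's exact stage widths or activation function, so the following is a minimal, hypothetical sketch of the idea in PyTorch: a single projection realized as three affine stages with two activations in between, passing through progressively higher-dimensional spaces before projecting down to the head dimension. The class name NexusRankProjection, the GELU activations, and the widening factor are all illustrative assumptions, not details from the paper.

    import torch
    import torch.nn as nn

    class NexusRankProjection(nn.Module):
        """Hypothetical stand-in for a Nexus-Rank layer: three affine stages
        separated by two activations ("dual activations"), widening the
        representation before projecting to the head dimension."""

        def __init__(self, d_model: int, d_head: int, widen: int = 2):
            super().__init__()
            self.stage1 = nn.Linear(d_model, widen * d_model)                   # lift into a wider space
            self.stage2 = nn.Linear(widen * d_model, widen * widen * d_model)   # lift again
            self.stage3 = nn.Linear(widen * widen * d_model, d_head)            # project to head dim
            self.act = nn.GELU()  # activation choice is an assumption

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.stage3(self.act(self.stage2(self.act(self.stage1(x)))))

    # One such module per Q, K, and V would replace the usual linear projections:
    d_model, d_head = 512, 64
    q_proj, k_proj, v_proj = (NexusRankProjection(d_model, d_head) for _ in range(3))
    x = torch.randn(2, 16, d_model)            # (batch, sequence, model dim)
    q, k, v = q_proj(x), k_proj(x), v_proj(x)  # each (2, 16, 64)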

Key facts

  • Nexusformer is a new transformer architecture designed for scalable growth
  • It replaces linear Q/K/V projections with a Nexus-Rank layer built on a three-stage nonlinear mapping
  • The architecture enables lossless structured growth through zero-initialized blocks
  • New capacity can be injected along two axes while preserving pretrained knowledge (see the sketch after this list)
  • Experiments show it matches Tokenformer's perplexity with up to 41.5% less training compute
  • Standard transformers struggle to expand without discarding learned representations
  • The primary bottleneck identified is in the attention mechanism's linear projections
  • Research was announced on arXiv with identifier arXiv:2604.19147v1
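The growth mechanism can be illustrated with a small, hypothetical sketch. Assuming that "two axes" refers to a stage's input and output widths, a trained linear stage can be embedded into a larger one whose new rows and columns start at zero, so the expanded network initially computes exactly the same function. The helper grow_linear below is an illustrative construction, not an API from the paper.

    import torch
    import torch.nn as nn

    def grow_linear(old: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
        """Embed a trained linear stage in a larger one, zero-initializing
        the added rows and columns so pretrained behavior is preserved."""
        assert new_in >= old.in_features and new_out >= old.out_features
        grown = nn.Linear(new_in, new_out)
        with torch.no_grad():
            grown.weight.zero_()  # new capacity starts at zero
            grown.bias.zero_()
            grown.weight[: old.out_features, : old.in_features] = old.weight  # old weights kept verbatim
            grown.bias[: old.out_features] = old.bias
        return grown

    # The old function is reproduced exactly on the original coordinates:
    old = nn.Linear(4, 4)
    big = grow_linear(old, 6, 6)
    x = torch.randn(3, 4)
    x_pad = torch.cat([x, torch.zeros(3, 2)], dim=1)  # new inputs start inactive
    assert torch.allclose(old(x), big(x_pad)[:, :4])

Because the new output units emit zeros and the next grown stage's new input columns are likewise zero, the network's output is unchanged at the moment of growth; training then fills in the added capacity.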

Entities

Institutions

  • arXiv
