CoFrGeNet: Continued Fraction Architectures for Language Generation
A new paper introduces CoFrGeNet (Continued Fraction Generative Networks), an architecture for language generation inspired by continued fractions. Its components replace Multi-head Attention and Feed-Forward Networks in Transformer blocks while requiring far fewer parameters. The authors derive custom gradient formulations intended to optimize these components more accurately than standard PyTorch gradients. Because the components are plug-in replacements, they require minimal changes to existing Transformer training or inference procedures, which makes the approach suitable for large industrial workflows. Experiments were conducted on two very different Transformer architectures.
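For readers unfamiliar with the underlying idea, a continued fraction is a nested expression of the form a0 + 1/(a1 + 1/(a2 + ...)). The sketch below only illustrates that mathematical object by evaluating a finite continued fraction exactly. It is not the paper's architecture, and the function name `eval_continued_fraction` is a hypothetical name chosen for this example.

```python
from fractions import Fraction


def eval_continued_fraction(terms):
    """Evaluate a0 + 1/(a1 + 1/(a2 + ... + 1/an)) for a finite list of terms.

    Illustrative helper only (hypothetical name); it is not taken from the paper.
    """
    if not terms:
        raise ValueError("terms must be non-empty")
    # Start from the innermost term and fold outward.
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


# Ten terms equal to 1 give a rational approximation of the golden ratio.
golden_approx = eval_continued_fraction([1] * 10)
print(golden_approx, float(golden_approx))
```

Evaluating from the innermost term outward keeps the computation to one division per term, and using `Fraction` avoids floating-point rounding so the nesting structure is easy to inspect.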
Key facts
- CoFrGeNet stands for Continued Fraction Generative Networks.
- The architecture is inspired by continued fractions.
- It replaces Multi-head Attention and Feed-Forward Networks in Transformer blocks.
- The new components require far fewer parameters than the components they replace.
- Custom gradient formulations are derived for optimization.
- The approach is a plug-in replacement for Transformer-based models.
- Experiments were conducted on two very different Transformer architectures.
- The paper is available on arXiv with ID 2601.21766.
Entities
Institutions
- arXiv