LLM Agents Discover Novel Neural Architectures Beyond Transformers
A recent arXiv preprint introduces AIRA-Compose and AIRA-Design, a dual framework for autonomous neural architecture search driven by large language model (LLM) agents. AIRA-Compose handles the high-level search: within a 24-hour budget, 11 agents explore combinations of computational primitives, evaluate candidates with millions of parameters, and optimize designs for the 350M, 1B, and 3B scales. AIRA-Design handles the low-level mechanistic implementation. The search yields 14 architectures in two families: AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba hybrids). Pre-trained at the 1B scale, these models outperform the Llama 3.2 and Composer baselines. AIRAformer-D and AIRAhybrid-D improve accuracy over Llama 3.2 by 2.4% and 3.8%, respectively, while AIRAformer-C scales 54% faster than Llama 3.2 and 71% faster than Composer's best Transformer, a result presented as progress toward AI self-improvement.
Key facts
- AIRA-Compose uses 11 agents for high-level architecture search under a 24-hour budget.
- AIRA-Design handles low-level mechanistic implementation.
- 14 architectures discovered across AIRAformer and AIRAhybrid families.
- Models pre-trained at 1B scale outperform Llama 3.2 and Composer baselines.
- AIRAformer-D improves accuracy by 2.4% over Llama 3.2.
- AIRAhybrid-D improves accuracy by 3.8% over Llama 3.2.
- AIRAformer-C scales 54% faster than Llama 3.2.
- AIRAformer-C scales 71% faster than Composer's best Transformer.
Entities
Institutions
- arXiv
Baselines
- Llama 3.2
- Composer