Quantum LLMs Achieve 1.4% Perplexity Improvement on IBM Hardware
A recent arXiv preprint reports what its authors describe as the first practical integration of quantum computing with large language models (LLMs), achieved through Cayley-parameterised unitary adapters. These quantum circuit blocks were inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor. The approach improved (i.e. lowered) the perplexity of Llama 3.1 8B by 1.4% while adding only 6,000 trainable parameters, and end-to-end inference was validated on a real Quantum Processing Unit (QPU). A systematic study on SmolLM2 (135M parameters) further showed perplexity improving monotonically with unitary depth. Because classical architectures require memory that scales unfavourably with model size, the authors argue, quantum computing offers a qualitatively different pathway for scaling LLMs.
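The adapters rest on the Cayley transform: any skew-symmetric matrix A (with A^T = -A) maps to an exactly orthogonal matrix U = (I - A)(I + A)^-1 (and a skew-Hermitian A to a unitary one), so the unconstrained entries of A can be trained with ordinary gradient descent while U stays exactly on the unitary manifold at every step. In the paper these unitaries are compiled into quantum circuits and run on the QPU; the sketch below is a purely classical PyTorch illustration of the parameterisation, using the real (orthogonal) special case, with the class name and the A = W - W^T construction chosen here for illustration rather than taken from the preprint.

```python
import torch

class CayleyAdapter(torch.nn.Module):
    """Orthogonal adapter via the Cayley transform U = (I - A)(I + A)^{-1}.

    A = W - W^T is skew-symmetric by construction, so U is exactly
    orthogonal (the real special case of unitary) for any value of W.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Zero init gives A = 0 and hence U = I, so the adapter starts
        # as an exact identity and leaves the frozen model untouched.
        self.w = torch.nn.Parameter(torch.zeros(dim, dim))

    def matrix(self) -> torch.Tensor:
        a = self.w - self.w.transpose(-2, -1)    # skew-symmetric part of W
        eye = torch.eye(a.shape[-1], device=a.device, dtype=a.dtype)
        # I + A is always invertible for skew-symmetric A, so the solve
        # is well defined; it returns (I + A)^{-1}(I - A), which equals
        # (I - A)(I + A)^{-1} because the two factors commute.
        return torch.linalg.solve(eye + a, eye - a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.matrix().transpose(-2, -1)   # rotate activations by U
```

Only the antisymmetric part of W contributes, so each block has dim(dim - 1)/2 effective parameters; a few small blocks of this kind sit comfortably within the roughly 6,000-parameter budget the preprint reports.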
Key facts
- Cayley-parameterised unitary adapters are quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs (see the sketch after this list for the integration pattern).
- Executed on a 156-qubit IBM Quantum System Two superconducting processor.
- Improved perplexity of Llama 3.1 8B by 1.4% with only 6,000 additional parameters.
- End-to-end inference validated on real Quantum Processing Unit (QPU).
- Systematic study on SmolLM2 (135M parameters) showed monotonically improving perplexity with increasing unitary depth.
- Classical architectures require memory that scales unfavourably with model size.
- Quantum computing offers a qualitatively different pathway for LLMs.
- Hardware demonstrations at practically relevant model scales have previously remained elusive.
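Putting the first fact together with the depth result on SmolLM2, the integration pattern is: freeze the pre-trained projection and train only a small stack of unitary blocks on top of it, with "unitary depth" counting the stacked blocks. Below is a minimal sketch of that pattern, reusing the CayleyAdapter above; the adapter_dim, depth, and feature-slicing choices are illustrative assumptions, not the configuration reported in the preprint.

```python
class AdaptedProjection(torch.nn.Module):
    """Frozen pre-trained projection followed by a stack of trainable
    Cayley adapters acting on a small slice of the output features."""

    def __init__(self, base: torch.nn.Linear, adapter_dim: int = 64, depth: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        # "Unitary depth" = number of stacked Cayley blocks.
        self.adapters = torch.nn.Sequential(
            *[CayleyAdapter(adapter_dim) for _ in range(depth)]
        )
        self.adapter_dim = adapter_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        # Rotate only the leading adapter_dim features; the remaining
        # projection outputs pass through unchanged.
        head = self.adapters(y[..., :self.adapter_dim])
        return torch.cat([head, y[..., self.adapter_dim:]], dim=-1)
```

Fine-tuning then optimises only the adapter parameters (e.g. passing [p for p in model.parameters() if p.requires_grad] to the optimiser), which is how the added parameter count stays in the thousands even for an 8B-parameter model.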
Entities
Institutions
- IBM Quantum
- arXiv