Quantum LLMs Achieve 1.4% Perplexity Improvement on IBM Hardware
A recent arXiv preprint reports what its authors describe as the first practical integration of quantum computing with large language models (LLMs), achieved through Cayley-parameterised unitary adapters. These quantum circuit blocks were inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor. The approach improved (i.e. lowered) the perplexity of Llama 3.1 8B by 1.4% while adding only 6,000 trainable parameters, and end-to-end inference was validated on a real Quantum Processing Unit (QPU). A systematic study on SmolLM2 (135M parameters) further showed perplexity improving monotonically with unitary depth. Because classical architectures require memory that scales unfavourably with model size, the authors argue, quantum computing offers a qualitatively different pathway for scaling LLMs.
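The adapters rest on the Cayley transform: any skew-symmetric matrix A (with A^T = -A) maps to an exactly orthogonal matrix U = (I - A)(I + A)^-1 (and a skew-Hermitian A to a unitary one), so the unconstrained entries of A can be trained with ordinary gradient descent while U stays exactly on the unitary manifold at every step. In the paper these unitaries are compiled into quantum circuits and run on the QPU; the sketch below is a purely classical PyTorch illustration of the parameterisation, using the real (orthogonal) special case, with the class name and the A = W - W^T construction chosen here for illustration rather than taken from the preprint.

```python
import torch

class CayleyAdapter(torch.nn.Module):
    """Orthogonal adapter via the Cayley transform U = (I - A)(I + A)^{-1}.

    A = W - W^T is skew-symmetric by construction, so U is exactly
    orthogonal (the real special case of unitary) for any value of W.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Zero init gives A = 0 and hence U = I, so the adapter starts
        # as an exact identity and leaves the frozen model untouched.
        self.w = torch.nn.Parameter(torch.zeros(dim, dim))

    def matrix(self) -> torch.Tensor:
        a = self.w - self.w.transpose(-2, -1)    # skew-symmetric part of W
        eye = torch.eye(a.shape[-1], device=a.device, dtype=a.dtype)
        # I + A is always invertible for skew-symmetric A, so the solve
        # is well defined; it returns (I + A)^{-1}(I - A), which equals
        # (I - A)(I + A)^{-1} because the two factors commute.
        return torch.linalg.solve(eye + a, eye - a)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.matrix().transpose(-2, -1)   # rotate activations by U
```

Only the antisymmetric part of W contributes, so each block has dim(dim - 1)/2 effective parameters; a few small blocks of this kind sit comfortably within the roughly 6,000-parameter budget the preprint reports.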
Key facts
- Cayley-parameterised unitary adapters are quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs (see the sketch after this list for the integration pattern).
- Executed on a 156-qubit IBM Quantum System Two superconducting processor.
- Improved perplexity of Llama 3.1 8B by 1.4% with only 6,000 additional parameters.
- End-to-end inference validated on real Quantum Processing Unit (QPU).
- Systematic study on SmolLM2 (135M parameters) showed monotonically improving perplexity with increasing unitary depth.
- Classical architectures require memory that scales unfavourably with model size.
- Quantum computing offers a qualitatively different pathway for LLMs.
- Hardware demonstrations at practically relevant model scales have previously remained elusive.
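Putting the first fact together with the depth result on SmolLM2, the integration pattern is: freeze the pre-trained projection and train only a small stack of unitary blocks on top of it, with "unitary depth" counting the stacked blocks. Below is a minimal sketch of that pattern, reusing the CayleyAdapter above; the adapter_dim, depth, and feature-slicing choices are illustrative assumptions, not the configuration reported in the preprint.

```python
class AdaptedProjection(torch.nn.Module):
    """Frozen pre-trained projection followed by a stack of trainable
    Cayley adapters acting on a small slice of the output features."""

    def __init__(self, base: torch.nn.Linear, adapter_dim: int = 64, depth: int = 2):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pre-trained weights stay frozen
        # "Unitary depth" = number of stacked Cayley blocks.
        self.adapters = torch.nn.Sequential(
            *[CayleyAdapter(adapter_dim) for _ in range(depth)]
        )
        self.adapter_dim = adapter_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        # Rotate only the leading adapter_dim features; the remaining
        # projection outputs pass through unchanged.
        head = self.adapters(y[..., :self.adapter_dim])
        return torch.cat([head, y[..., self.adapter_dim:]], dim=-1)
```

Fine-tuning then optimises only the adapter parameters (e.g. passing [p for p in model.parameters() if p.requires_grad] to the optimiser), which is how the added parameter count stays in the thousands even for an 8B-parameter model.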
Entities
Institutions
- IBM Quantum
- arXiv