ARTFEED — Contemporary Art Intelligence

Quantum LLMs Achieve 1.4% Perplexity Improvement on IBM Hardware

ai-technology · 2026-05-09

A recent arXiv preprint reports what its authors describe as the first practical integration of quantum computing with large language models (LLMs), using Cayley-parameterised unitary adapters: quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs. Running on a 156-qubit IBM Quantum System Two superconducting processor, the approach reduced the perplexity of the Llama 3.1 8B model by 1.4% with only 6,000 additional parameters, and end-to-end inference was validated on a real Quantum Processing Unit (QPU). A systematic study on the smaller SmolLM2 (135M parameters) showed perplexity improving monotonically with unitary depth. The authors argue that, where classical architectures face memory requirements that scale unfavourably with model size, quantum computing offers a qualitatively different pathway.
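The core mathematical idea behind a Cayley parameterisation is not specific to this preprint: the Cayley transform maps any skew-Hermitian matrix A (which can hold unconstrained trainable parameters) to a unitary matrix U = (I − A)(I + A)⁻¹. A minimal classical sketch in NumPy, purely to illustrate the construction (the paper's adapters are quantum circuits, not dense matrices):

```python
import numpy as np

def cayley_unitary(A: np.ndarray) -> np.ndarray:
    """Cayley transform: map a skew-Hermitian A to a unitary U.

    U = (I - A) @ inv(I + A).  Since A's eigenvalues are purely
    imaginary, (I + A) is always invertible, so U is well defined.
    """
    I = np.eye(A.shape[0], dtype=A.dtype)
    return (I - A) @ np.linalg.inv(I + A)

# Build a skew-Hermitian matrix from free (trainable) parameters.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M - M.conj().T) / 2          # skew-Hermitian: A^dagger = -A
U = cayley_unitary(A)

# Unitarity check: U^dagger U = I (up to numerical precision).
assert np.allclose(U.conj().T @ U, np.eye(4), atol=1e-10)
```

Because the free parameters live in A rather than in U itself, gradient updates stay unconstrained while the resulting matrix remains exactly unitary, which is what makes such parameterisations attractive as adapter blocks.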

Key facts

  • Cayley-parameterised unitary adapters are quantum circuit blocks inserted into frozen projection layers of pre-trained LLMs.
  • Executed on a 156-qubit IBM Quantum System Two superconducting processor.
  • Reduced Llama 3.1 8B perplexity by 1.4% with only 6,000 additional parameters.
  • End-to-end inference validated on real Quantum Processing Unit (QPU).
  • Systematic study on SmolLM2 (135M parameters) showed monotonically improving perplexity with increasing unitary depth.
  • Classical architectures require memory that scales unfavourably with model size.
  • Quantum computing offers a qualitatively different pathway for LLMs.
  • Practical demonstrations on real hardware for models of practical relevance have previously remained elusive.
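The adapter placement described in the key facts (a small trainable unitary applied around a frozen projection) can be sketched classically. Everything below is a hypothetical illustration under the assumption that the adapter rotates a k-dimensional subspace of the projection's output; the class name, placement, and dimensions are not from the paper:

```python
import numpy as np

def cayley_unitary(A: np.ndarray) -> np.ndarray:
    """Cayley transform of a skew-Hermitian matrix A into a unitary U."""
    I = np.eye(A.shape[0], dtype=A.dtype)
    return (I - A) @ np.linalg.inv(I + A)

class AdaptedProjection:
    """Frozen projection W followed by a trainable unitary adapter.

    Hypothetical placement: the adapter acts on the first k output
    dimensions; only the skew-Hermitian parameters in self.A would
    be trained, while W stays frozen.
    """
    def __init__(self, W: np.ndarray, k: int):
        self.W = W                                  # frozen weights
        self.A = np.zeros((k, k), dtype=complex)    # trainable, init 0
        self.k = k

    def __call__(self, x: np.ndarray) -> np.ndarray:
        y = (self.W @ x).astype(complex)            # frozen projection
        U = cayley_unitary(self.A)                  # identity at init
        y[: self.k] = U @ y[: self.k]               # rotate a subspace
        return y

W = np.random.default_rng(1).standard_normal((8, 8))
layer = AdaptedProjection(W, k=4)
x = np.ones(8)
out = layer(x)

# With A = 0 the Cayley transform is exactly the identity, so the
# adapter is a no-op at initialisation -- the frozen model's behaviour
# is preserved before any training.
assert np.allclose(out, W @ x)
```

Initialising the adapter at the identity is a common design choice for this kind of add-on module: the 6,000 extra parameters can then be trained without first degrading the frozen 8B-parameter model.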

Entities

Institutions

  • IBM Quantum System Two
  • arXiv

Sources