ARTFEED — Contemporary Art Intelligence

PennyLang Dataset Released to Improve LLM-Based Quantum Code Generation

ai-technology · 2026-04-20

PennyLang, a new open-source dataset, targets a persistent obstacle in using large language models for quantum software development: the lack of high-quality datasets. It contains 3,347 PennyLane-specific quantum code samples, each paired with a contextual description, curated from textbooks, official documentation, and open-source repositories. The dataset is designed to serve both as training data for LLMs and as a dependable reference for quantum programming tasks. The work makes three contributions: the dataset itself, an automated framework for constructing quantum code datasets, and baseline evaluations across several open models. The release is intended to support research and development in quantum computing. The findings are detailed in arXiv preprint 2503.02497v4.
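To make the idea of a code sample paired with a contextual description concrete, here is a minimal sketch of what one record might look like. The field names (`description`, `code`, `source`), the `Sample` class, and the JSON Lines layout are assumptions made for illustration; the article does not specify PennyLang's actual schema. The embedded PennyLane snippet is held as a plain string, so the sketch runs without PennyLane installed.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Sample:
    """One hypothetical dataset record (field names are assumptions)."""
    description: str  # contextual description of what the code does
    code: str         # the PennyLane source, stored as text
    source: str       # assumed label: "textbook", "documentation", or "repository"


def to_jsonl(samples):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(s)) for s in samples)


def from_jsonl(text):
    """Parse JSON Lines back into Sample records, skipping blank lines."""
    return [Sample(**json.loads(line)) for line in text.splitlines() if line.strip()]


# An illustrative record; the snippet is not taken from the dataset itself.
EXAMPLE = Sample(
    description="Prepare a Bell state on two qubits and return the outcome probabilities.",
    code=(
        "import pennylane as qml\n"
        "\n"
        "dev = qml.device('default.qubit', wires=2)\n"
        "\n"
        "@qml.qnode(dev)\n"
        "def bell_state():\n"
        "    qml.Hadamard(wires=0)\n"
        "    qml.CNOT(wires=[0, 1])\n"
        "    return qml.probs(wires=[0, 1])\n"
    ),
    source="documentation",
)

if __name__ == "__main__":
    print(to_jsonl([EXAMPLE]))
```

Keeping the description next to the code is what would let a record of this kind serve both as a training pair and as a lookup entry.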

Key facts

  • PennyLang dataset contains 3,347 PennyLane-specific quantum code samples
  • Dataset curated from textbooks, official documentation, and open-source repositories
  • Designed to improve LLM-based quantum code generation
  • Includes contextual descriptions for code samples
  • Released as an open-source resource
  • Framework enables automated quantum code dataset construction
  • Addresses lack of high-quality datasets for quantum software development
  • Research documented in arXiv preprint 2503.02497v4
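The article does not describe how the automated construction framework works. As one plausible ingredient of such a pipeline, the sketch below keeps only candidate snippets that parse as valid Python and import PennyLane. The function names `imports_pennylane` and `filter_candidates` are hypothetical, and this is not presented as the authors' method.

```python
import ast


def imports_pennylane(source: str) -> bool:
    """Return True if the snippet parses as Python and imports pennylane."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # Unparseable snippets are rejected rather than repaired.
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Covers `import pennylane` and `import pennylane.numpy as np`.
            if any(alias.name.split(".")[0] == "pennylane" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # Covers `from pennylane import ...`; relative imports are ignored.
            if node.level == 0 and (node.module or "").split(".")[0] == "pennylane":
                return True
    return False


def filter_candidates(snippets):
    """Keep only the snippets that pass the PennyLane import check."""
    return [s for s in snippets if imports_pennylane(s)]


if __name__ == "__main__":
    candidates = [
        "import pennylane as qml\ndev = qml.device('default.qubit', wires=1)\n",
        "import numpy as np\n",
        "import pennylane as\n",  # syntax error, rejected
    ]
    print(len(filter_candidates(candidates)))
```

A static check like this avoids executing untrusted code from scraped repositories, at the cost of missing snippets that rely on an import defined elsewhere.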
