QuantumKatas Adapted from Q# to Qiskit for LLM Benchmarking
Researchers have adapted Microsoft's QuantumKatas from Q# to Qiskit, producing a benchmark of 350 tasks across 26 categories for assessing large language models (LLMs). The tasks range from basic quantum gates to algorithms such as Grover's, Simon's, and Deutsch-Jozsa, and also cover error correction, key distribution, and quantum games. Each task pairs a natural language prompt with a canonical solution and a deterministic test run through classical simulation. The evaluation covered 16 LLMs under 7 prompting configurations, for a total of 39,200 model runs (350 tasks × 16 models × 7 configurations).
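To make the task format concrete, here is a minimal sketch of what one task could look like: a prompt, a canonical solution, and a deterministic test based on exact statevector simulation. It is an illustration only, not the benchmark's actual harness. The `Task` class, the `bell_task` example, and the hand-rolled `apply_h` and `apply_cx` helpers are all hypothetical names, and the simulator is written in plain Python rather than Qiskit so the example has no dependencies.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

State = List[complex]          # amplitudes indexed by basis state, qubit 0 = lowest bit
Solution = Callable[[State], State]


def apply_h(state: State, qubit: int) -> State:
    """Apply a Hadamard gate to `qubit` of the statevector."""
    s = 1 / math.sqrt(2)
    mask = 1 << qubit
    out = list(state)
    for i in range(len(state)):
        if i & mask == 0:
            a, b = state[i], state[i | mask]
            out[i] = s * (a + b)
            out[i | mask] = s * (a - b)
    return out


def apply_cx(state: State, control: int, target: int) -> State:
    """Apply a CNOT: flip `target` on basis states where `control` is 1."""
    out = list(state)
    for i in range(len(state)):
        if i & (1 << control):
            out[i] = state[i ^ (1 << target)]
    return out


@dataclass
class Task:
    """Hypothetical task record: prompt, canonical solution, deterministic test."""
    prompt: str
    canonical_solution: Solution
    test: Callable[[Solution], bool]


def bell_solution(state: State) -> State:
    # H on qubit 0, then CNOT from qubit 0 to qubit 1
    return apply_cx(apply_h(state, 0), 0, 1)


def bell_test(candidate: Solution) -> bool:
    # Exact simulation, no sampling, so the verdict is the same on every run.
    s = 1 / math.sqrt(2)
    expected = [s, 0, 0, s]
    actual = candidate([1, 0, 0, 0])
    return all(abs(a - e) < 1e-9 for a, e in zip(actual, expected))


bell_task = Task(
    prompt="Prepare the Bell state (|00> + |11>)/sqrt(2) starting from |00>.",
    canonical_solution=bell_solution,
    test=bell_test,
)

print(bell_task.test(bell_task.canonical_solution))   # canonical solution
print(bell_task.test(lambda st: apply_h(st, 0)))      # incomplete attempt: H only
```

Because the test compares amplitudes from an exact simulation instead of sampled measurement counts, a model's answer either passes or fails reproducibly. This sketch compares amplitudes directly; a real harness might instead compare states up to a global phase.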
Key facts
- Adapted Microsoft's QuantumKatas from Q# to Qiskit
- 350 tasks across 26 categories
- Covers gates, Grover's, Simon's, Deutsch-Jozsa, error correction, key distribution, quantum games
- Each task has a natural language prompt, canonical solution, and deterministic test
- Built on QuantumKatas' pedagogical design
- Evaluated 16 LLMs across 7 prompting configurations
- Total of 39,200 model runs
- arXiv:2605.27210
Entities
Institutions
- Microsoft
- Qiskit