Quantized LLM Performance in Qualitative Analysis Improved by Multi-Pass Prompt Verification

ai-technology · 2026-05-22

A study on arXiv (2605.20193) investigates how lower-bit quantization levels (8-bit, 4-bit, 3-bit, 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) in qualitative analysis. On 82 interview transcripts containing expert and non-expert responses, low-bit models hallucinate more and produce less stable results, particularly on non-expert language. The authors propose a quantization-aware multi-pass prompt verification method that guides the model through controlled steps to reduce hallucinations: unreliable content is removed, and verified results are passed on to the next transcript. Performance was validated by human coders using NVivo and by BF16 LLaMA. According to the study, the method improves the accuracy of quantized models on qualitative tasks.
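The summary above does not give the paper's prompts or exact steps, so the following is only a minimal sketch of what a multi-pass verification loop could look like. Everything in it is an assumption made for illustration: the `Model` callable, the "code | quote" output format, the substring grounding check, and the majority-vote rule are hypothetical and are not taken from the study.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt string, returns generated text.
Model = Callable[[str], str]


def extract_codes(model: Model, transcript: str, prior: List[str]) -> List[Tuple[str, str]]:
    """Pass 1 (assumed): ask for themes, one 'code | supporting quote' per line."""
    prompt = (
        "Known themes: " + ", ".join(prior) + "\n"
        "List themes in the transcript as 'code | exact quote'.\n"
        "Transcript:\n" + transcript
    )
    pairs = []
    for line in model(prompt).splitlines():
        if "|" in line:
            code, quote = line.split("|", 1)
            pairs.append((code.strip(), quote.strip()))
    return pairs


def is_grounded(quote: str, transcript: str) -> bool:
    """Pass 2 (assumed): a quote counts only if it literally occurs in the transcript."""
    return bool(quote) and quote.lower() in transcript.lower()


def multi_pass(model: Model, transcripts: List[str], passes: int = 3) -> List[str]:
    """Repeat extraction, drop ungrounded or unstable codes, carry the rest forward."""
    verified: List[str] = []
    for transcript in transcripts:
        counts = {}
        for _ in range(passes):
            grounded = {
                code
                for code, quote in extract_codes(model, transcript, verified)
                if is_grounded(quote, transcript)
            }
            for code in grounded:
                counts[code] = counts.get(code, 0) + 1
        # Pass 3 (assumed): keep codes that survive a majority of the passes.
        for code, n in counts.items():
            if n * 2 > passes and code not in verified:
                verified.append(code)
    return verified


def fake_model(prompt: str) -> str:
    """Stand-in for a quantized LLM: one grounded theme, one hallucinated one."""
    return "workload | too many shifts\ninvented theme | this quote is not in the text"


result = multi_pass(fake_model, ["I had too many shifts last month."])
print(result)
```

With the stand-in model, the hallucinated theme is dropped because its quote never occurs in the transcript, and only the grounded theme is carried forward to later transcripts. A real setup would replace `fake_model` with a call to the quantized model and would likely need a looser grounding check than exact substring matching.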

Key facts

Study examines quantization levels: 8-bit, 4-bit, 3-bit, 2-bit
Uses LLaMA-3.1 (8B) model
Data from 82 interview transcripts with expert and non-expert responses
Low-bit models produce more hallucinations and less stable results
Proposes quantization-aware multi-pass prompt verification method
Method reduces hallucinations through controlled steps and verification
Validation by human coders using NVivo and by BF16 LLaMA
arXiv paper ID: 2605.20193
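
To make the bit widths listed above concrete, here is a small illustration of how rounding error grows as the bit width shrinks. It uses plain symmetric round-to-nearest quantization with a single scale, which is an assumption chosen for simplicity; the summary does not say which quantization types the study used, and practical LLM quantization schemes are more elaborate. The `weights` values are made up.

```python
def quantize(values, bits):
    """Symmetric round-to-nearest quantization with one shared scale (illustrative)."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8-bit, 1 for 2-bit
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)               # nothing to scale
    scale = peak / levels
    return [round(v / scale) * scale for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and quantized values."""
    quantized = quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, quantized)) / len(values)


# Made-up example weights, not taken from any model.
weights = [0.9, -0.45, 0.12, 0.03, -0.77, 0.31]
for bits in (8, 4, 3, 2):
    print(bits, round(mean_abs_error(weights, bits), 4))
```

For these sample values the error at 2 bits is far larger than at 8 bits, which gives some intuition for why the lowest bit widths are the least stable.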
