CompactQE: Efficient Translation Quality Estimation via Small LLMs
A study shows that small open-source LLMs (under 30 billion parameters) can perform quality estimation (QE) for machine translation, offering a privacy-preserving alternative to large proprietary models. With a single-pass prompting method, these models produce quality scores, MQM error annotations, suggested corrections, and complete post-edits. At the system level, the resulting scores correlate with human evaluations more strongly than conventional neural metrics, fine-tuned models, and human inter-annotator agreement do, and they come close to the performance of much larger proprietary LLMs.
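To make the single-pass idea concrete, here is a minimal sketch of how one prompt might request all four outputs at once and how the reply could be parsed. The prompt wording, the JSON field names, and the helpers `build_prompt` and `parse_response` are assumptions made for illustration and are not taken from the study. No model is called; `sample_output` is a hand-written stand-in for a model reply.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QEResult:
    """Container for the four outputs requested in one pass (hypothetical schema)."""
    score: float
    errors: list = field(default_factory=list)  # MQM-style annotations with corrections
    post_edit: str = ""


def build_prompt(source: str, translation: str, src_lang: str, tgt_lang: str) -> str:
    """Assemble a single prompt that asks for every output together."""
    return (
        f"You are evaluating a {src_lang} to {tgt_lang} translation.\n"
        f"Source: {source}\n"
        f"Translation: {translation}\n"
        "Reply with one JSON object containing:\n"
        '  "score": a number from 0 to 100,\n'
        '  "errors": a list of objects with "span", "category", "severity", "correction",\n'
        '  "post_edit": the fully corrected translation.\n'
    )


def parse_response(raw: str) -> QEResult:
    """Extract the outermost JSON object from the reply, ignoring surrounding chatter."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    return QEResult(
        score=float(data["score"]),
        errors=list(data.get("errors", [])),
        post_edit=str(data.get("post_edit", "")),
    )


# Hand-written stand-in for a model reply (not real model output).
sample_output = """Here is the assessment:
{"score": 72, "errors": [{"span": "chat", "category": "accuracy/mistranslation",
"severity": "major", "correction": "cat"}], "post_edit": "The cat sleeps."}"""

prompt = build_prompt("Le chat dort.", "The chat sleeps.", "French", "English")
result = parse_response(sample_output)
print(result.score, len(result.errors), result.post_edit)
```

Requesting a single structured object keeps the score, annotations, and post-edit tied to the same reading of the segment, at the cost of needing tolerant parsing when a small model wraps the JSON in extra text.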
Key facts
- Small open-source LLMs (<30B parameters) are viable for QE
- Single-pass prompting generates quality scores, MQM error annotations, corrections, and post-edits
- System-level correlation with human evaluations exceeds that of traditional neural metrics, fine-tuned models, and human inter-annotator agreement
- Addresses data privacy concerns associated with proprietary LLMs
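The system-level comparison mentioned above can be sketched as follows: segment scores are averaged per MT system, and the resulting system averages from the metric are correlated with those from human raters. This summary does not say which correlation statistic the study uses, so Pearson correlation is an assumption here, and the function names and toy numbers are invented purely for illustration.

```python
from math import sqrt
from statistics import mean


def system_scores(segment_scores: dict) -> dict:
    """Average segment-level scores into one score per MT system."""
    return {system: mean(scores) for system, scores in segment_scores.items()}


def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def system_level_correlation(metric_segments: dict, human_segments: dict) -> float:
    """Correlate per-system averages from a metric with those from human raters."""
    metric = system_scores(metric_segments)
    human = system_scores(human_segments)
    systems = sorted(metric.keys() & human.keys())  # only systems scored by both
    return pearson([metric[s] for s in systems], [human[s] for s in systems])


# Toy numbers for illustration only; these are not results from the study.
metric_segments = {"sysA": [80, 90], "sysB": [60, 70], "sysC": [40, 50]}
human_segments = {"sysA": [85, 95], "sysB": [65, 75], "sysC": [45, 55]}

r = system_level_correlation(metric_segments, human_segments)
print(round(r, 3))
```

Because scores are averaged before correlating, a metric can rank systems well even when its segment-level judgments are noisy, which is why system-level and segment-level results are usually reported separately.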
Entities
Institutions
- arXiv