FAGER Framework Evaluates Factual Accuracy in Text-to-Image Models
Researchers have introduced FAGER (Factually Grounded Evaluation and Refinement), an agentic framework that assesses whether text-to-image (T2I) models generate images that correctly reflect visually verifiable facts. Existing evaluation metrics mainly check alignment with information stated explicitly in the prompt, so they miss factual requirements that are implicit, externally grounded, or identity-defining. FAGER addresses this gap by building a structured factual rubric that combines fact proposal by a large language model (LLM) with reference-guided visual fact extraction and verification. The rubric is then converted into question-answer pairs, which a vision-language model (VLM) uses to evaluate the generated image. The framework also returns actionable feedback that can guide refinement. This is particularly relevant for prompts involving scientific knowledge, historical facts, products, or culture-specific concepts, where factual correctness is critical. The paper is available on arXiv under identifier 2605.19111.
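The rubric-to-questions step can be pictured with a small sketch. It assumes that each rubric entry is a single visually verifiable statement, that every statement becomes a yes/no question whose expected answer is "yes", and that the score is the share of questions the VLM answers as expected. The names `Fact`, `QAPair`, `rubric_to_qa`, `evaluate`, `ask_vlm` and `fake_vlm`, the question template, and the scoring rule are all hypothetical illustrations, not FAGER's actual implementation; the VLM is replaced by a stub function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fact:
    """One entry in a hypothetical factual rubric."""
    statement: str  # a visually verifiable fact about the intended image
    source: str     # illustrative label: "llm_proposed" or "reference_extracted"


@dataclass
class QAPair:
    fact: str             # the rubric statement this question checks
    question: str         # yes/no question posed to the (stubbed) VLM
    expected_answer: str  # always "yes" under this sketch's assumption


def rubric_to_qa(rubric: List[Fact]) -> List[QAPair]:
    """Turn each rubric fact into a yes/no question (hypothetical template)."""
    return [
        QAPair(
            fact=f.statement,
            question=f"Does the image show that {f.statement}?",
            expected_answer="yes",
        )
        for f in rubric
    ]


def evaluate(qa_pairs: List[QAPair], ask_vlm: Callable[[str], str]) -> Dict[str, object]:
    """Score one image as the fraction of questions answered as expected.

    `ask_vlm` stands in for a real VLM call on a fixed image. Facts whose
    question fails are returned as plain-text feedback for refinement.
    """
    failed = [
        p for p in qa_pairs
        if ask_vlm(p.question).strip().lower() != p.expected_answer
    ]
    # With no questions there is nothing to verify; score 0.0 by convention here.
    score = 1.0 - len(failed) / len(qa_pairs) if qa_pairs else 0.0
    feedback = [f"Image does not show: {p.fact}" for p in failed]
    return {"score": score, "feedback": feedback}


# Usage with a stub VLM that only confirms the structural fact.
rubric = [
    Fact("the Golden Gate Bridge is painted orange-red", "reference_extracted"),
    Fact("the bridge is a suspension bridge with two towers", "llm_proposed"),
]
qa = rubric_to_qa(rubric)


def fake_vlm(question: str) -> str:
    return "yes" if "suspension" in question else "no"


result = evaluate(qa, fake_vlm)
print(result["score"])     # 0.5
print(result["feedback"])  # ['Image does not show: the Golden Gate Bridge is painted orange-red']
```

Keeping each fact as its own question makes the feedback traceable: every failed question points back to exactly one rubric statement, which is the kind of targeted signal a refinement step could act on.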
Key facts
- FAGER stands for Factually Grounded Evaluation and Refinement.
- It is an agentic framework for evaluating factual correctness in T2I models.
- Existing metrics fail to capture implicit, externally grounded, or identity-defining facts.
- FAGER combines LLM-based fact proposal with reference-guided visual fact extraction and verification.
- The resulting rubric is converted into question-answer pairs for VLM-based evaluation.
- It provides actionable feedback that can guide refinement.
- Relevant for prompts involving science, history, products, or culture.
- Paper available on arXiv: 2605.19111.
Entities
Institutions
- arXiv