VAC Framework Uses Natural Language Feedback for Personalized QA
Researchers have introduced VAC, a novel framework for personalized question answering that replaces scalar reward signals with natural language feedback (NLF). Current personalization methods for large language models (LLMs) rely on retrieval-augmented generation (RAG) and reinforcement learning with scalar rewards, which the authors argue provide weak, non-instructive feedback that limits learning efficiency. VAC generates NLF conditioned on user profiles and question narratives, offering richer and more actionable supervision. This allows the policy model to iteratively refine outputs and internalize effective personalization strategies. The work is described in a paper on arXiv (2508.10695) and aims to improve both effectiveness and user satisfaction in information-seeking tasks.
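The refinement loop described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names, the profile dictionary, and the rule-based stand-ins for the policy and feedback models are all hypothetical, chosen only to show how natural language feedback (rather than a scalar reward) drives iterative revision.

```python
# Hypothetical sketch of a VAC-style NLF refinement loop.
# In the real framework, both functions below would be LLM calls;
# here they are simple stand-ins so the loop structure is runnable.

def generate_answer(question, profile, feedback=None):
    """Stand-in for the policy model: drafts or revises a personalized answer."""
    answer = f"Answer to '{question}' tailored for a {profile['expertise']} reader."
    if feedback:
        answer += f" (revised per feedback: {feedback})"
    return answer

def generate_feedback(answer, profile, narrative):
    """Stand-in for the feedback model: returns natural language feedback
    conditioned on the user profile and question narrative, or None if the
    answer is accepted. NLF names *what* to fix, unlike a scalar reward."""
    if profile["expertise"] == "beginner" and "jargon-free" not in answer:
        return "use jargon-free language for a beginner"
    return None

def vac_refine(question, profile, narrative, max_rounds=3):
    """Iteratively refine the answer using NLF until accepted or budget spent."""
    answer = generate_answer(question, profile)
    for _ in range(max_rounds):
        feedback = generate_feedback(answer, profile, narrative)
        if feedback is None:
            break  # feedback model accepts the answer
        answer = generate_answer(question, profile, feedback=feedback)
    return answer
```

The key contrast with scalar-reward training is visible in `generate_feedback`: instead of returning a number, it returns an instruction the policy can act on directly in the next revision round.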
Key facts
- VAC framework replaces scalar rewards with natural language feedback for personalized QA.
- Current methods use RAG and reinforcement learning with scalar rewards.
- Scalar rewards are described as weak and non-instructive.
- NLF is conditioned on user profiles and question narratives.
- NLF provides rich, actionable supervision signals.
- The policy model iteratively refines outputs using NLF.
- The paper is available on arXiv with ID 2508.10695.
- Personalization aims to improve effectiveness and user satisfaction.