ARTFEED — Contemporary Art Intelligence

Aligning Language Models with Online Natural Language Feedback

ai-technology · 2026-05-07

Researchers have developed techniques for aligning language models in fuzzy domains where human experts can provide high-quality supervision, but only for a small number of outputs, using online natural language feedback. The approach iterates: optimize against a proxy reward signal, halt before over-optimization sets in, gather fresh expert supervision, and revise the proxy reward. Proxy reward models are built from language models via in-context learning and fine-tuning. The methods were evaluated on the creative-writing ability of Qwen3-8B and the alignment-research ability of Haiku 4.5.
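The iterative loop described above can be sketched in miniature. Everything here is illustrative, not from the paper: `expert_score` stands in for scarce human expert judgment, `fit_proxy` for revising the proxy reward model from a few labels, and the small fixed number of optimization steps for halting before over-optimization.

```python
def expert_score(text):
    # Stand-in for expensive human expert judgment, affordable
    # for only a handful of outputs: "good" means concise, no filler.
    return 1.0 if len(text) <= 12 and "!!" not in text else 0.0

def fit_proxy(labelled):
    # Revise the proxy reward from a small batch of expert labels.
    # Here the "model" is just a learned target length (a toy choice).
    good = [len(t) for t, s in labelled if s > 0]
    target = max(good) if good else 12
    return lambda t: 1.0 - abs(len(t) - target) / 100.0

def align(candidates, rounds=3, label_budget=3, opt_steps=2):
    pool = list(candidates)
    for _ in range(rounds):
        # Gather fresh expert supervision on a few outputs only.
        labelled = [(t, expert_score(t)) for t in pool[:label_budget]]
        proxy = fit_proxy(labelled)      # revised proxy reward
        for _ in range(opt_steps):       # halt early: few opt steps
            pool.sort(key=proxy, reverse=True)
    return pool[0]

best = align(["short reply", "a very long rambling answer!!", "ok"])
```

The key structural point is the alternation: a bounded amount of optimization against the current proxy, then a return to the expert for new labels, rather than optimizing the proxy reward indefinitely.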

Key facts

  • The paper is arXiv:2605.04356v1.
  • Methods align language models in fuzzy domains with online natural language feedback.
  • Training involves iterative optimization against proxy reward signals.
  • Proxy reward models use in-context learning and fine-tuning.
  • Tests conducted on Qwen3-8B for creative writing.
  • Tests conducted on Haiku 4.5 for alignment research.
  • Human experts provide high-quality supervision for a small number of outputs.
  • The approach halts optimization before over-optimization sets in.

Entities

Institutions

  • arXiv
