ARTFEED — Contemporary Art Intelligence

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

ai-technology · 2026-05-27

A new method called CroCo extends contrastive preference tuning to multiple languages without requiring language-specific preference annotations. Using a reward model trained on English preferences atop a multilingual base, CroCo produces useful within-language rankings of the model's own generations across 14 high- and low-resource languages. The approach improves performance on most setups while preventing catastrophic forgetting of supervised fine-tuning. The gains depend on on-policy data: off-policy responses reduce the benefit, and online preference optimization fails.

Key facts

  • CroCo extends contrastive preference tuning to multiple languages.
  • No language-specific preference annotation is needed.
  • The reward model is trained on English preferences atop a multilingual base.
  • Evaluated on 14 high- and low-resource languages.
  • Improves performance on the majority of setups.
  • Prevents catastrophic forgetting of supervised fine-tuning.
  • Gains require on-policy data.
  • Off-policy responses reduce benefit; online preference optimization fails.
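The data side of the facts above can be sketched roughly as follows: sample several responses from the policy itself for each prompt, score them with a reward model, and keep the best and worst response in each language as a preference pair. This is a minimal illustration, not the authors' implementation. The function name `build_preference_pairs`, the `sample` and `reward` callables, the `min_margin` threshold, and the toy stand-ins are all hypothetical; in practice `sample` would call the fine-tuned policy and `reward` the English-trained multilingual reward model.

```python
from typing import Callable, Dict, List


def build_preference_pairs(
    prompts_by_lang: Dict[str, List[str]],
    sample: Callable[[str, int], List[str]],
    reward: Callable[[str, str], float],
    num_samples: int = 4,
    min_margin: float = 0.0,
) -> List[Dict[str, str]]:
    """Rank a policy's own generations per prompt and keep best-vs-worst pairs.

    Ranking happens only among responses to the same prompt, so scores are
    compared within a language and never across languages.
    """
    pairs: List[Dict[str, str]] = []
    for lang, prompts in prompts_by_lang.items():
        for prompt in prompts:
            # On-policy step: candidates come from the model being tuned.
            candidates = sample(prompt, num_samples)
            if len(candidates) < 2:
                continue  # a pair needs at least two responses
            scored = sorted(
                ((reward(prompt, c), c) for c in candidates),
                key=lambda item: item[0],
            )
            low_score, rejected = scored[0]
            high_score, chosen = scored[-1]
            # Skip prompts where the reward model cannot separate responses
            # (hypothetical filter, not taken from the source).
            if high_score - low_score <= min_margin:
                continue
            pairs.append(
                {"lang": lang, "prompt": prompt, "chosen": chosen, "rejected": rejected}
            )
    return pairs


# Toy stand-ins so the sketch runs without any model (assumptions only).
def toy_sample(prompt: str, n: int) -> List[str]:
    return [f"{prompt} " + "ok " * i for i in range(n)]


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))  # pretend longer responses are better


pairs = build_preference_pairs(
    {"sw": ["Habari?"], "de": ["Hallo?"]}, toy_sample, toy_reward, num_samples=3
)
```

The resulting `prompt`/`chosen`/`rejected` records are the usual input shape for contrastive preference objectives. Replacing `sample` with responses from a different model would make the data off-policy, which is the setting the summary reports as less beneficial.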
