ARTFEED — Contemporary Art Intelligence

COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in LLMs

ai-technology · 2026-06-01

In a preprint posted to arXiv, researchers have introduced a decoding approach named COFT (Chain of Fair Thought), which aims to reduce societal biases in large language models during chain-of-thought reasoning without requiring any training. The method applies token-level fairness control at the decoding stage and provides distribution-free marginal validity guarantees under exchangeability for any frozen causal language model. COFT works in three stages: it generates a masked counterfactual prompt by substituting sensitive spans with neutral tokens, uses lightweight logit fusion to compare the factual and masked logit distributions and reduce bias, and applies dual-branch split-conformal calibration to validate candidate token sets at a specified risk level. Evaluated on six models and multiple bias benchmarks, COFT achieves a 30-55% reduction in standard bias metrics (median 38%) while preserving task utility, language quality, and reasoning accuracy. The research is available on arXiv under the identifier 2605.30641.
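The first two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how sensitive spans are detected or how the two logit distributions are combined, so the term list, the `[MASK]` placeholder, the linear-interpolation fusion rule, and the weight `lam` are all assumptions made for illustration.

```python
import math
import re

# Hypothetical lexicon and placeholder; the source does not specify either.
SENSITIVE_TERMS = ["he", "she", "man", "woman"]
NEUTRAL_TOKEN = "[MASK]"


def mask_sensitive_spans(prompt, terms=SENSITIVE_TERMS, neutral=NEUTRAL_TOKEN):
    """Build a counterfactual prompt by replacing whole-word sensitive terms."""
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(pattern, neutral, prompt, flags=re.IGNORECASE)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(factual, masked, lam=0.5):
    """Assumed fusion rule: interpolate between factual and masked logits.

    lam = 0 keeps the factual logits unchanged; lam = 1 uses only the
    logits obtained from the masked (counterfactual) prompt.
    """
    return [(1 - lam) * f + lam * m for f, m in zip(factual, masked)]


if __name__ == "__main__":
    print(mask_sensitive_spans("The woman said she was tired"))
    # Toy next-token logits from the factual and masked prompts.
    fused = fuse_logits([2.0, 0.0, -1.0], [0.0, 0.0, -1.0], lam=0.5)
    print(softmax(fused))
```

In a real setting the two logit vectors would come from running the same frozen model on the original and the masked prompt at each decoding step; here they are toy values.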

Key facts

  • COFT is a training-free decoding method for fair chain-of-thought reasoning.
  • It applies token-level fairness control at decode time.
  • Provides distribution-free marginal validity guarantees under exchangeability.
  • Works with any frozen causal language model.
  • Operates in three stages: counterfactual masking, logit fusion, and conformal calibration.
  • Reduces bias metrics by 30-55% (median 38%).
  • Preserves task utility and language quality.
  • Evaluated across six models and multiple bias benchmarks.
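The conformal calibration stage builds on split-conformal prediction, which the following Python sketch illustrates in its generic, single-branch form. The summary does not describe how COFT's dual-branch variant is constructed, so this is only the standard procedure under stated assumptions: the nonconformity score is taken to be 1 minus the probability of the reference token, and `alpha` stands in for the specified risk level.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: keep every token.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def candidate_set(probs, threshold):
    """Keep token indices whose score 1 - p does not exceed the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    # Toy calibration scores: 1 - p(reference token) on held-out examples.
    scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8]
    q_hat = conformal_threshold(scores, alpha=0.5)
    print(q_hat, candidate_set([0.6, 0.3, 0.1], q_hat))
```

Under exchangeability of calibration and test examples, sets built this way contain the reference token with probability at least 1 - alpha on average, which is the kind of distribution-free marginal guarantee the key facts refer to.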

Entities

Institutions

  • arXiv

Sources