COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in LLMs

ai-technology · 2026-06-01

A preprint posted on arXiv (identifier 2605.30641) introduces COFT (Chain of Fair Thought), a training-free decoding method that aims to reduce societal bias in large language models during chain-of-thought reasoning. The method applies token-level fairness control at decode time and provides distribution-free marginal validity guarantees under exchangeability for any frozen causal language model. COFT works in three stages. First, it builds a masked counterfactual prompt by replacing sensitive spans with neutral tokens. Second, it uses lightweight logit fusion to contrast the factual and masked logit distributions and reduce bias. Third, it applies dual-branch split-conformal calibration to certify candidate token sets at a specified risk level. Evaluated on six models and several bias benchmarks, COFT reduces standard bias metrics by 30-55% (median 38%) while preserving task utility, language quality, and reasoning accuracy.
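To make the first two stages concrete, here is a minimal sketch of counterfactual masking and logit fusion. The summary does not give COFT's actual fusion rule, so the linear interpolation below, the weight `lam`, the neutral token "person", and the function names `mask_sensitive` and `fuse_logits` are all illustrative assumptions rather than the authors' implementation.

```python
import math
import re


def mask_sensitive(prompt, sensitive_terms, neutral="person"):
    """Replace each sensitive term (whole word, case-insensitive) with a neutral token.

    Hypothetical helper: the term list and the neutral token are assumptions.
    """
    if not sensitive_terms:
        return prompt
    pattern = r"\b(" + "|".join(re.escape(t) for t in sensitive_terms) + r")\b"
    return re.sub(pattern, neutral, prompt, flags=re.IGNORECASE)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(factual, masked, lam=0.5):
    """Assumed fusion rule: interpolate between factual and masked logits.

    lam = 0 keeps the factual logits unchanged; lam = 1 uses only the
    logits from the masked (counterfactual) prompt.
    """
    return [(1 - lam) * f + lam * m for f, m in zip(factual, masked)]


# Toy example: next-token logits over a 3-token vocabulary for each prompt.
factual_logits = [2.0, 0.5, 0.1]
masked_logits = [1.0, 0.9, 0.1]
fused_probs = softmax(fuse_logits(factual_logits, masked_logits, lam=0.5))
```

In a real decoder the two logit vectors would come from two forward passes of the same frozen model, one on the original prompt and one on the masked prompt, at every generation step.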

Key facts

COFT is a training-free decoding method for fair chain-of-thought reasoning.
It applies token-level fairness control at decode time.
It provides distribution-free marginal validity guarantees under exchangeability.
It works with any frozen causal language model.
Operates in three stages: counterfactual masking, logit fusion, and conformal calibration.
Reduces bias metrics by 30-55% (median 38%).
Preserves task utility and language quality.
Evaluated across six models and multiple bias benchmarks.
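
The conformal calibration stage can be illustrated with standard split-conformal prediction, which is where a distribution-free marginal guarantee under exchangeability comes from. This is a generic single-branch sketch, not COFT's dual-branch procedure, whose details are not given in this summary. The nonconformity score (one minus the probability of the true token) and the names `conformal_threshold` and `prediction_set` are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of held-out nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested risk level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, qhat):
    """Indices of tokens whose score 1 - p does not exceed the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration scores: 1 - p(true token) on nine held-out examples.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
qhat = conformal_threshold(cal_scores, alpha=0.5)

# Candidate tokens that pass calibration for a new next-token distribution.
candidates = prediction_set([0.7, 0.2, 0.1], qhat)
```

If the calibration and test examples are exchangeable, a set built this way contains the true token with probability at least 1 - alpha on average over draws, which is the sense of "marginal validity" in the list above.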
