COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in LLMs
A paper posted to arXiv introduces COFT (Chain of Fair Thought), a training-free decoding method that aims to reduce societal bias in large language models during chain-of-thought reasoning. The method applies token-level fairness control at decoding time and provides distribution-free marginal validity guarantees under exchangeability for any frozen causal language model. COFT works in three stages. First, it builds a masked counterfactual prompt by replacing sensitive spans with neutral tokens. Second, it applies lightweight logit fusion, contrasting the factual and masked logit distributions to reduce bias. Third, it uses dual-branch split-conformal calibration to certify candidate token sets at a specified risk level. Evaluated on six models and multiple bias benchmarks, COFT reduces standard bias metrics by 30-55% (median 38%) while preserving task utility, language quality, and reasoning accuracy. The paper is available on arXiv under the identifier 2605.30641.
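The first two stages can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the word-list approach to finding sensitive spans, the neutral token "person", and the linear interpolation used for fusion are not taken from the paper, which may define these steps differently.

```python
import math
import re


def mask_sensitive(prompt, sensitive_terms, neutral="person"):
    """Replace each sensitive term (whole word, case-insensitive) with a neutral token.

    Hypothetical helper: a fixed word list stands in for whatever
    span detection the paper actually uses.
    """
    pattern = r"\b(" + "|".join(re.escape(t) for t in sensitive_terms) + r")\b"
    return re.sub(pattern, neutral, prompt, flags=re.IGNORECASE)


def fuse_logits(factual, masked, lam=0.5):
    """Blend factual and masked next-token logits.

    Assumed form: linear interpolation. lam=0 keeps the factual logits,
    lam=1 uses only the logits from the masked (counterfactual) prompt.
    """
    if len(factual) != len(masked):
        raise ValueError("logit vectors must have the same length")
    return [(1.0 - lam) * f + lam * m for f, m in zip(factual, masked)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In use, the factual prompt and its masked counterpart would each be run through the same frozen model, and the two logit vectors for the next token would be passed to fuse_logits before sampling. Interpolation pulls the distribution toward what the model predicts when the sensitive attribute is hidden, which is one plausible way to realize the "compare and reduce" step described above.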
Key facts
- COFT is a training-free decoding method for fair chain-of-thought reasoning.
- It applies token-level fairness control at decode time.
- It provides distribution-free marginal validity guarantees under exchangeability.
- It works with any frozen causal language model.
- Operates in three stages: counterfactual masking, logit fusion, and conformal calibration.
- Reduces bias metrics by 30-55% (median 38%).
- Preserves task utility and language quality.
- Evaluated across six models and multiple bias benchmarks.
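The conformal calibration stage listed above can be sketched with standard split-conformal prediction. This is a single-branch illustration under stated assumptions: the nonconformity score (one minus the probability of the reference token), the function names, and the held-out calibration scores are hypothetical, and the paper's dual-branch construction is not reproduced here.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the split-conformal quantile of the calibration scores.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)). If that rank
    exceeds n, no finite threshold gives the guarantee, so return infinity
    (every token is then admitted).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(cal_scores)[rank - 1]


def candidate_set(probs, qhat):
    """Indices of tokens whose score 1 - p is within the calibrated threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]


# Calibration scores: 1 - p(reference token) on held-out examples (made-up values).
cal_scores = [0.7, 0.1, 0.5, 0.3, 0.2, 0.6, 0.4]
qhat = conformal_threshold(cal_scores, alpha=0.25)

# Next-token probabilities for a four-token toy vocabulary.
allowed = candidate_set([0.5, 0.3, 0.15, 0.05], qhat)
```

Under exchangeability of calibration and test examples, a set built this way contains the reference token with probability at least 1 - alpha on average, which is the sense of "distribution-free marginal validity" in the bullets above. The guarantee is marginal, not conditional on any particular prompt.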
Entities
Institutions
- arXiv