ARTFEED — Contemporary Art Intelligence

Counterfactual Likelihood Test Measures Influence in Private Reasoning Channels

ai-technology · 2026-05-20

A recent technique described in arXiv:2605.19092 uses a counterfactual likelihood test to measure indirect influence between private reasoning channels in AI systems. The method replaces an upstream private block with a length-matched donor block while holding the public token sequence and the downstream target fixed, then measures the shift in the target's negative log-likelihood (NLL). Tests on a 7B role-channel reasoning model indicate that textual probes are unreliable: raw n-gram overlap overstates leakage, corrected overlap remains noisy, and canary reproduction shows no discrimination. The counterfactual likelihood test, by contrast, separates masked from unmasked conditions. Length matching controls for a RoPE positional confound: a donor block of a different length would shift the positions of every downstream token, so the likelihood could change for reasons unrelated to the private block's content.
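The procedure can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the token values, and the toy scorer (a Laplace-smoothed unigram model standing in for the 7B model) are all assumptions made for illustration. The sketch only shows the shape of the test: swap in a length-matched donor, keep everything else fixed, and compare the target's NLL.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Tokens = Sequence[int]
# Hypothetical scorer signature: total NLL of `target` given `context`.
NllFn = Callable[[Tokens, Tokens], float]


def counterfactual_nll_shift(nll_fn: NllFn, prefix: Tokens, private: Tokens,
                             donor: Tokens, suffix: Tokens, target: Tokens) -> float:
    """Return NLL(target | donor context) minus NLL(target | original context)."""
    if len(donor) != len(private):
        # Length matching keeps every downstream token at the same position.
        raise ValueError("donor block must match the private block's length")
    original = list(prefix) + list(private) + list(suffix)
    counterfactual = list(prefix) + list(donor) + list(suffix)
    return nll_fn(counterfactual, target) - nll_fn(original, target)


def make_toy_nll(visible: Callable[[int], bool], vocab_size: int = 100) -> NllFn:
    """Toy stand-in for a model: smoothed unigram over the visible context positions."""
    def nll(context: Tokens, target: Tokens) -> float:
        seen = [tok for i, tok in enumerate(context) if visible(i)]
        counts = Counter(seen)
        denom = len(seen) + vocab_size
        return -sum(math.log((counts[t] + 1) / denom) for t in target)
    return nll


# Illustrative token IDs (assumed, not taken from the paper).
prefix, private, donor, suffix, target = [1, 2, 3], [7, 7, 8], [40, 41, 42], [4, 5], [7, 8]

unmasked = make_toy_nll(lambda i: True)               # private block is visible
masked = make_toy_nll(lambda i: not (3 <= i < 6))     # private positions 3..5 are hidden

shift_unmasked = counterfactual_nll_shift(unmasked, prefix, private, donor, suffix, target)
shift_masked = counterfactual_nll_shift(masked, prefix, private, donor, suffix, target)
print(f"unmasked shift: {shift_unmasked:.3f}, masked shift: {shift_masked:.3f}")
```

In this toy setup the unmasked scorer shows a positive NLL shift because the target tokens appeared in the private block, while the masked scorer shows no shift because it never sees that block. With a real model, `nll_fn` would be replaced by a forward pass that sums the target tokens' negative log-probabilities.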

Key facts

  • Uses a counterfactual likelihood test to measure influence between private reasoning channels
  • Replaces the upstream private block with a length-matched donor block
  • Holds the public token sequence and the downstream target fixed
  • Measures the shift in the downstream target's negative log-likelihood
  • Validated on a 7B role-channel reasoning model
  • Textual probes are unreliable: raw n-gram overlap overstates leakage
  • Corrected n-gram overlap remains noisy
  • Canary reproduction shows no discrimination

Entities

Institutions

  • arXiv
