Targeted Downstream-Agnostic Attack on Pre-Trained Encoders
A new paper on arXiv (2605.19446v1) introduces a Targeted Downstream-Agnostic Attack (TDAA) against pre-trained encoders. Existing downstream-agnostic attacks (DAAs) only need to change the original prediction, which is an untargeted goal. TDAA operates under a stricter threat model: the adversarial example must be both targeted and downstream-agnostic. The key challenge is that the downstream task is unknown and an encoder outputs representations rather than predictions, so there is no class label to aim for. To address this, the authors use a 'threat image', pre-selected by the attacker, as the target. A generator creates example-specific adversarial perturbations that force the victim encoder to produce representations similar to those of the threat image, achieving a targeted effect without knowledge of the downstream task. The work indicates that encoder-based systems can face targeted attacks even when the attacker knows nothing about how the encoder is used downstream.
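The representation-matching idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_encoder` is a hypothetical stand-in for a frozen pre-trained encoder, cosine similarity is one plausible choice of similarity measure, the `epsilon` bound on the perturbation is an assumed budget, and a greedy per-example coordinate search replaces the paper's trained generator.

```python
import math


def toy_encoder(x):
    """Hypothetical stand-in for a frozen encoder: fixed linear map plus tanh."""
    weights = [
        [0.5, -0.2, 0.1, 0.3],
        [-0.4, 0.6, 0.2, -0.1],
        [0.3, 0.1, -0.5, 0.4],
    ]
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def cosine_similarity(a, b):
    """Cosine of the angle between two representation vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def clip_perturbation(delta, epsilon):
    """Keep every coordinate of the perturbation within [-epsilon, epsilon]."""
    return [max(-epsilon, min(epsilon, d)) for d in delta]


def targeted_loss(x, delta, threat, epsilon):
    """Assumed objective: 1 - cos(encoder(x + delta), encoder(threat)).

    No downstream label appears anywhere; only encoder outputs are compared.
    """
    delta = clip_perturbation(delta, epsilon)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return 1.0 - cosine_similarity(toy_encoder(x_adv), toy_encoder(threat))


def greedy_perturbation(x, threat, epsilon, step=0.01, rounds=50):
    """Illustrative stand-in for the generator: greedy coordinate search.

    A candidate change is kept only if it lowers the loss, so the returned
    loss is never higher than the loss of the unperturbed input.
    """
    delta = [0.0] * len(x)
    best = targeted_loss(x, delta, threat, epsilon)
    for _ in range(rounds):
        for i in range(len(delta)):
            for direction in (step, -step):
                candidate = list(delta)
                candidate[i] += direction
                candidate = clip_perturbation(candidate, epsilon)
                loss = targeted_loss(x, candidate, threat, epsilon)
                if loss < best:
                    delta, best = candidate, loss
    return delta, best


if __name__ == "__main__":
    clean_input = [0.1, 0.9, 0.2, 0.7]   # toy "image" as a flat vector
    threat_image = [0.9, 0.1, 0.8, 0.2]  # attacker's pre-selected target
    budget = 0.1
    before = targeted_loss(clean_input, [0.0] * 4, threat_image, budget)
    delta, after = greedy_perturbation(clean_input, threat_image, budget)
    print("loss before:", before)
    print("loss after: ", after)
```

In this sketch the target is defined entirely in representation space, which is why the attacker needs no information about the downstream classifier. A generator-based attack such as the one the paper describes would learn to emit a perturbation for each input in a single forward pass instead of searching per example.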
Key facts
- Paper arXiv:2605.19446v1 introduces Targeted Downstream-Agnostic Attack (TDAA).
- TDAA requires adversarial examples to be both targeted and downstream-agnostic.
- Existing DAA methods only need to change the original prediction, which is an untargeted goal.
- The method uses a pre-selected 'threat image' as the target.
- A generator produces example-specific adversarial perturbations.
- The attack compels the victim encoder to output representations similar to those of the threat image.
- The downstream task remains unknown to the attacker.
Entities
Institutions
- arXiv