Targeted Downstream-Agnostic Attack on Pre-Trained Encoders

ai-technology · 2026-05-20

A new paper on arXiv (2605.19446v1) introduces a Targeted Downstream-Agnostic Attack (TDAA) against pre-trained encoders. Existing downstream-agnostic attacks (DAAs) only need to change the original prediction. TDAA operates under a stricter threat model: the adversarial example must be both targeted and downstream-agnostic. The key challenge is that the downstream task is unknown and an encoder outputs representations rather than predictions, so there is no class label to aim at. To solve this, the authors use a 'threat image', pre-selected by the attacker, as the target. A generator creates example-specific adversarial perturbations that force the victim encoder to produce representations similar to those of the threat image, achieving a targeted effect without knowledge of the downstream task. The result suggests that encoder-based systems may need to account for targeted attacks even when the attacker knows nothing about the downstream task.
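One plausible way to write the objective described above is sketched here. The summary does not give the paper's actual loss, so everything in this formulation is an assumption made for illustration: the symbols f (victim encoder), G (generator), x_t (threat image), the cosine similarity measure, and the L-infinity budget ε.

```latex
% Illustrative only: similarity measure and norm bound are assumptions,
% not taken from the paper.
\min_{G}\; \mathbb{E}_{x}\left[\, 1 - \cos\big(f(x + G(x)),\; f(x_t)\big) \right]
\quad \text{s.t.} \quad \lVert G(x) \rVert_\infty \le \epsilon
```

Under this reading, no downstream label appears anywhere in the objective: the target is defined entirely in the encoder's representation space.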

Key facts

Paper arXiv:2605.19446v1 introduces Targeted Downstream-Agnostic Attack (TDAA).
TDAA requires adversarial examples to be both targeted and downstream-agnostic.
Existing DAA methods only require changing the original prediction.
The method uses a pre-selected 'threat image' as the target.
A generator produces example-specific adversarial perturbations.
The attack compels the victim encoder to output representations similar to the threat image.
The downstream task remains unknown to the attacker.
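
The idea in the facts above can be illustrated with a minimal, self-contained sketch. It is not the paper's method: the toy `encode` function stands in for a real pre-trained encoder, cosine similarity is an assumed similarity measure, the L-infinity budget `eps` is an assumed constraint, and the trained generator is replaced by a direct per-example search using finite-difference signed-gradient steps. All function and parameter names are hypothetical.

```python
import math
import random


def encode(weights, x):
    """Toy stand-in for a frozen encoder: a linear map followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def cosine(u, v):
    """Cosine similarity between two vectors (small constant avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def targeted_loss(weights, x, delta, threat):
    """1 - cos(f(x + delta), f(threat)); lower means closer to the threat image."""
    # Keep the perturbed input in the valid pixel range [0, 1].
    x_adv = [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]
    return 1.0 - cosine(encode(weights, x_adv), encode(weights, threat))


def craft_perturbation(weights, x, threat, eps=0.1, steps=50, lr=0.02, h=1e-4):
    """Search for a perturbation within an L-infinity ball of radius eps.

    Assumption: this per-example search replaces the paper's generator.
    The best perturbation seen so far is returned, so the result is never
    worse than the zero perturbation.
    """
    delta = [0.0] * len(x)
    best_delta = delta[:]
    best_loss = targeted_loss(weights, x, delta, threat)
    for _ in range(steps):
        base = targeted_loss(weights, x, delta, threat)
        new_delta = []
        for i, d in enumerate(delta):
            bumped = delta[:]
            bumped[i] = d + h
            # Sign of the finite-difference slope for coordinate i.
            g = targeted_loss(weights, x, bumped, threat) - base
            step = lr if g > 0 else -lr if g < 0 else 0.0
            # Descend, then project back into the eps-ball.
            new_delta.append(max(-eps, min(eps, d - step)))
        delta = new_delta
        loss = targeted_loss(weights, x, delta, threat)
        if loss < best_loss:
            best_delta, best_loss = delta[:], loss
    return best_delta


if __name__ == "__main__":
    random.seed(0)
    W = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
    image = [random.random() for _ in range(8)]
    threat_image = [random.random() for _ in range(8)]
    pert = craft_perturbation(W, image, threat_image)
    print("loss before:", targeted_loss(W, image, [0.0] * 8, threat_image))
    print("loss after: ", targeted_loss(W, image, pert, threat_image))
```

The sketch never consults a downstream classifier, which is what makes the objective downstream-agnostic. A generator-based attack would amortize this search by training a network to output the perturbation for any input in a single forward pass.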
