Antidistillation Fingerprinting Detects LLM Model Theft

ai-technology · 2026-05-18

Researchers have developed antidistillation fingerprinting (ADFP), a method for detecting when a third-party student model has been trained on a frontier large language model's outputs without authorization. Existing fingerprinting techniques rely on heuristic perturbations and face a steep trade-off: making the fingerprint strong enough for the student to internalize it degrades generation quality. ADFP instead aligns the fingerprinting objective with the student's learning dynamics. It uses a proxy model to identify and sample the tokens that maximize detectability after fine-tuning, which avoids that trade-off. The approach builds on the gradient-based framework of antidistillation sampling. The paper is available on arXiv under identifier 2602.03812v2.
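
To make the idea of proxy-guided token sampling more concrete, here is a toy sketch in Python. It is not the paper's algorithm. It assumes a hypothetical per-token score, supplied by some proxy model, that estimates how much sampling each token would raise detectability after fine-tuning, and it tilts the teacher's next-token distribution toward high-scoring tokens. The names `fingerprinted_distribution`, `proxy_scores`, and `strength` are illustrative assumptions, as is the additive form of the adjustment.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fingerprinted_distribution(teacher_logits, proxy_scores, strength=1.0):
    """Tilt the teacher's next-token distribution toward high-scoring tokens.

    teacher_logits: the teacher's unnormalized next-token scores.
    proxy_scores:   HYPOTHETICAL per-token detectability estimates from a
                    proxy model (how they are computed is not shown here).
    strength:       HYPOTHETICAL knob; 0.0 recovers the teacher's distribution.
    """
    if len(teacher_logits) != len(proxy_scores):
        raise ValueError("teacher_logits and proxy_scores must have equal length")
    adjusted = [l + strength * s for l, s in zip(teacher_logits, proxy_scores)]
    return softmax(adjusted)


def sample_token(probs, rng):
    """Draw one token index from the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]   # toy three-token vocabulary
    scores = [0.0, 0.0, 1.0]   # the proxy favors token 2 in this toy case
    probs = fingerprinted_distribution(logits, scores, strength=1.0)
    print(probs, sample_token(probs, random.Random(0)))
```

With `strength` set to 0.0 the function returns the teacher's own distribution, so in this toy version the knob trades fidelity to the teacher against the pull toward tokens the proxy favors.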

Key facts

ADFP detects distillation of frontier LLMs
Existing methods degrade generation quality
ADFP uses a proxy model to sample tokens
Maximizes detectability after fine-tuning
Based on antidistillation sampling framework
Published on arXiv:2602.03812v2
Avoids steep trade-off between quality and fingerprinting strength
Aligns fingerprinting with student learning dynamics
