ARTFEED — Contemporary Art Intelligence

Antidistillation Fingerprinting Detects LLM Model Theft

ai-technology · 2026-05-18

Researchers have developed antidistillation fingerprinting (ADFP), a method for detecting when a third-party student model has been trained, without authorization, on the outputs of a frontier large language model. Existing fingerprinting techniques rely on heuristic perturbations that degrade generation quality in order to ensure that the student internalizes the fingerprint. ADFP instead aligns the fingerprinting objective with the student's learning dynamics. It uses a proxy model to identify and sample the tokens that maximize detectability after fine-tuning, which avoids that quality trade-off. The approach builds on the gradient-based framework of antidistillation sampling. The paper was published on arXiv under identifier 2602.03812v2.
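The core idea can be illustrated with a toy Python sketch: tilt the teacher's sampling toward tokens that a proxy student would turn into a stronger fingerprint after fine-tuning. This is not the paper's algorithm. The proxy here is a hypothetical unigram model (a single vector of logits), and the fingerprint score is a hypothetical log-probability mass on a secret token set. The post-fine-tuning score is also computed exactly for one SGD step, rather than with the gradient-based estimates a neural proxy would need. All function names and the parameters `lam` and `lr` are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fingerprint_score(proxy_logits, fingerprint_ids):
    """Hypothetical detectability score: log of the probability mass the
    proxy student places on a secret set of fingerprint token ids."""
    probs = softmax(proxy_logits)
    return math.log(sum(probs[i] for i in fingerprint_ids))


def score_gain_after_step(proxy_logits, token_id, fingerprint_ids, lr):
    """Exact change in the fingerprint score after one SGD step of
    cross-entropy fine-tuning of the toy unigram proxy on `token_id`.
    For logits theta, d(-log p_t)/d(theta) = p - e_t, so the step
    moves theta to theta + lr * (e_t - p)."""
    probs = softmax(proxy_logits)
    updated = [
        theta + lr * ((1.0 if k == token_id else 0.0) - p)
        for k, (theta, p) in enumerate(zip(proxy_logits, probs))
    ]
    return (fingerprint_score(updated, fingerprint_ids)
            - fingerprint_score(proxy_logits, fingerprint_ids))


def fingerprinted_distribution(teacher_logits, proxy_logits, fingerprint_ids,
                               lam=1.0, lr=0.5):
    """Shift each teacher logit by lam times the proxy's post-step score
    gain, so tokens that strengthen the fingerprint become more likely."""
    gains = [score_gain_after_step(proxy_logits, t, fingerprint_ids, lr)
             for t in range(len(teacher_logits))]
    adjusted = [z + lam * g for z, g in zip(teacher_logits, gains)]
    return softmax(adjusted)


def sample_token(probs, rng):
    """Draw one token id from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy usage: a 4-token vocabulary with token 2 as the secret fingerprint.
teacher = [2.0, 1.5, 1.0, 0.5]
proxy = [0.0, 0.0, 0.0, 0.0]
plain = softmax(teacher)
tilted = fingerprinted_distribution(teacher, proxy, {2}, lam=5.0)
print("teacher:", [round(p, 3) for p in plain])
print("tilted: ", [round(p, 3) for p in tilted])
print("sampled:", sample_token(tilted, random.Random(0)))
```

In this toy, `lam` sets how far the sampling distribution departs from the teacher's. With `lam=0.0` it reproduces the teacher exactly, and larger values favor fingerprint tokens more strongly. The sketch does not attempt to reproduce the paper's claim that ADFP avoids a steep quality trade-off.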

Key facts

  • ADFP detects distillation of frontier LLMs
  • Existing methods degrade generation quality
  • ADFP uses a proxy model to sample tokens
  • Maximizes detectability after fine-tuning
  • Based on antidistillation sampling framework
  • Published on arXiv:2602.03812v2
  • Avoids the steep trade-off between generation quality and fingerprinting strength
  • Aligns fingerprinting with student learning dynamics

Entities

Institutions

  • arXiv
