Supply-Chain Backdoors Enable Secret Theft in Local LLM Fine-Tuning

other · 2026-05-01

A new study reveals that compromised model code, not just pretrained weights, can steal secrets from local fine-tuning datasets. Researchers demonstrate a deterministic full-chain memorization mechanism that locks onto token-level secrets via online tensor-rule matching and value-gradient decoupling. This shifts the attack paradigm from passive weight poisoning to active execution hijacking, exploiting overlooked supply-chain vectors where model code is camouflaged as standard architectural definitions. The attack targets sensitive data like API keys and financial records, which passive poisoning fails to capture due to their sparse, high-entropy nature.

Key facts

Local fine-tuning datasets contain sensitive secrets such as API keys, personal identifiers, and financial records.
Compromised model code is sufficient to steal secrets from local fine-tuning.
Current passive pretrained-weight poisoning attacks fail to capture sparse high-entropy targets.
Attack exploits supply-chain vector: model code camouflaged as standard architectural definitions.
Introduces deterministic full-chain memorization mechanism.
Mechanism locks onto token-level secrets via online tensor-rule matching.
Uses value-gradient decoupling to stealthily inject.
Paradigm shift from passive weight poisoning to active execution hijacking.

Entities

—

Sources

arXiv cs.AI — 2026-05-01