Hidden-State Privacy Mechanisms Fail to Balance Utility and Security
A study on arXiv (2605.24042) finds that, for single-layer hidden-state privacy, none of the 1,536 Gaussian release covariances tested achieves both moderate utility and moderate privacy against an adaptive retrieval attacker. A Fisher-ball lower bound shows that every full-rank Gaussian release with O(1) Fisher utility has a direction along which the Mahalanobis signal grows linearly, which rules out uniform safety. The diagonal inverse-Fisher release Σ⋆diag(K) = (2K/d) diag(1/F_ii) is minimax-optimal at first-order KL budget K and achieves worst-attacker top-1 ≤ 0.001 at every point of a 32 model-layer grid, but it sits on a privacy/utility edge. A generalized-eigen mechanism reaches a 13× Pareto reduction under Euclidean retrieval but collapses to 1.
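The diagonal inverse-Fisher formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's code: the function names, the toy Fisher matrix, and the value of K are hypothetical. It also assumes that "first-order KL" means the usual quadratic expansion, so that noise δ ~ N(0, Σ) costs E[½ δᵀFδ] = ½ tr(FΣ) in expectation. Under that assumption the (2K/d) scaling spends the budget exactly, because each of the d coordinates contributes F_ii · (2K/d)/F_ii = 2K/d to the trace.

```python
import numpy as np


def diagonal_inverse_fisher_covariance(fisher_diag, kl_budget):
    """Return (2K/d) * diag(1 / F_ii) for a strictly positive Fisher diagonal."""
    fisher_diag = np.asarray(fisher_diag, dtype=float)
    if fisher_diag.ndim != 1 or np.any(fisher_diag <= 0):
        raise ValueError("fisher_diag must be a 1-D array of positive values")
    d = fisher_diag.shape[0]
    return np.diag((2.0 * kl_budget / d) / fisher_diag)


def first_order_kl(fisher, covariance):
    """Assumed first-order KL cost: E[0.5 * delta^T F delta] = 0.5 * tr(F Sigma)."""
    return 0.5 * np.trace(fisher @ covariance)


def release(hidden_state, covariance, rng):
    """Add zero-mean Gaussian noise with the given covariance to one hidden state."""
    noise = rng.multivariate_normal(np.zeros(len(hidden_state)), covariance)
    return hidden_state + noise


# Toy symmetric positive-definite Fisher matrix (hypothetical values).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
F = A @ A.T + 8.0 * np.eye(8)
K = 0.5

Sigma = diagonal_inverse_fisher_covariance(np.diag(F), K)
noisy = release(rng.normal(size=8), Sigma, rng)

# The expected first-order KL equals the budget K (up to rounding),
# regardless of the off-diagonal entries of F.
print(first_order_kl(F, Sigma), K)
```

Coordinates with large F_ii receive little noise and coordinates with small F_ii receive a lot, which is how the mechanism spreads a fixed KL budget evenly across dimensions.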
Key facts
- 1,536 Gaussian release covariances tested for single-layer hidden-state privacy
- Zero achieve both moderate utility and moderate privacy against an adaptive retrieval attacker
- Fisher-ball lower bound: every full-rank Gaussian release at O(1) Fisher utility has a direction with linearly growing Mahalanobis signal
- Diagonal inverse-Fisher release Σ⋆diag(K) = (2K/d) diag(1/F_ii) is the unique minimax-optimal diagonal mechanism at first-order KL budget K
- Worst-attacker top-1 ≤ 0.001 at every point of a 32 model-layer grid for the diagonal mechanism
- Generalized-eigen mechanism reaches 13× Pareto reduction under Euclidean retrieval
- Study published on arXiv with ID 2605.24042
- Empirical empty middle matches theoretical bound
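To make the top-1 retrieval metric concrete, here is a hedged sketch of a nearest-neighbor retrieval attack on a Gaussian release. The summary does not describe the attacker's procedure, so everything here is an assumption: the attacker holds the full candidate pool, knows the release covariance, and guesses the closest candidate under either Mahalanobis or Euclidean distance. It is a fixed, non-adaptive attacker, and the function name and toy data are hypothetical.

```python
import numpy as np


def top1_retrieval_accuracy(candidates, covariance, rng, metric="mahalanobis"):
    """Fraction of noisy releases whose nearest candidate is the true source."""
    n, d = candidates.shape
    noise = rng.multivariate_normal(np.zeros(d), covariance, size=n)
    released = candidates + noise

    # diffs[i, j] = released[i] - candidates[j], shape (n, n, d)
    diffs = released[:, None, :] - candidates[None, :, :]
    if metric == "mahalanobis":
        precision = np.linalg.inv(covariance)
        # Squared Mahalanobis distance for every (release, candidate) pair.
        dists = np.einsum("ijk,kl,ijl->ij", diffs, precision, diffs)
    elif metric == "euclidean":
        dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    else:
        raise ValueError("metric must be 'mahalanobis' or 'euclidean'")

    guesses = dists.argmin(axis=1)
    return float(np.mean(guesses == np.arange(n)))


# Toy candidate pool (hypothetical data, not from the study).
rng = np.random.default_rng(0)
pool = rng.normal(size=(50, 8))

low_noise = top1_retrieval_accuracy(pool, 1e-6 * np.eye(8), rng)
high_noise = top1_retrieval_accuracy(pool, 1e4 * np.eye(8), rng)
print(low_noise, high_noise)
```

With negligible noise the attacker recovers every source, and with very large noise accuracy falls toward chance (1/50 here). The empty middle reported above concerns the range between these extremes, where the release would still need to be useful.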
Entities
Institutions
- arXiv