RULER: New Metrics Reveal Machine Unlearning Fails at Representation Level

ai-technology · 2026-05-28

Researchers have introduced RULER, a set of metrics for verifying machine unlearning at the representation level, and report that existing output-level assessments fall short. The study, posted on arXiv (2605.27569), shows that a model can pass membership inference tests and meet retain-set and forget-set accuracy checks while still encoding the forgotten records in its intermediate representations. RULER comprises two metrics: M2, which compares the model's forget-set representations against those of a retrained model, and M4, which detects residuals without requiring a retrained model. Across four approximate unlearning methods, M2 found statistically significant residuals in 10 of 12 conditions (p<0.05), with effect sizes growing as the forget fraction increased. The work points to a gap in unlearning verification and argues for a stricter standard.

Key facts

Machine unlearning aims to remove the influence of specific training records without retraining the model.
Current verification checks output-level metrics: membership inference, retain accuracy, forget-set accuracy.
Models can pass all three while still encoding forgotten records in intermediate representations.
RULER introduces representation-level verification metrics.
M2 is an oracle-comparative metric: it measures the representational position of the forget set relative to a retrained model.
M4 is an oracle-free metric: it detects residuals from the model's internal similarity structure, with no retrained model required.
Four approximate unlearning methods passed output-level evaluation.
M2 detected significant residuals in 10 of 12 conditions (p<0.05).
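
To make the oracle-comparative idea concrete, here is a minimal sketch in Python. It is not the RULER definition of M2, which this summary does not give. It assumes a stand-in statistic (cosine similarity between each forget-set representation and the retain-set centroid), a permutation test for significance, and Cohen's d for effect size. All function names, including `oracle_comparative_check`, are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def position_scores(forget_reps, retain_reps):
    """ASSUMED stand-in for 'representational position': similarity of each
    forget-set representation to the retain-set centroid of the same model."""
    center = centroid(retain_reps)
    return [cosine(rep, center) for rep in forget_reps]


def cohens_d(a, b):
    """Effect size of the difference in means (equal-weight pooled std)."""
    pooled = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled if pooled else 0.0


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def oracle_comparative_check(unlearned_forget, unlearned_retain,
                             oracle_forget, oracle_retain):
    """Compare forget-set positions in the unlearned model against the
    retrained (oracle) model. Returns (effect_size, p_value)."""
    unlearned = position_scores(unlearned_forget, unlearned_retain)
    oracle = position_scores(oracle_forget, oracle_retain)
    return cohens_d(unlearned, oracle), permutation_p_value(unlearned, oracle)
```

In use, each argument would be a list of intermediate-layer activation vectors, extracted from the unlearned model and from a model retrained without the forget set. A small p-value together with a large effect size would suggest that the unlearned model places forgotten records differently from the oracle, which is the kind of residual the article describes.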
