ARTFEED — Contemporary Art Intelligence

Dataset Distillation Creates Fairness Gaps Across Demographics

ai-technology · 2026-05-04

A new study reveals that dataset distillation, a technique for compressing large datasets into smaller synthetic ones, can introduce significant fairness gaps across demographic groups. The research, posted as a preprint on arXiv (2605.00185), demonstrates that models trained on distilled data perform poorly for certain subgroups because of mismatches in predictive patterns, not just sample-size imbalance. The authors propose a solution based on a group-imbalance-agnostic barycenter that aligns predictive information across groups.
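To make the core idea concrete, here is a toy sketch of dataset distillation. This is not the paper's method: real distillation techniques optimize synthetic samples by gradient or trajectory matching, whereas this sketch uses the closed-form minimizer of a simple mean-matching loss (one synthetic point per class equal to the class feature mean), then classifies by nearest synthetic point. All names and parameters here are illustrative assumptions.

```python
import numpy as np

def distill_by_mean_matching(X, y):
    """Toy dataset distillation: one synthetic point per class, chosen to
    match the class feature mean. This is the closed-form minimizer of a
    mean-matching loss, standing in for the optimization real methods use."""
    classes = np.unique(y)
    X_syn = np.stack([X[y == c].mean(axis=0) for c in classes])
    return X_syn, classes

# Usage: two well-separated Gaussian classes, distilled to 2 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 5)),
               rng.normal(+2.0, 1.0, size=(100, 5))])
y = np.array([0] * 100 + [1] * 100)

X_syn, y_syn = distill_by_mean_matching(X, y)

# A nearest-synthetic-point classifier trained on the 2-sample distilled
# set still separates the 200 real samples almost perfectly.
dists = ((X[:, None, :] - X_syn[None, :, :]) ** 2).sum(axis=-1)
pred = y_syn[np.argmin(dists, axis=1)]
acc = (pred == y).mean()
```

The point of the sketch is the compression ratio: 200 real samples are replaced by 2 synthetic ones with almost no loss in predictive performance, which is the property the study shows can fail to hold uniformly across subgroups.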

Key facts

  • Dataset distillation compresses large datasets into small synthetic ones while maintaining predictive performance.
  • Different demographic groups exhibit distinct predictive patterns.
  • Distillation struggles to preserve informative signals for all subgroups, regardless of group size balance.
  • Models trained on distilled data can experience substantial performance drops for certain subgroups.
  • Fairness gaps do not disappear by merely correcting group imbalance.
  • Gaps stem from fundamental mismatches in subgroup predictive patterns, not sample-size disparities alone.
  • The study formally analyzes the interaction between the two sources of bias: group-size imbalance and mismatched subgroup predictive patterns.
  • The solution involves identifying a group-imbalance-agnostic barycenter of predictive information.

Entities

Institutions

  • arXiv

Sources