ARTFEED — Contemporary Art Intelligence

CDVM: Optimizing Data Pruning in Low-Data Environments

other · 2026-05-13

A new paper on arXiv introduces Constraint-Data-Value-Maximization (CDVM), a method for effective data pruning when only a small fraction of the training data is retained. The authors demonstrate that Shapley-based data values are suboptimal for pruning low-value data in low-data scenarios. CDVM frames pruning as a constrained optimization problem that maximizes total influence while penalizing excessive per-test contributions, and it achieves robust performance on the OpenDataVal benchmark.
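To make the idea of "maximize total influence, penalize excessive per-test contributions" concrete, here is a minimal Python sketch. It is not the paper's formulation or algorithm: the influence matrix, the `cap` threshold, the penalty weight `lam`, the greedy selection loop, and all function names are assumptions chosen for illustration only.

```python
# Illustrative sketch only. The objective, `cap`, `lam`, and the greedy
# solver are assumptions; the paper's actual formulation may differ.

def penalized_objective(influence, selected, cap, lam=1.0):
    """Total influence of the selected training points, minus a penalty
    for any test point whose summed influence exceeds `cap`."""
    n_test = len(influence[0])
    total = 0.0
    for j in range(n_test):
        col = sum(influence[i][j] for i in selected)
        total += col - lam * max(0.0, col - cap)
    return total


def greedy_keep(influence, k, cap, lam=1.0):
    """Greedily keep k training points under the penalized objective."""
    selected = []
    remaining = set(range(len(influence)))
    for _ in range(k):
        # Ties are broken in favor of the lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: penalized_objective(influence, selected + [i], cap, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def top_k_by_total(influence, k):
    """Baseline: keep the k points with the largest summed influence."""
    order = sorted(range(len(influence)), key=lambda i: -sum(influence[i]))
    return sorted(order[:k])


# Hypothetical influence matrix: rows are training points, columns are test points.
influence = [
    [5.0, 0.0, 0.0],  # strongly helps test point 0
    [4.0, 0.0, 0.0],  # also helps only test point 0 (redundant)
    [0.0, 2.0, 0.0],  # helps test point 1
    [0.0, 0.0, 2.0],  # helps test point 2
]

print(top_k_by_total(influence, 2))        # [0, 1]
print(greedy_keep(influence, 2, cap=2.0))  # [0, 2]
```

In this toy example, ranking by summed value keeps two points that both serve the same test point, while the penalized objective spreads the retained points across test points. That is the intuition behind penalizing per-test contributions when very little data is kept.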

Key facts

  • arXiv paper 2605.11312 introduces CDVM.
  • CDVM addresses data pruning in low-data environments.
  • Shapley-based data values are suboptimal for low-data pruning.
  • CDVM casts pruning as a constrained optimization problem.
  • It maximizes total influence while penalizing excessive per-test contributions.
  • CDVM shows strong performance on the OpenDataVal benchmark.
  • The paper was posted to arXiv in 2026.
  • Data attribution is the broader research field.

Entities

Institutions

  • arXiv
  • OpenDataVal

Sources