ARTFEED — Contemporary Art Intelligence

Complement Submodular Information for Data Selection

other · 2026-05-26

Complement Submodular Information (CSI) is a newly introduced class of submodular functions for data selection that explicitly preserves structural information between a selected subset and its complement. Classical submodular objectives score only the selected subset and ignore the remaining data. CSI addresses this limitation by quantifying the structural information shared between the two sets. This matters for modern machine learning tasks such as train/validation/test splitting, benchmark construction, and robust subset selection, where both the selected and the remaining data need balanced structure. The framework induces complement-aware variants of several classical submodular functions, which aim to improve coverage, diversity, and representativeness while maintaining balance. The work is detailed in arXiv preprint 2605.24779.
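To make the contrast between a classical objective and a complement-aware one concrete, here is a minimal Python sketch. It is not taken from the preprint: the summary above does not give CSI's formula, so the sketch assumes one plausible form, the submodular mutual information between a subset A and its complement, I(A; V minus A) = f(A) + f(V minus A) - f(V), with facility location as the base function f. The function names (`facility_location`, `complement_information`, `greedy`), the RBF-style similarity, and the random toy data are all hypothetical choices made for illustration.

```python
import math
import random


def similarity(x, y):
    """RBF-style similarity in (0, 1]; the kernel choice is an assumption."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def facility_location(subset, sim):
    """Classical facility location: f(A) = sum_i max_{j in A} sim[i][j]."""
    if not subset:
        return 0.0
    return sum(max(row[j] for j in subset) for row in sim)


def complement_information(subset, sim):
    """Hypothetical complement-aware score: f(A) + f(V minus A) - f(V).

    This is an assumed form, not the preprint's confirmed definition.
    """
    ground = set(range(len(sim)))
    rest = ground - set(subset)
    return (facility_location(subset, sim)
            + facility_location(rest, sim)
            - facility_location(ground, sim))


def greedy(objective, sim, budget):
    """Plain greedy: at each step add the element that maximizes the objective."""
    chosen = []
    for _ in range(budget):
        candidates = [j for j in range(len(sim)) if j not in chosen]
        best = max(candidates, key=lambda j: objective(chosen + [j], sim))
        chosen.append(best)
    return chosen


# Toy data: 30 random 2-D points and their pairwise similarities.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
sim = [[similarity(p, q) for q in points] for p in points]

# The classical objective looks only at the selected subset.
classical = greedy(facility_location, sim, budget=10)
# The complement-aware objective also scores the data left behind.
aware = greedy(complement_information, sim, budget=10)
```

Under this assumed form the score is symmetric (a subset and its complement receive the same value) and is zero for the empty set and for the full set, which is one way to read "shared information" between the two sides of a split. The score is not monotone, so plain greedy selection is used here only for illustration and carries no approximation guarantee.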

Key facts

  • Complement Submodular Information (CSI) is a new class of complement-aware submodular objectives.
  • CSI quantifies shared structural information between a subset and its complement.
  • Classical submodular objectives optimize only the selected subset.
  • CSI is aimed at applications such as train/validation/test splitting, benchmark construction, and robust subset selection.
  • The framework induces complement-aware variants of several classical submodular functions.
  • The work is available as arXiv preprint 2605.24779.
  • CSI aims to preserve balanced structure across both selected and remaining data.
  • The approach is relevant to modern machine learning applications where balanced structure is critical.

Entities

Institutions

  • arXiv

Sources