ARTFEED — Contemporary Art Intelligence

LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection

other · 2026-05-13

LiBaGS is a lightweight, generator-agnostic method for choosing which synthetic training samples to add. It scores each candidate by combining decision-boundary proximity, predictive uncertainty, real-data density, and support validity, so that selected samples are both informative and likely to lie on the real data manifold. Rather than simply increasing data volume or picking only the most uncertain samples, it applies a boundary-gap allocation rule that targets sparse but realistic decision-boundary neighborhoods. LiBaGS also uses a marginal-value stopping rule to decide when enough synthetic samples have been added, assigns softer labels near ambiguous boundaries, and applies a diversity objective to avoid redundant near-duplicate selections. The reported experiments indicate that LiBaGS improves model performance by filling gaps in the training distribution that matter for the downstream task.
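The summary names four scoring signals but gives no formulas, so the following is only a minimal sketch of how such a score could be combined. Everything specific here is an assumption for illustration: the function names, the use of a top-two probability margin for boundary proximity, normalized entropy for uncertainty, a neighbor count within `radius` for real-data density, a hard `support_radius` cutoff for support validity, and the equal weighting.

```python
import math


def margin_proximity(probs):
    """Boundary proximity (assumed form): 1 minus the top-two probability gap."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])


def normalized_entropy(probs):
    """Predictive uncertainty (assumed form): entropy scaled to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def boundary_gap_score(candidate, probs, real_points,
                       radius=1.0, support_radius=2.0):
    """Score one synthetic candidate; higher means more worth adding.

    candidate   -- feature vector of the synthetic sample
    probs       -- class probabilities from the current model
    real_points -- feature vectors of the real training data
    """
    dists = [math.dist(candidate, r) for r in real_points]

    # Support validity (assumed hard gate): reject candidates that sit
    # too far from every real point to be considered realistic.
    if min(dists) > support_radius:
        return 0.0

    # Real-data density: fewer real neighbors means a sparser region,
    # which this sketch treats as a larger gap to fill.
    neighbors = sum(1 for d in dists if d <= radius)
    sparsity = 1.0 / (1.0 + neighbors)

    # Informativeness: equal-weight mix of the two model-based signals.
    informativeness = 0.5 * margin_proximity(probs) \
        + 0.5 * normalized_entropy(probs)

    return informativeness * sparsity
```

Under these assumptions, a candidate scores highest when the model is undecided about it, few real points lie nearby, and at least one real point is still within the support radius. A candidate far from all real data scores zero regardless of how uncertain the model is.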

Key facts

  • LiBaGS is a lightweight, generator-agnostic method for targeted synthetic data selection.
  • It scores samples using decision-boundary proximity, predictive uncertainty, real-data density, and support validity.
  • Uses a boundary-gap allocation rule targeting sparse but realistic decision-boundary neighborhoods.
  • Includes a marginal-value stopping rule to determine when enough synthetic samples have been added.
  • Assigns softer labels near ambiguous boundaries.
  • Uses a diversity objective to avoid redundant near-duplicate selections.
  • Experiments show LiBaGS improves model performance.
  • The method is designed to fill missing parts of the training distribution relevant to the downstream task.
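The stopping rule, diversity objective, and soft labels listed above can be sketched together as a greedy selection loop. This is a hedged illustration, not the published procedure: the names `select_candidates` and `soften_label`, the distance-based diversity discount, the `min_gain` threshold, and the label-smoothing formula are all assumptions, since the summary does not specify them.

```python
import math


def marginal_gain(i, selected, candidates, scores, diversity_radius):
    """Score of candidate i, discounted if it sits near a chosen sample."""
    if not selected:
        return scores[i]
    nearest = min(math.dist(candidates[i], candidates[j]) for j in selected)
    # Near-duplicates of already selected samples get a gain close to 0.
    return scores[i] * min(1.0, nearest / diversity_radius)


def select_candidates(candidates, scores, min_gain=0.1, diversity_radius=0.5):
    """Greedily pick candidates until the best marginal gain is too small."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining:
        best = max(
            remaining,
            key=lambda i: marginal_gain(
                i, selected, candidates, scores, diversity_radius
            ),
        )
        gain = marginal_gain(best, selected, candidates, scores,
                             diversity_radius)
        # Marginal-value stopping (assumed form): stop once adding the
        # best remaining sample would contribute less than min_gain.
        if gain < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def soften_label(hard_label, num_classes, ambiguity, max_smoothing=0.4):
    """Return a soft target; ambiguity in [0, 1] controls the smoothing."""
    eps = max_smoothing * ambiguity
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if k == hard_label else off_value
            for k in range(num_classes)]
```

In this sketch a candidate that nearly duplicates an already selected sample contributes almost nothing, so the loop moves on to a different region or stops. The `ambiguity` passed to `soften_label` could, for example, be the same boundary-proximity value used for scoring, though that pairing is also an assumption.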
