ARTFEED — Contemporary Art Intelligence

Systematic Survey of Data Balancing Methods for Imbalanced Datasets

publication · 2026-04-30

A systematic review of strategies for balancing imbalanced datasets has been released on arXiv. The paper examines foundational oversampling methods, namely SMOTE and its variants (Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE); advanced adaptive techniques (MWMOTE, AMDO); deep generative models (GANs, VAEs, diffusion models); undersampling methods (NearMiss, Tomek Links); hybrid approaches (SMOTE-ENN, SMOTE-Tomek, SMOTE+OCSVM); ensemble techniques (SMOTEBoost, RUSBoost, Balanced Random Forest, One-Sided Selection); and specialized methods for multi-label and clustered data. The research addresses the persistent problem of class imbalance, which biases classifiers toward the majority classes and degrades their performance, particularly on the minority classes.
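To make the oversampling idea concrete, here is a minimal, hypothetical sketch of SMOTE-style interpolation in plain Python. It is not the survey's code or any library's API: the function names `smote` and `k_nearest`, the Euclidean distance, and the choice of a random base sample for each synthetic point are assumptions made for illustration. Each new minority sample is placed at a random position on the line segment between an existing minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def k_nearest(points, idx, k):
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    dists = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic minority samples by SMOTE-style interpolation (illustrative sketch).

    Assumption: each synthetic point uses a randomly chosen minority sample as its base,
    then moves a random fraction of the way towards one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # seeded so the output is reproducible
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))
        gap = rng.random()  # uniform in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour)))
    return synthetic


minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
new_points = smote(minority, n_synthetic=6, k=2)
for p in new_points:
    print(p)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the convex hull of the existing minority data. Variants such as Borderline SMOTE and Safe-Level SMOTE change which base samples are chosen or how far along the segment a new point may fall.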

Key facts

  • The paper is a systematic survey of data balancing methods.
  • It covers SMOTE and its variants: Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE.
  • Advanced adaptive methods include MWMOTE and AMDO.
  • Deep generative models include GANs, VAEs, and diffusion models.
  • Undersampling techniques include NearMiss and Tomek Links.
  • Hybrid methods include SMOTE-ENN, SMOTE-Tomek, and SMOTE+OCSVM.
  • Ensemble strategies include SMOTEBoost, RUSBoost, Balanced Random Forest, and One-Sided Selection.
  • Specialized approaches for multi-label and clustered data are covered.
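Tomek links appear above both as an undersampling technique and inside the SMOTE-Tomek hybrid. The sketch below shows, under stated assumptions, how such links can be detected and used for cleaning: two samples form a Tomek link when they belong to different classes and each is the other's nearest neighbour. The helper names (`nearest_neighbour`, `tomek_links`, `remove_majority_in_links`), the Euclidean distance, and the choice to drop only the majority-class member of each link are illustrative assumptions, not details taken from the survey.

```python
import math


def nearest_neighbour(points, idx):
    """Index of the nearest other point to points[idx] (Euclidean; ties go to the lower index)."""
    return min(
        (j for j in range(len(points)) if j != idx),
        key=lambda j: math.dist(points[idx], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two samples have different labels
    and each is the other's nearest neighbour.
    """
    nn = [nearest_neighbour(points, i) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


def remove_majority_in_links(points, labels, majority_label):
    """Undersample by dropping the majority-class member of every Tomek link (hypothetical helper)."""
    to_drop = set()
    for i, j in tomek_links(points, labels):
        to_drop.update(k for k in (i, j) if labels[k] == majority_label)
    keep = [k for k in range(len(points)) if k not in to_drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.2, 1.0), (5.0, 5.0), (5.3, 5.0)]
labels = [0, 1, 0, 0, 0, 0]  # 0 = majority, 1 = minority
print(tomek_links(points, labels))  # → [(0, 1)]
kept_points, kept_labels = remove_majority_in_links(points, labels, majority_label=0)
print(kept_labels)  # → [1, 0, 0, 0, 0]
```

In this toy data the only cross-class pair of mutual nearest neighbours is the minority point at (0.1, 0.0) and the majority point at the origin, so the origin is removed. A SMOTE-Tomek pipeline would typically run this cleaning step after oversampling; some implementations remove both members of each link rather than only the majority one.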

Entities

Institutions

  • arXiv

Sources