Systematic Survey of Data Balancing Methods for Imbalanced Datasets
A systematic survey of methods for handling class imbalance in datasets has been posted to arXiv. The paper covers foundational oversampling methods such as SMOTE and its variants (Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE), adaptive techniques (MWMOTE, AMDO), deep generative models (GANs, VAEs, diffusion models), undersampling methods (NearMiss, Tomek Links), hybrid approaches (SMOTE-ENN, SMOTE-Tomek, SMOTE+OCSVM), ensemble techniques (SMOTEBoost, RUSBoost, Balanced Random Forest, One-Sided Selection), and specialized methods for multi-label and clustered data. It addresses the persistent problem of class imbalance, which biases classifiers toward the majority classes and degrades their performance on minority classes.
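To make the oversampling idea concrete, here is a minimal sketch of the core SMOTE step: each synthetic minority sample is placed at a random position on the line segment between a minority point and one of its k nearest minority neighbors. This is an illustration only and is not taken from the paper; the function name `smote_sketch`, its parameters, and the toy data are all assumptions made for the example.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbors.

    Hypothetical helper for illustration; not the survey's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbors (excluding itself).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbor - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority class with two features per sample (assumed data).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies. Variants such as Borderline SMOTE differ mainly in which base points they choose to interpolate from.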
Key facts
- The paper is a systematic survey of data balancing methods.
- It covers SMOTE and its variants: Borderline SMOTE, K-Means SMOTE, Safe-Level SMOTE.
- Advanced adaptive methods include MWMOTE and AMDO.
- Deep generative models include GANs, VAEs, and diffusion models.
- Undersampling techniques include NearMiss and Tomek Links.
- Hybrid methods include SMOTE-ENN, SMOTE-Tomek, and SMOTE+OCSVM.
- Ensemble strategies include SMOTEBoost, RUSBoost, Balanced Random Forest, and One-Sided Selection.
- Specialized approaches for multi-label and clustered data are covered.
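As an illustration of the undersampling side, the sketch below finds Tomek links: pairs of samples from opposite classes that are each other's nearest neighbor. It then drops the majority-class member of each pair. This is a brute-force example under assumed names (`tomek_links`, `drop_majority_in_links`) and toy data, not code from the paper.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are opposite-class mutual nearest neighbors."""
    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def drop_majority_in_links(points, labels, majority_label):
    """Remove the majority-class sample from every Tomek link."""
    drop = {
        idx
        for pair in tomek_links(points, labels)
        for idx in pair
        if labels[idx] == majority_label
    }
    return [(p, y) for i, (p, y) in enumerate(zip(points, labels)) if i not in drop]


# Toy data (assumed): label 0 is the majority class, label 1 the minority.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (2.0, 2.0)]
labels = [0, 1, 0, 0, 0]

links = tomek_links(points, labels)
kept = drop_majority_in_links(points, labels, majority_label=0)
print(links)
print(kept)
```

Hybrid methods such as SMOTE-Tomek combine the two ideas: oversample the minority class first, then clean the class boundary by removing Tomek links.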
Entities
Institutions
- arXiv