Abstract
To address the challenge of class imbalance in binary classification, this paper introduces Genetic Methods for OverSampling (GM4OS), a technique that combines Genetic Algorithms (GAs) and Genetic Programming (GP). Traditional oversampling methods such as SMOTE and its variants depend on which data points are selected and on fixed synthetic data generation processes, often leading to suboptimal results. GM4OS instead evolves a resampling set and a synthetic data generation function simultaneously. Each individual in GM4OS is made of two components: the GA component selects minority class observations for resampling, while the GP component evolves functions that create synthetic observations. This dual evolution process aims to optimize both the selection of data points and the creation of synthetic samples, enhancing the performance of classifiers on imbalanced datasets. We studied the performance of GM4OS across ten test datasets and against five oversampling approaches commonly used in the literature. GM4OS outperforms the baseline methods on three of the ten test datasets, improving the performance of the classification algorithm.
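The two-component individual described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the primitive set (`add`, `sub`, `mean`), the choice to feed the evolved function one feature value from each of two selected observations, and all function names are assumptions made for illustration. The evolutionary loop and the classifier-based fitness evaluation are omitted.

```python
import random

# Hypothetical primitive set for the GP component (an assumption, not from the paper).
PRIMITIVES = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mean": lambda a, b: (a + b) / 2.0,
}

def random_tree(rng, depth=2):
    """Build a random expression tree over two inputs, 'x' and 'y'."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", "y"])
    op = rng.choice(sorted(PRIMITIVES))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x, y):
    """Evaluate a tree on one feature value from each of two observations."""
    if tree == "x":
        return x
    if tree == "y":
        return y
    op, left, right = tree
    return PRIMITIVES[op](evaluate(left, x, y), evaluate(right, x, y))

def random_individual(rng, n_minority):
    """Pair a GA selection mask with a GP generation function."""
    mask = [rng.random() < 0.5 for _ in range(n_minority)]
    if not any(mask):
        # Guarantee that at least one minority observation is selected.
        mask[rng.randrange(n_minority)] = True
    return {"mask": mask, "tree": random_tree(rng)}

def generate(individual, minority, rng, n_new):
    """Create n_new synthetic observations from the selected minority rows."""
    selected = [row for row, keep in zip(minority, individual["mask"]) if keep]
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(selected), rng.choice(selected)
        synthetic.append(
            [evaluate(individual["tree"], xa, xb) for xa, xb in zip(a, b)]
        )
    return synthetic

rng = random.Random(0)
minority = [[1.0, 2.0], [1.5, 1.8], [0.9, 2.2], [1.2, 2.1]]  # toy minority class
individual = random_individual(rng, len(minority))
synthetic = generate(individual, minority, rng, n_new=3)
```

In a full method, each individual would presumably be scored by training a classifier on the oversampled data, with crossover and mutation applied separately to the mask and to the tree; those steps are left out here because the abstract does not specify them.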
Original language | English |
---|---|
Article number | 510 |
Number of pages | 12 |
Journal | SN Computer Science |
Volume | 6 |
Issue number | 5 |
Early online date | 30 May 2025 |
DOIs | |
Publication status | Published - Jun 2025 |
Keywords
- Oversampling
- Imbalanced data
- Genetic programming
- Genetic algorithms