An Empirical Study of GM4OS for Imbalanced Binary Classification

Research output: Contribution to journal › Article › peer-review


Abstract

To address class imbalance in binary classification, this paper introduces Genetic Methods for OverSampling (GM4OS), a technique that combines Genetic Algorithms (GAs) and Genetic Programming (GP). Traditional oversampling methods such as SMOTE and its variants depend on which data points are selected and on a fixed procedure for generating synthetic data, which often leads to suboptimal results. GM4OS instead evolves a resampling set and a synthetic data generation function simultaneously. Each individual in GM4OS has two components: the GA component selects minority-class observations for resampling, while the GP component evolves functions that create synthetic observations. This dual evolution process aims to optimize both the selection of data points and the creation of synthetic samples, and thereby to improve classifier performance on imbalanced datasets. We evaluated GM4OS on ten test datasets against five oversampling approaches commonly used in the literature. The results show that GM4OS outperforms the baseline methods on three of the ten test datasets, improving the performance of the classification algorithm on those datasets.
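The two-component individual described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`make_individual`, `evaluate`, `oversample`), the index-list encoding of the GA component, the tuple-based expression tree for the GP component, and the SMOTE-like interpolation used as the initial tree. The evolutionary loop itself (fitness evaluation, crossover, mutation) is omitted.

```python
import random

# Hypothetical representation: the abstract only states that an individual
# pairs a GA component (which minority rows to resample) with a GP component
# (a function that creates synthetic rows). All names below are illustrative.

def make_individual(n_minority, n_select, rng):
    """Build one individual: (selected row indices, expression tree)."""
    # GA component: indices of minority-class rows chosen for resampling.
    selection = [rng.randrange(n_minority) for _ in range(n_select)]
    # GP component: a tree encoding x + r * (nb - x), a SMOTE-like
    # interpolation assumed here as a starting point for evolution.
    tree = ("add", "x", ("mul", "r", ("sub", "nb", "x")))
    return selection, tree

def evaluate(tree, env):
    """Recursively evaluate an expression tree against variable bindings."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    if op == "add":
        return a + b
    if op == "sub":
        return a - b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown operator: {op}")

def oversample(individual, minority, rng):
    """Create one synthetic row per selected minority row."""
    selection, tree = individual
    synthetic = []
    for i in selection:
        x = minority[i]
        nb = minority[rng.randrange(len(minority))]  # random partner row
        r = rng.random()                             # one weight per new row
        # Apply the evolved function feature by feature.
        synthetic.append(
            [evaluate(tree, {"x": xj, "nb": nbj, "r": r})
             for xj, nbj in zip(x, nb)]
        )
    return synthetic

rng = random.Random(0)
minority = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
individual = make_individual(len(minority), n_select=4, rng=rng)
synthetic = oversample(individual, minority, rng)
print(len(synthetic), "synthetic rows of dimension", len(synthetic[0]))
```

In a full evolutionary run, one plausible fitness measure would be the score of a classifier trained on the original data plus the synthetic rows, with variation operators acting on both the index list and the tree. That design is likewise an assumption, since the abstract does not specify it.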
Original language: English
Article number: 510
Number of pages: 12
Journal: SN Computer Science
Volume: 6
Issue number: 5
Early online date: 30 May 2025
Publication status: Published - Jun 2025

Keywords

  • Oversampling
  • Imbalanced data
  • Genetic programming
  • Genetic algorithms
