Geometric SMOTE for imbalanced datasets with nominal and continuous features

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)
60 Downloads (Pure)

Abstract

Imbalanced learning can be addressed in three different ways: resampling, algorithmic modifications, and cost-sensitive solutions. Resampling, and oversampling in particular, is more general than algorithmic and cost-sensitive methods because it modifies the training data rather than the learning algorithm, so it can be combined with any classifier. Since the proposal of the Synthetic Minority Oversampling TEchnique (SMOTE), various SMOTE variants and neural network-based oversampling methods have been developed. However, the options for oversampling datasets with both nominal and continuous features are limited. We propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), a combination of G-SMOTE and SMOTENC. Our method modifies SMOTENC's encoding and generation mechanisms for nominal features. For continuous features, it uses G-SMOTE's data selection mechanism to determine the center observation and its k-nearest neighbors, together with G-SMOTE's generation mechanism. G-SMOTENC's performance is compared against SMOTENC and two other baselines: a state-of-the-art oversampling method and no oversampling. The experiment was performed on 20 datasets with varying imbalance ratios, numbers of metric and non-metric features, and numbers of target classes. We found a significant improvement in classification performance when G-SMOTENC was used as the oversampling method. An open-source implementation of G-SMOTENC is available in the Python programming language.
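The abstract combines two generation mechanisms without spelling them out. As a rough illustration only, the sketch below pairs a G-SMOTE-style geometric step for continuous features (a random point inside a hypersphere centred on the center observation, with radius equal to the distance to a chosen neighbor, optionally truncated and deformed toward that neighbor) with plain SMOTENC-style handling of nominal features (a median-standard-deviation penalty for mismatched categories in the distance, and a majority vote among the k nearest neighbors). It is not G-SMOTENC: the paper's modified nominal encoding and generation are not reproduced, and the sketch picks the surface point only among minority-class neighbors. All names (`smotenc_distances`, `geometric_point`, `oversample_mixed`, `truncation`, `deformation`) are hypothetical and are not taken from the authors' open-source package.

```python
import numpy as np

# Hypothetical sketch: G-SMOTE-style geometric generation for continuous
# features plus plain SMOTENC-style nominal handling. This is NOT the paper's
# G-SMOTENC; its modified nominal encoding/generation is not reproduced.


def smotenc_distances(x_cont, x_nom, X_cont, X_nom, med_std):
    """Distance from one row to every row: Euclidean on continuous features,
    adding med_std**2 under the root for each mismatched nominal feature."""
    cont_sq = np.sum((X_cont - x_cont) ** 2, axis=1)
    mismatches = np.sum(X_nom != x_nom, axis=1)
    return np.sqrt(cont_sq + mismatches * med_std ** 2)


def geometric_point(center, surface, truncation, deformation, rng):
    """Draw one continuous point inside the hypersphere centred on `center`
    whose radius is the distance from `center` to `surface`."""
    radius = np.linalg.norm(surface - center)
    if radius == 0.0:
        return center.copy()
    # Uniform sample inside the unit hypersphere.
    direction = rng.normal(size=center.size)
    direction /= np.linalg.norm(direction)
    point = rng.uniform() ** (1.0 / center.size) * direction
    # Unit vector pointing from the center towards the surface point.
    parallel = (surface - center) / radius
    proj = point @ parallel
    # Truncation: reflect points that fall in the cut-off part of the sphere.
    if (truncation > 0 and proj < truncation - 1) or (
        truncation < 0 and proj > truncation + 1
    ):
        point = point - 2 * proj * parallel
        proj = -proj
    # Deformation: shrink the component perpendicular to `parallel`.
    parallel_part = proj * parallel
    point = parallel_part + (1 - deformation) * (point - parallel_part)
    # Scale to the real radius and translate to the center.
    return center + radius * point


def oversample_mixed(X_cont, X_nom, n_samples, k=5, truncation=1.0,
                     deformation=0.0, seed=0):
    """Generate `n_samples` synthetic rows from minority-class data only.
    X_cont: (n, p) continuous features; X_nom: (n, q) nominal features."""
    rng = np.random.default_rng(seed)
    n = X_cont.shape[0]
    k = min(k, n - 1)
    med_std = np.median(X_cont.std(axis=0))
    new_cont, new_nom = [], []
    for _ in range(n_samples):
        i = rng.integers(n)  # center observation
        d = smotenc_distances(X_cont[i], X_nom[i], X_cont, X_nom, med_std)
        d[i] = np.inf  # never pick the center as its own neighbor
        neighbors = np.argsort(d)[:k]
        j = rng.choice(neighbors)  # surface point
        new_cont.append(
            geometric_point(X_cont[i], X_cont[j], truncation, deformation, rng)
        )
        # Nominal values: most frequent category among the k neighbors
        # (ties go to the first value in np.unique's sorted order).
        row = []
        for col in range(X_nom.shape[1]):
            values, counts = np.unique(X_nom[neighbors, col], return_counts=True)
            row.append(values[np.argmax(counts)])
        new_nom.append(row)
    return np.array(new_cont), np.array(new_nom, dtype=X_nom.dtype)


# Small usage example on random minority-class data.
demo_rng = np.random.default_rng(42)
demo_cont = demo_rng.normal(size=(10, 2))
demo_nom = demo_rng.choice(np.array(["red", "blue"]), size=(10, 1))
syn_cont, syn_nom = oversample_mixed(demo_cont, demo_nom, n_samples=5, k=3)
print(syn_cont.shape, syn_nom.shape)  # (5, 2) (5, 1)
```

With `truncation=1.0` every continuous sample lands in the half of the hypersphere facing the chosen neighbor, and `deformation=1.0` collapses generation onto the line segment between the two points, which is the SMOTE-like special case. Surface points drawn from majority-class neighbors, which G-SMOTE's selection mechanism also allows, are left out here to keep the sketch short.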
Original language: English
Article number: 121053
Pages (from-to): 1-9
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 234
Issue number: December
Publication status: Published - 30 Dec 2023

Keywords

  • Imbalanced learning
  • Oversampling
  • SMOTE
  • Data generation
  • Nominal data
