Effective data generation for imbalanced learning using conditional generative adversarial networks

Research output: Contribution to journal (Article)

45 Citations (Scopus)

Abstract

Learning from imbalanced datasets is a frequent but challenging task for standard classification algorithms. Although there are different strategies to address this problem, methods that generate artificial data for the minority class constitute a more general approach compared to algorithmic modifications. Standard oversampling methods are variations of the SMOTE algorithm, which generates synthetic samples along the line segment that joins minority class samples. Therefore, these approaches are based on local information, rather than on the overall minority class distribution. Contrary to these algorithms, in this paper the conditional version of Generative Adversarial Networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets. The performance of cGAN is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when cGAN is used as an oversampling algorithm.
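
The line-segment interpolation that the abstract attributes to SMOTE-style oversampling can be illustrated with a minimal Python sketch. This is not the authors' code and not a full SMOTE implementation; the function name `interpolate_minority`, the neighbour count `k`, and the toy data are assumptions made only for illustration.

```python
import random

def interpolate_minority(samples, n_new, k=2, seed=0):
    """Return n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper, for illustration only.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other minority samples by squared Euclidean distance.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = interpolate_minority(minority, n_new=5)
```

With this toy data every synthetic point falls on an edge of the square, because each one depends only on a sample and a nearby neighbour. That is the local behaviour the abstract contrasts with modelling the overall minority class distribution.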

Original language: English
Pages (from-to): 464-471
Number of pages: 8
Journal: Expert Systems with Applications
Volume: 91
DOI: 10.1016/j.eswa.2017.09.030
Publication status: Published - 1 Jan 2018

Keywords

  • Artificial data
  • GAN
  • Imbalanced learning
  • Minority class

Cite this

@article{db258d9f0a9a4c8c8ec429fa7ded5796,
title = "Effective data generation for imbalanced learning using conditional generative adversarial networks",
abstract = "Learning from imbalanced datasets is a frequent but challenging task for standard classification algorithms. Although there are different strategies to address this problem, methods that generate artificial data for the minority class constitute a more general approach compared to algorithmic modifications. Standard oversampling methods are variations of the SMOTE algorithm, which generates synthetic samples along the line segment that joins minority class samples. Therefore, these approaches are based on local information, rather than on the overall minority class distribution. Contrary to these algorithms, in this paper the conditional version of Generative Adversarial Networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets. The performance of cGAN is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when cGAN is used as an oversampling algorithm.",
keywords = "Artificial data, GAN, Imbalanced learning, Minority class",
author = "Georgios Douzas and Fernando Ba{\c c}{\~a}o",
note = "Douzas, G., \& Ba{\c c}{\~a}o, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications, 91, 464-471. DOI: 10.1016/j.eswa.2017.09.030",
year = "2018",
month = "1",
day = "1",
doi = "10.1016/j.eswa.2017.09.030",
language = "English",
volume = "91",
pages = "464--471",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Science B.V., Amsterdam.",

}

TY - JOUR

T1 - Effective data generation for imbalanced learning using conditional generative adversarial networks

AU - Douzas, Georgios

AU - Bação, Fernando

N1 - Douzas, G., & Bação, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with Applications, 91, 464-471. DOI: 10.1016/j.eswa.2017.09.030

PY - 2018/1/1

Y1 - 2018/1/1

AB - Learning from imbalanced datasets is a frequent but challenging task for standard classification algorithms. Although there are different strategies to address this problem, methods that generate artificial data for the minority class constitute a more general approach compared to algorithmic modifications. Standard oversampling methods are variations of the SMOTE algorithm, which generates synthetic samples along the line segment that joins minority class samples. Therefore, these approaches are based on local information, rather than on the overall minority class distribution. Contrary to these algorithms, in this paper the conditional version of Generative Adversarial Networks (cGAN) is used to approximate the true data distribution and generate data for the minority class of various imbalanced datasets. The performance of cGAN is compared against multiple standard oversampling algorithms. We present empirical results that show a significant improvement in the quality of the generated data when cGAN is used as an oversampling algorithm.

KW - Artificial data

KW - GAN

KW - Imbalanced learning

KW - Minority class

UR - http://www.scopus.com/inward/record.url?scp=85029534844&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2017.09.030

DO - 10.1016/j.eswa.2017.09.030

M3 - Article

VL - 91

SP - 464

EP - 471

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -