Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning

Research output: Contribution to journal › Article › peer-review

91 Citations (Scopus)


Learning from imbalanced datasets is challenging for standard algorithms, as they are designed to work with balanced class distributions. Although there are different strategies to tackle this problem, methods that generate artificial data constitute a more general approach than algorithmic modifications: the generated data can be used by any algorithm, so the user's options are not constrained. In this paper, we present a new oversampling method, Self-Organizing Map-based Oversampling (SOMO), which applies a Self-Organizing Map to produce a two-dimensional representation of the input space, allowing for an effective generation of artificial data points. SOMO comprises three major stages. Initially, a Self-Organizing Map produces a two-dimensional representation of the original, usually high-dimensional, space. Next, it generates within-cluster synthetic samples, and finally it generates between-cluster synthetic samples. Additionally, we present empirical results showing that the performance of algorithms improves when artificial data generated by SOMO are used, and that our method outperforms various oversampling methods.
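The three stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how clusters are selected or how samples are allocated, so several details here are assumptions made for illustration. These include the hypothetical names `train_som`, `interpolate` and `somo_sketch`, the 3x3 grid, the rule that a SOM node counts as a minority cluster when minority samples outnumber majority samples, the fallback used when no node satisfies that rule, the uniform random allocation of new samples across clusters, and the `within_ratio` parameter.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=20, lr=0.5, sigma=1.0, seed=0):
    """Stage 1: train a small rectangular SOM. Returns node weights and grid coordinates."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.choice(len(X), rows * cols, replace=True)].astype(float)
    n_steps, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            decay = 1.0 - step / n_steps              # linear decay of rate and radius
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2 * (sigma * decay + 1e-3) ** 2))
            W += (lr * decay) * h[:, None] * (X[i] - W)
            step += 1
    return W, grid

def interpolate(A, B, n, rng):
    """Draw n points on line segments between random rows of A and random rows of B."""
    a = A[rng.integers(len(A), size=n)]
    b = B[rng.integers(len(B), size=n)]
    return a + rng.random((n, 1)) * (b - a)

def somo_sketch(X, y, minority=1, n_new=100, within_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W, grid = train_som(X, seed=seed)
    # Assign every sample to its best-matching SOM node.
    bmu = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(axis=2), axis=1)
    is_min = y == minority
    # Assumption: a node is a minority cluster if minority samples outnumber the rest.
    clusters = {}
    for k in np.unique(bmu):
        in_k = bmu == k
        n_min = np.sum(in_k & is_min)
        if n_min > np.sum(in_k) - n_min:
            clusters[k] = X[in_k & is_min]
    if not clusters:
        # Fallback (assumption): use every node that holds at least one minority sample.
        clusters = {k: X[(bmu == k) & is_min] for k in np.unique(bmu[is_min])}
    keys = list(clusters)
    # Pairs of selected nodes that are direct neighbours on the 2-D grid.
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]
             if np.abs(grid[a] - grid[b]).sum() == 1]
    n_within = n_new if not pairs else int(round(n_new * within_ratio))
    n_between = n_new - n_within
    new = []
    # Stage 2: within-cluster samples (uniform allocation is a simplification).
    counts = np.bincount(rng.integers(len(keys), size=n_within), minlength=len(keys))
    for k, c in zip(keys, counts):
        new.append(interpolate(clusters[k], clusters[k], c, rng))
    # Stage 3: between-cluster samples, only between neighbouring selected nodes.
    if pairs:
        counts = np.bincount(rng.integers(len(pairs), size=n_between), minlength=len(pairs))
        for (a, b), c in zip(pairs, counts):
            new.append(interpolate(clusters[a], clusters[b], c, rng))
    return np.vstack(new)
```

Calling `somo_sketch(X, y, minority=1, n_new=50)` on a feature matrix `X` and a label array `y` returns 50 synthetic minority samples, each a convex combination of two existing minority samples. The grid size and the cluster-selection rule are the parts most likely to differ from the published method.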
Original language: English
Pages (from-to): 40-52
Number of pages: 13
Journal: Expert Systems with Applications
Publication status: Published - 1 Oct 2017

