TY - JOUR
T1 - A classification method based on a cloud of spheres
AU - Dias, Tiago
AU - Amaral, Paula
N1 - info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F00297%2F2020/PT#
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F00297%2F2020/PT#
Publisher Copyright:
© 2023 The Author(s)
PY - 2023
Y1 - 2023
N2 - In this article we propose a binary classification model to distinguish a specific class that corresponds to a characteristic that we intend to identify (fraud, spam, disease). The classification model is based on a cloud of spheres that circumscribes the points of the class to be identified. It is intended to build a model based on a cloud and not on a disjoint set of clouds, establishing this condition on the connectivity of a graph induced by the spheres. To solve the problem, designed by a Cloud of Connected Spheres, a quadratic model with continuous and binary variables (MINLP) is proposed with the minimization of the number of spheres. The issue of connectivity implies in many models the imposition of an exponential number of constraints. However, because of the specific conditions of the problem under study, connectivity is enforced with linear constraints that scale quadratically with K, which serves as an upper bound on the number of spheres. This classification model is effective when the structure of the class to be identified is highly non-linear and non-convex, also adapting to the case of linear separation. Unlike neural networks, the classification model is transparent, with the structure perfectly identified. No kernel functions are used and it is not necessary to use meta-parameters unless it is intended also to maximize the separation margin as it is done in SVM. Finding the global optima for large instances is quite challenging, and to address this, a heuristic is proposed. The heuristic demonstrates nice results on a set of frequently tested real problems when compared to state-of-the-art algorithms.
AB - In this article we propose a binary classification model to distinguish a specific class that corresponds to a characteristic that we intend to identify (fraud, spam, disease). The classification model is based on a cloud of spheres that circumscribes the points of the class to be identified. It is intended to build a model based on a cloud and not on a disjoint set of clouds, establishing this condition on the connectivity of a graph induced by the spheres. To solve the problem, designed by a Cloud of Connected Spheres, a quadratic model with continuous and binary variables (MINLP) is proposed with the minimization of the number of spheres. The issue of connectivity implies in many models the imposition of an exponential number of constraints. However, because of the specific conditions of the problem under study, connectivity is enforced with linear constraints that scale quadratically with K, which serves as an upper bound on the number of spheres. This classification model is effective when the structure of the class to be identified is highly non-linear and non-convex, also adapting to the case of linear separation. Unlike neural networks, the classification model is transparent, with the structure perfectly identified. No kernel functions are used and it is not necessary to use meta-parameters unless it is intended also to maximize the separation margin as it is done in SVM. Finding the global optima for large instances is quite challenging, and to address this, a heuristic is proposed. The heuristic demonstrates nice results on a set of frequently tested real problems when compared to state-of-the-art algorithms.
KW - Anomaly detection
KW - Automatic classification
KW - MINLP
KW - Spherical separation
UR - http://www.scopus.com/inward/record.url?scp=85174732123&partnerID=8YFLogxK
U2 - 10.1016/j.ejco.2023.100077
DO - 10.1016/j.ejco.2023.100077
M3 - Article
AN - SCOPUS:85174732123
SN - 2192-4406
VL - 11
JO - EURO Journal on Computational Optimization
JF - EURO Journal on Computational Optimization
M1 - 100077
ER -