TY - JOUR
T1 - Improving the quality of predictive models in small data GSDOT
T2 - A new algorithm for generating synthetic data
AU - Douzas, Georgios
AU - Lechleitner, Maria
AU - Bacao, Fernando
N1 - info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0116%2F2019/PT#
Douzas, G., Lechleitner, M., & Bacao, F. (2022). Improving the quality of predictive models in small data GSDOT: A new algorithm for generating synthetic data. PLoS ONE, 17(4), 1-15. [e0265626]. https://doi.org/10.1371/journal.pone.0265626
PY - 2022/4/7
Y1 - 2022/4/7
AB - In the age of the data deluge there are still many domains and applications restricted to the use of small datasets. The ability to harness these small datasets to solve problems through the use of supervised learning methods can have a significant impact in many important areas. The insufficient size of training data usually results in unsatisfactory performance of machine learning algorithms. The current research work aims to contribute to mitigate the small data problem through the creation of artificial instances, which are added to the training process. The proposed algorithm, Geometric Small Data Oversampling Technique, uses geometric regions around existing samples to generate new high quality instances. Experimental results show a significant improvement in accuracy when compared with the use of the initial small dataset as well as other popular artificial data generation techniques.
UR - http://www.scopus.com/inward/record.url?scp=85127888139&partnerID=8YFLogxK
UR - https://www.webofscience.com/wos/woscc/full-record/WOS:000795077200026
U2 - 10.1371/journal.pone.0265626
DO - 10.1371/journal.pone.0265626
M3 - Article
SN - 1932-6203
VL - 17
SP - 1
EP - 15
JO - PLoS ONE
JF - PLoS ONE
IS - 4
M1 - e0265626
ER -