TY - JOUR
T1 - Addressing the Curse of Missing Data in Clinical Contexts
T2 - A Novel Approach to Correlation-based Imputation
AU - Curioso, Isabel
AU - Santos, Ricardo
AU - Ribeiro, Bruno
AU - Carreiro, André
AU - Coelho, Pedro
AU - Fragata, José
AU - Gamboa, Hugo
N1 - info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FAI%2F0094%2F2020/PT#
Funding Information:
This work was done under the project “CardioFollow.AI: An intelligent system to improve patients’ safety and remote surveillance in follow-up for cardiothoracic surgery”.
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/6
Y1 - 2023/6
AB - Clinical data are essential in the medical domain. However, their heterogeneous nature leads to many data quality problems, notably missing values, which undermine the performance of Machine Learning-based clinical systems. Hence, there has been a growing interest in strategies that address this challenge in order to build trustworthy systems to improve the quality of care and benefit clinical decision-making. In particular, missing value imputation is a common approach. This paper proposes three novel imputation techniques that leverage correlation in an innovative manner by exploring the relationship between values and missingness patterns. Experiments were carried out on three publicly available datasets, under three missingness mechanisms with different missing rates, and on two real-world medical datasets. The imputation precision and the classification performance of the proposed techniques were evaluated in a comprehensive comparative study, which included diverse existing methods. The developed techniques outperformed state-of-the-art methods on several assessments while overcoming current flaws shared by correlation-based imputation strategies in real-world medical problems.
KW - Clinical data
KW - Correlation
KW - Machine learning
KW - Missing data
KW - Missing data imputation
UR - http://www.scopus.com/inward/record.url?scp=85158082011&partnerID=8YFLogxK
U2 - 10.1016/j.jksuci.2023.101562
DO - 10.1016/j.jksuci.2023.101562
M3 - Article
AN - SCOPUS:85158082011
SN - 1319-1578
VL - 35
JO - Journal of King Saud University - Computer and Information Sciences
JF - Journal of King Saud University - Computer and Information Sciences
IS - 6
M1 - 101562
ER -