Clustering clinical data in R

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

We are currently witnessing a paradigm shift from evidence-based medicine to precision medicine, which has been made possible by the enormous development of technology. The advances in data mining algorithms will allow us to integrate trans-omics with clinical data, contributing to our understanding of pathological mechanisms and massively impacting on the clinical sciences. Cluster analysis is one of the main data mining techniques and allows for the exploration of data patterns that the human mind cannot capture. This chapter focuses on the cluster analysis of clinical data, using the statistical software R. We outline the cluster analysis process, underlining some clinical data characteristics. Starting with the data preprocessing step, we then discuss the advantages and disadvantages of the most commonly used clustering algorithms and point to examples of their applications in clinical work. Finally, we briefly discuss how to perform validation of clusters. Throughout the chapter, we highlight R packages suitable for each computational step of cluster analysis.

Original language: English
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc
Pages: 309-343
Number of pages: 35
Volume: 2051
DOI: https://doi.org/10.1007/978-1-4939-9744-2_14
Publication status: Published - 2020

Publication series

Name: Methods in Molecular Biology
Volume: 2051
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Keywords

  • Clinical data
  • Cluster analysis
  • Cluster optimization
  • Cluster stability
  • Cluster tendency
  • Cluster validation
  • Stratification

Cite this

Pina, A., Macedo, M. P., & Henriques, R. (2020). Clustering clinical data in R. In Methods in Molecular Biology (Vol. 2051, pp. 309-343). (Methods in Molecular Biology; Vol. 2051). Humana Press Inc. https://doi.org/10.1007/978-1-4939-9744-2_14