Overview of data preprocessing for machine learning applications in human microbiome research

Eliana Ibrahimi, Marta B. Lopes, Xhilda Dhamo, Andrea Simeon, Rajesh Shigdel, Karel Hron, Blaž Stres, Domenica D’Elia, Magali Berland, Laura Judith Marcos-Zambrano

Research output: Contribution to journal / Review article / Peer-review

Abstract

Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data remains challenging, primarily because of the statistical properties of the data (e.g., sparsity, over-dispersion, compositionality, and inter-variable dependency). This mini review explores the preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods that target the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. In many publications, the information on preprocessing and transformations applied before analysis was incomplete or missing, raising concerns about reproducibility and comparability and casting doubt on the results. We hope this mini review will provide researchers and newcomers to human microbiome research with an up-to-date point of reference for data transformation tools and help them choose the most suitable transformation method for their research questions, objectives, and data characteristics.
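The abstract contrasts relative (proportion-based) transformations with methods that address the compositional nature of microbiome counts. As a minimal illustrative sketch, not taken from the review itself, the snippet below contrasts total-sum scaling (relative abundance) with the centered log-ratio (CLR) transform, one widely used transformation from compositional data analysis. The function names and the default pseudocount of 0.5 are hypothetical choices for illustration; the pseudocount is needed only because sparse count data contain zeros, for which the logarithm is undefined.

```python
import math
from typing import List


def total_sum_scaling(counts: List[float]) -> List[float]:
    """Relative abundance: divide each taxon's count by the sample total.

    The result sums to 1, so sequencing depth is removed, but the values
    remain constrained proportions (a composition).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads; cannot compute proportions")
    return [c / total for c in counts]


def clr(counts: List[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio: log of each part minus the mean log of the sample.

    Equivalent to log(x_i / geometric_mean(x)). A pseudocount (hypothetical
    default of 0.5) is added first because log(0) is undefined.
    """
    shifted = [c + pseudocount for c in counts]
    if any(v <= 0 for v in shifted):
        raise ValueError("all values must be positive after adding the pseudocount")
    logs = [math.log(v) for v in shifted]
    mean_log = sum(logs) / len(logs)
    # CLR components always sum to zero (up to floating-point error).
    return [v - mean_log for v in logs]


if __name__ == "__main__":
    sample = [120, 0, 30, 850]  # hypothetical read counts for four taxa
    print(total_sum_scaling(sample))
    print(clr(sample))
```

With strictly positive counts and `pseudocount=0.0`, both transforms are invariant to sequencing depth: `clr([1, 2, 4], 0.0)` and `clr([2, 4, 8], 0.0)` both give approximately `[-0.693, 0.0, 0.693]`. The difference is that CLR values live in unconstrained real space, which suits many standard statistical and machine learning methods better than raw proportions do.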
Original language: English
Article number: 1250909
Number of pages: 8
Journal: Frontiers in Microbiology
Volume: 14
Publication status: Published, October 2023

Keywords

  • compositionality
  • data preprocessing
  • human microbiome
  • machine learning
  • metagenomics data
  • normalization
