A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: Using machine learning and graph analysis methodologies to reconstruct the bibliome

Martín Pérez-Pérez, Tânia Ferreira, Gilberto Igrejas, Florentino Fdez-Riverola

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)
6 Downloads (Pure)

Abstract

Background: In return for their nutritional properties and broad availability, cereal crops have been associated with different alimentary disorders and symptoms, with the majority of the responsibility being attributed to gluten. Therefore, the research of gluten-related literature data continues to be produced at ever-growing rates, driven in part by the recent exploratory studies that link gluten to non-traditional diseases and the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this sense, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produce a favourable scenario for disinformation and misinformation. Objectives: Aligned with, the European Union strategy “Delivering on EU Food Safety and Nutrition in 2050″ which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information; this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform includes different external database knowledge, bibliometrics statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in relation to the gluten domain. Methods: For this purpose, the presented study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, which is also complemented by data from the social discussion. Results and conclusions: In this sense, 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes based on the literature. In addition, the automatic processing of the literature combined with the knowledge representation methodologies proposed has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/
Original languageEnglish
Article number104398
Number of pages16
JournalJournal Of Biomedical Informatics
Volume143
DOIs
Publication statusPublished - Jul 2023

Keywords

  • Gluten database
  • Knowledge representation
  • Literature curation
  • Ontology-base methods
  • Relation extraction
  • Social media

Fingerprint

Dive into the research topics of 'A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: Using machine learning and graph analysis methodologies to reconstruct the bibliome'. Together they form a unique fingerprint.

Cite this