Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data

Research output: Contribution to journal › Article › peer-review



Achieving effective diagnosis and treatment in cancer remains a barrier to the development of personalized medicine, mostly due to tumor heterogeneity. In the particular case of gliomas, which are brain tumors that are highly heterogeneous at the histological, cellular and molecular levels and carry a poor prognosis, the mechanisms behind tumor heterogeneity and progression remain poorly understood. Recent advances in biomedical high-throughput technologies have allowed the generation of large amounts of molecular information from patients which, combined with statistical and machine learning techniques, can be used to define glioma subtypes and targeted therapies, an invaluable contribution to disease understanding and effective management. In this work, sparse and robust sparse logistic regression models with the elastic net penalty were applied to glioma RNA-seq data from The Cancer Genome Atlas (TCGA) to identify transcriptomic features relevant to the separation between lower-grade glioma (LGG) subtypes and to identify putative outlying observations. In general, all classification models yielded good accuracies while selecting different sets of genes. Among the genes selected by the models, TXNDC12, TOMM20, PKIA, CARD8 and TAF12 have been reported as having a relevant role in glioma development and progression. This highlights the suitability of the present approach for disclosing relevant genes and encourages the biological validation of genes not previously reported.
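To make the idea of sparse logistic regression with the elastic net penalty concrete, the following is a minimal sketch of the standard (non-robust) formulation, fitted by proximal gradient descent. It is not the authors' implementation, and it does not include the robust variant or any outlier detection. The toy data, the function names (`soft_threshold`, `fit_enet_logistic`) and the parameter names (`lam` for the overall penalty strength, `alpha` for the L1/L2 mixing weight) are assumptions chosen purely for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_enet_logistic(X, y, lam, alpha, step=0.5, n_iter=500):
    """Minimise mean logistic loss + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).

    Plain proximal gradient descent; the intercept is not penalised.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current iterate.
        resid = [
            sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
            for row, yi in zip(X, y)
        ]
        intercept -= step * sum(resid) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            # Gradient step on the loss, then the elastic net proximal step:
            # soft-thresholding (L1 part) followed by rescaling (L2 part).
            shrunk = soft_threshold(beta[j] - step * grad_j, step * lam * alpha)
            beta[j] = shrunk / (1.0 + step * lam * (1.0 - alpha))
    return intercept, beta


# Hypothetical toy data: feature 0 separates the two classes,
# feature 1 is unrelated to the label.
X = [[-1, 1], [-1, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [1, 1], [1, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, beta = fit_enet_logistic(X, y, lam=0.1, alpha=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", beta)
print("selected features:", selected)
```

In this toy setting the coefficient of the uninformative feature stays exactly at zero, which is the feature-selection behaviour that makes such models usable for picking candidate genes. In a real RNA-seq setting with thousands of genes, an established library implementation would normally be used instead of a hand-written solver.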

Original language: English
Pages (from-to): 371-381
Number of pages: 11
Issue number: 4
Publication status: Published - 31 Dec 2022


  • Classification
  • Elastic net regularization
  • Glioma
  • Robust Statistics
  • Sparse Logistic regression

