TY - JOUR
T1 - Classification and biomarker selection in lower-grade glioma using robust sparse logistic regression applied to RNA-seq data
AU - Carrilho, João F.
AU - Lopes, Marta B.
N1 - info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F00297%2F2020/PT#
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04516%2F2020/PT#
info:eu-repo/grantAgreement/FCT/Concurso para Financiamento de Projetos de Investigação Científica e Desenvolvimento Tecnológico em Todos os Domínios Científicos - 2020/PTDC%2FCCI-BIO%2F4180%2F2020/PT#
Funding Information:
This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with references CEECINST/00102/2018 (NOVA MATH, Center for Mathematics and Applications). The results presented are based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.
Publisher Copyright:
© Brazilian Journal of Biometrics.
PY - 2022/12/31
Y1 - 2022/12/31
N2 - Effective diagnosis and treatment in cancer is a barrier to the development of personalized medicine, mostly due to tumor heterogeneity. In the particular case of gliomas, highly heterogeneous brain tumors at the histological, cellular and molecular levels that exhibit poor prognosis, the mechanisms behind tumor heterogeneity and progression remain poorly understood. Recent advances in biomedical high-throughput technologies have allowed the generation of large amounts of molecular information from patients that, combined with statistical and machine learning techniques, can be used for the definition of glioma subtypes and targeted therapies, an invaluable contribution to disease understanding and effective management. In this work, sparse and robust sparse logistic regression models with the elastic net penalty were applied to glioma RNA-seq data from The Cancer Genome Atlas (TCGA) to identify relevant transcriptomic features in the separation between lower-grade glioma (LGG) subtypes and to identify putative outlying observations. In general, all classification models yielded good accuracies, selecting different sets of genes. Among the genes selected by the models, TXNDC12, TOMM20, PKIA, CARD8 and TAF12 have been reported as genes with a relevant role in glioma development and progression. This highlights the suitability of the present approach to disclose relevant genes and fosters the biological validation of non-reported genes.
AB - Effective diagnosis and treatment in cancer is a barrier to the development of personalized medicine, mostly due to tumor heterogeneity. In the particular case of gliomas, highly heterogeneous brain tumors at the histological, cellular and molecular levels that exhibit poor prognosis, the mechanisms behind tumor heterogeneity and progression remain poorly understood. Recent advances in biomedical high-throughput technologies have allowed the generation of large amounts of molecular information from patients that, combined with statistical and machine learning techniques, can be used for the definition of glioma subtypes and targeted therapies, an invaluable contribution to disease understanding and effective management. In this work, sparse and robust sparse logistic regression models with the elastic net penalty were applied to glioma RNA-seq data from The Cancer Genome Atlas (TCGA) to identify relevant transcriptomic features in the separation between lower-grade glioma (LGG) subtypes and to identify putative outlying observations. In general, all classification models yielded good accuracies, selecting different sets of genes. Among the genes selected by the models, TXNDC12, TOMM20, PKIA, CARD8 and TAF12 have been reported as genes with a relevant role in glioma development and progression. This highlights the suitability of the present approach to disclose relevant genes and fosters the biological validation of non-reported genes.
KW - Classification
KW - Elastic net regularization
KW - Glioma
KW - Robust statistics
KW - Sparse logistic regression
UR - http://www.scopus.com/inward/record.url?scp=85147182692&partnerID=8YFLogxK
U2 - 10.28951/bjb.v40i4.634
DO - 10.28951/bjb.v40i4.634
M3 - Article
AN - SCOPUS:85147182692
SN - 1983-0823
VL - 40
SP - 371
EP - 381
JO - REVISTA BRASILEIRA DE BIOMETRIA
JF - REVISTA BRASILEIRA DE BIOMETRIA
IS - 4
ER -