TY - JOUR
T1 - A machine learning approximation of the 2015 Portuguese high school student grades
T2 - A hybrid approach
AU - Costa-Mendes, Ricardo
AU - Oliveira, Tiago
AU - Castelli, Mauro
AU - Cruz-Jesus, Frederico
N1 - info:eu-repo/grantAgreement/FCT/3599-PPCDT/DSAIPA%2FDS%2F0032%2F2018/PT#
Costa-Mendes, R., Oliveira, T., Castelli, M., & Cruz-Jesus, F. (2021). A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach. Education and Information Technologies, 26(2), 1527-1547. https://doi.org/10.1007/s10639-020-10316-y
PY - 2021/3/1
Y1 - 2021/3/1
N2 - This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p-values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.
AB - This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p-values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.
KW - Academic achievement
KW - High school grades
KW - Machine learning
KW - Random forest
KW - Stacking
KW - Support vector regression
UR - http://www.scopus.com/inward/record.url?scp=85090307250&partnerID=8YFLogxK
UR - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcAuth=Alerting&SrcApp=Alerting&DestApp=WOS_CPL&DestLinkType=FullRecord&UT=WOS:000566325100001
U2 - 10.1007/s10639-020-10316-y
DO - 10.1007/s10639-020-10316-y
M3 - Article
AN - SCOPUS:85090307250
SN - 1360-2357
VL - 26
SP - 1527
EP - 1547
JO - Education and Information Technologies
JF - Education and Information Technologies
IS - 2
ER -