A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach

Research output: Contribution to journal › Article

Abstract

This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education to compare the predictive power of the classic multilinear regression model with that of a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. The intent is a hybrid analysis in which classical statistical analysis and artificial intelligence algorithms are blended to strengthen the ability to draw valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the appropriateness of stacking increases as the determinant of the base learner output correlation matrix increases, and the empirical distributions of random forest feature importance are correlated with the structure of p-values and the outcomes of the statistical significance tests of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.
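
The quantity the abstract refers to, the determinant of the base learner output correlation matrix, can be illustrated with a minimal sketch. This is not the article's code: the function names and the prediction values below are hypothetical toy numbers, not drawn from the DGEEC dataset. The only fact relied on is that the determinant of a correlation matrix lies between 0 and 1, approaching 0 when the base learners' predictions are nearly collinear and 1 when they are uncorrelated.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Correlation matrix of the given prediction vectors (one per learner)."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row swap flips the sign
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


# Hypothetical predictions of three base learners for five students.
similar_learners = [
    [10.0, 12.0, 14.0, 16.0, 18.0],
    [10.1, 12.0, 14.2, 15.9, 18.1],
    [9.9, 12.1, 13.9, 16.1, 17.9],
]
diverse_learners = [
    [10.0, 12.0, 14.0, 16.0, 18.0],
    [12.0, 10.0, 16.0, 14.0, 18.0],
    [14.0, 16.0, 10.0, 18.0, 12.0],
]

det_similar = determinant(correlation_matrix(similar_learners))
det_diverse = determinant(correlation_matrix(diverse_learners))
print("near-collinear learners:", det_similar)
print("more diverse learners:  ", det_diverse)
```

In this toy setting the more diverse set of learners yields the larger determinant, which is the direction in which the abstract reports stacking becoming more appropriate.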

Original language: English
Journal: Education and Information Technologies
Publication status: E-pub ahead of print - 5 Sep 2020

Keywords

  • Academic achievement
  • High school grades
  • Machine learning
  • Random forest
  • Stacking
  • Support vector regression
