TY - JOUR
T1 - Using Machine Learning Methods to Forecast Air Quality
T2 - A Case Study in Macao
AU - Lei, Thomas M. T.
AU - Siu, Shirley W. I.
AU - Monjardino, Joana
AU - Mendes, Luísa
AU - Ferreira, Francisco
N1 - Funding Information:
This research was funded by Fundação para a Ciência e Tecnologia, I.P., Portugal, grant number UID/AMB/04085/2020, and the APC was funded by CENSE.
Funding Information:
This work was supported by the Macao Meteorological and Geophysical Bureau (SMG).
Publisher Copyright:
© 2022 by the authors.
PY - 2022/9/1
Y1 - 2022/9/1
AB - Although air pollution levels in Macao have continued to improve in recent years, high-pollution episodes still occur and raise serious health concerns for the local community, so accurate air quality forecasting is important. Four machine learning methods were applied to predict particulate matter (PM10 and PM2.5) concentrations in Macao: random forest (RF), gradient boosting (GB), support vector regression (SVR), and multiple linear regression (MLR). The forecast models were built and trained using meteorological and air quality data from 2013 to 2018, and air quality data from 2019 to 2021 were used for validation. The results show no significant difference in performance among the four methods for 2019 (before the COVID-19 pandemic) and 2021 (the new normal period). For 2020 (amid the pandemic), however, RF performed significantly better than the other methods, with a higher coefficient of determination (R2) and lower RMSE, MAE, and BIAS. The reduced performance of the statistical MLR model and the other ML models was presumably due to the unprecedentedly low PM10 and PM2.5 concentrations in 2020. This study therefore suggests that RF is the most reliable of the tested methods for predicting pollutant concentrations, especially when air quality changes drastically due to unexpected circumstances, such as a lockdown caused by a widespread infectious disease.
KW - air pollution
KW - air quality
KW - air quality forecast
KW - COVID-19
KW - gradient boosting
KW - multiple linear regression
KW - random forest
KW - support vector regression
UR - http://www.scopus.com/inward/record.url?scp=85138708178&partnerID=8YFLogxK
U2 - 10.3390/atmos13091412
DO - 10.3390/atmos13091412
M3 - Article
AN - SCOPUS:85138708178
SN - 2073-4433
VL - 13
JO - Atmosphere
JF - Atmosphere
IS - 9
M1 - 1412
ER -