TY - GEN
T1 - Soil classification based on physical and chemical properties using random forests
AU - Dias, Didier
AU - Martins, Bruno
AU - Pires, João
AU - De Sousa, Luís Moreira
AU - Estima, Jacinto
AU - Damásio, Carlos V.
N1 - info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UID%2FCEC%2F50021%2F2019/PT#
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UID%2FCEC%2F04516%2F2019/PT#
This research was supported by Fundacao para a Ciencia e Tecnologia (FCT) through the project grant with reference PTDC/CCICIF/32607/2017 (MIMU).
PY - 2019/8/30
Y1 - 2019/8/30
N2 - Soil classification is a method of encoding the most relevant information about a given soil, namely its composition and characteristics, in a single class, to be used in areas like agriculture and forestry. In this paper, we evaluate how confidently we can predict soil classes, following the World Reference Base classification system, based on the physical and chemical characteristics of its layers. The Random Forests classifier was used with data consisting of 6 760 soil profiles composed of 19 464 horizons, collected in Mexico. Four methods of modelling the data were tested (i.e., standard depths, n first layers, thickness, and area-weighted thickness). We also tuned the parameters of the classifier and of a k-NN imputation algorithm, used to address missing data. Under-represented classes showed significantly worse results, being repeatedly predicted as one of the majority classes. The best method to model the data was found to be the n first layers approach, with missing values imputed with k-NN (k = 1). The results present a Kappa value from 0.36 to 0.48 and are in line with state-of-the-art methods, which mostly use remote sensing data.
KW - Ensemble learning
KW - Machine learning
KW - Random Forests
KW - Soil classification
KW - Soil properties
UR - http://www.scopus.com/inward/record.url?scp=85072882297&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-30241-2_19
DO - 10.1007/978-3-030-30241-2_19
M3 - Conference contribution
AN - SCOPUS:85072882297
SN - 978-3-030-30240-5
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 223
BT - Progress in Artificial Intelligence - 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Proceedings
A2 - Moura Oliveira, Paulo
A2 - Novais, Paulo
A2 - Reis, Luís Paulo
PB - Springer
CY - Cham
T2 - 19th EPIA Conference on Artificial Intelligence, EPIA 2019
Y2 - 3 September 2019 through 6 September 2019
ER -