TY - JOUR
T1 - Detection of voicing and place of articulation of fricatives with deep learning in a virtual speech and language therapy tutor
AU - Anjos, Ivo
AU - Eskenazi, Maxine
AU - Marques, Nuno
AU - Grilo, Margarida
AU - Guimarães, Isabel
AU - Magalhães, João
AU - Cavaco, Sofia
N1 - info:eu-repo/grantAgreement/FCT/5665-PICT/CMUP-ERI%2FTIC%2F0033%2F2014/PT#
info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UID%2FCEC%2F04516%2F2019/PT#
PY - 2020
Y1 - 2020
N2 - Children with fricative distortion errors have to learn how to correctly use the vocal folds, and which place of articulation to use in order to correctly produce the different fricatives. Here we propose a virtual tutor for fricatives distortion correction. This is a virtual tutor for speech and language therapy that helps children understand their fricative production errors and how to correctly use their speech organs. The virtual tutor uses log Mel filter banks and deep learning techniques with spectral-temporal convolutions of the data to classify the fricatives in children's speech by place of articulation and voicing. It achieves an accuracy of 90.40% for place of articulation and 90.93% for voicing with children's speech. Furthermore, this paper discusses a multidimensional advanced data analysis of the first layer convolutional kernel filters that validates the usefulness of performing the convolution on the log Mel filter bank.
KW - Convolutional neural networks
KW - Fricatives
KW - Speech and language therapy
UR - http://www.scopus.com/inward/record.url?scp=85093105936&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2821
DO - 10.21437/Interspeech.2020-2821
M3 - Conference article
AN - SCOPUS:85093105936
SN - 2308-457X
VL - 2020-October
SP - 3156
EP - 3160
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -