The BioVisualSpeech European Portuguese sibilants corpus

Margarida Grilo, Isabel Guimarães, Mariana Ascensão, Alberto Abad, Ivo Anjos, João Magalhães, Sofia Cavaco

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review



The development of reliable speech therapy computer tools that automatically classify speech productions depends on the quality of the speech data set used to train the classification algorithms. The data set should characterize the target population in terms of age, gender and native language, but it should also reflect other important properties of the people who are going to use the tool. Thus, apart from samples of correct speech productions, it should also include samples from people with speech disorders, and its annotation should indicate whether the phonemes are pronounced correctly or incorrectly. Here, we present a corpus of European Portuguese children’s speech data that we are using in the development of speech classifiers for speech therapy tools for Portuguese children. The corpus includes data from children with speech disorders, and its labelling includes information about the speech production errors. This corpus, which has data from 356 children from 5 to 9 years of age, focuses on the European Portuguese sibilant consonants and can be used to train speech recognition models for tools that assist the detection and therapy of sigmatism.

Original language: English
Title of host publication: Computational Processing of the Portuguese Language - 14th International Conference, PROPOR 2020, Proceedings
Editors: Paulo Quaresma, Renata Vieira, Teresa Gonçalves, Sandra Aluísio, Helena Moniz, Fernando Batista
Place of Publication: Cham
Number of pages: 11
ISBN (Electronic): 978-3-030-41505-1
ISBN (Print): 978-3-030-41504-4
Publication status: Published - 2020
Event: 14th International Conference on Computational Processing of the Portuguese Language, PROPOR 2020 - Evora, Portugal
Duration: 2 Mar 2020 to 4 Mar 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12037 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 14th International Conference on Computational Processing of the Portuguese Language, PROPOR 2020


  • European Portuguese corpus
  • Sibilants
  • Speech sound disorders


