Conference papers

Simulating reading mistakes for child speech Transformer-based phone recognition

Abstract : The performance of automatic speech recognition (ASR) on children's speech remains below that of the latest systems dedicated to adult speech. Child speech is particularly difficult to recognise, and the large corpora needed to train acoustic models are lacking. Furthermore, in the scope of our reading assistant for 5-8-year-old children learning to read, models need to cope with disfluencies and reading mistakes, which remain considerable challenges even for state-of-the-art ASR systems. In this paper, we adapt an end-to-end Transformer acoustic model to speech from children learning to read. Transfer learning (TL) with a small amount of child speech improves the phone error rate (PER) by 48.7% relative over an adult model and outperforms a TL-adapted DNN-HMM model by 21.0% relative PER. Multi-objective training with a Connectionist Temporal Classification (CTC) function further reduces the PER by 4.8% relative. We propose a data augmentation method for reading mistakes, in which we simulate word-level repetitions and substitutions with phonetically or graphically close words. Combining these two types of reading mistakes reaches a 19.9% PER, a 13.1% relative improvement over the baseline. A detailed analysis shows that both the CTC multi-objective training and the augmentation with synthetic repetitions help the attention mechanisms better detect children's disfluencies.
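The reading-mistake augmentation described in the abstract can be illustrated at the text level. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_reading_mistakes`, the probabilities `p_repeat` and `p_substitute`, and the use of spelling similarity from Python's `difflib` as a stand-in for "graphically close" words are all hypothetical. It handles the word sequence only; how the corresponding audio is assembled, and how phonetically close words are selected, are not covered here.

```python
import random
from difflib import get_close_matches


def simulate_reading_mistakes(words, vocabulary, p_repeat=0.1,
                              p_substitute=0.1, seed=None):
    """Return a copy of `words` with simulated word-level reading mistakes.

    Hypothetical sketch: each word is either repeated, replaced by a
    similarly spelled word from `vocabulary`, or left unchanged.
    """
    rng = random.Random(seed)
    augmented = []
    for word in words:
        draw = rng.random()  # uniform in [0, 1)
        if draw < p_repeat:
            # Repetition: the reader says the same word twice.
            augmented.extend([word, word])
        elif draw < p_repeat + p_substitute:
            # Substitution: pick a word whose spelling is close
            # (assumption: difflib similarity ratio of at least 0.6).
            candidates = [
                w for w in get_close_matches(word, vocabulary, n=4, cutoff=0.6)
                if w != word
            ]
            # Keep the original word when no close neighbour exists.
            augmented.append(rng.choice(candidates) if candidates else word)
        else:
            augmented.append(word)
    return augmented


if __name__ == "__main__":
    vocab = ["chat", "chant", "chien", "maison"]
    sentence = ["le", "chat", "dort"]
    print(simulate_reading_mistakes(sentence, vocab,
                                    p_repeat=0.2, p_substitute=0.2, seed=0))
```

In a setup like this, the augmented word sequences would then be converted to phone targets for training; the mistake rates are free parameters that would need tuning against real reading data.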

https://hal.archives-ouvertes.fr/hal-03257870
Contributor : Lucile Gelin
Submitted on : Friday, June 11, 2021 - 10:46:32 AM
Last modification on : Saturday, July 3, 2021 - 3:48:48 AM
Long-term archiving on : Sunday, September 12, 2021 - 7:13:36 PM

File

Paper_Interspeech2021_LucileGe...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03257870, version 1

Citation

Lucile Gelin, Thomas Pellegrini, Julien Pinquier, Morgane Daniel. Simulating reading mistakes for child speech Transformer-based phone recognition. Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2021, Brno, Czech Republic. ⟨hal-03257870⟩
