Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French

Document record

Date

20 May 2024

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess



Related subjects

Talking

Cite this document

Solène Evain et al., « Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French », HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID : 10670/1.rokoi9



Abstract

Many papers on speech processing use the term 'spontaneous speech' as a catch-all term for situations like speaking with a friend, being interviewed on radio/TV or giving a lecture. However, the performance of Automatic Speech Recognition (ASR) systems seems to vary on this type of speech: the more spontaneous the speech, the higher the WER (Word Error Rate). Our study focuses on better understanding the elements influencing the levels of spontaneity in order to evaluate the relation between categories of spontaneity and ASR system performance and to improve recognition on those categories. We first analyzed the literature, listed and unraveled those elements, and finally identified four axes: the situation of communication, the level of intimacy between speakers, the channel and the type of communication. Then, we trained ASR systems and measured the impact on WER of instances of face-to-face interaction labeled with the previous dimensions (different levels of spontaneity). We made two axes vary and found that both dimensions have an impact on the WER. The situation of communication seems to have the biggest impact on spontaneity: ASR systems give better results for situations like an interview than for friends having a conversation at home.
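Since the abstract reports all results as WER, a minimal sketch of how that metric is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration; the record does not describe the scoring tool or text normalization the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is a plain whitespace split, with no
    case folding or punctuation handling (an assumption, not the paper's setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substitution and one deletion over six words.
    print(word_error_rate("le chat dort sur le lit", "le chien dort sur lit"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.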
