20 May 2024
info:eu-repo/semantics/OpenAccess
Solène Evain et al., « Unraveling Spontaneous Speech Dimensions for Cross-Corpus ASR System Evaluation for French », HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID : 10670/1.rokoi9
Many papers on speech processing use "spontaneous speech" as a catch-all term for situations such as speaking with a friend, being interviewed on radio or TV, or giving a lecture. However, the performance of Automatic Speech Recognition (ASR) systems seems to vary across this type of speech: the more spontaneous the speech, the higher the Word Error Rate (WER). Our study aims to better understand the elements that influence the level of spontaneity, in order to evaluate the relation between categories of spontaneity and ASR system performance and to improve recognition on those categories. We first analyzed the literature, listed and unraveled those elements, and identified four axes: the situation of communication, the level of intimacy between speakers, the channel, and the type of communication. We then trained ASR systems and measured the impact on WER of instances of face-to-face interaction labeled with these dimensions (different levels of spontaneity). We varied two axes and found that both have an impact on WER. The situation of communication seems to have the biggest impact on spontaneity: ASR systems give better results for situations like an interview than for friends having a conversation at home.
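For readers unfamiliar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the abstract does not say which scoring tool or text normalization the authors used, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no further
    normalization (case, punctuation); real evaluations usually
    normalize the text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
print(word_error_rate("bonjour comment allez vous", "bonjour comment allez"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.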