Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Document record

Date

2015


Open archives



Related subjects

Meter, Prosody, Metrics, Talking

Cite this document

Marc Evrard et al., "Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis", HAL-SHS : linguistique, ID: 10670/1.5725zc



Abstract

Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French based on HMM was designed. A neutral corpus and six expressive speech corpora were used (anger, fear, joy, sadness, sensuality, surprise). Five sentences were synthesized with the six types of expressivity through CMLLR adaptation. Using a chironomic system, three trained subjects were asked to modify synthetic sentences, aiming at improving their expressive quality. Natural, HMM-TTS, and HMM-TTS-Chironomic sentences were evaluated in an expressivity recognition test and a MOS test. The results show that chironomic modification brings significant improvements in both recognition and MOS tests. These results are discussed in detail, together with the effects of voice quality on the perception of HMM-TTS expressive speech. The two main conclusions are: (i) intonation of HMM-TTS can be significantly improved; (ii) hand-corrected TTS improves expressivity and overall quality. Chironomic stylization is a powerful tool lying between fully automatic TTS and recorded speech.
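
As a rough illustration of what modifying an intonation contour (f0 and tempo) can mean at the signal-parameter level, here is a minimal sketch. It is not the authors' system: the function names, the frame-wise representation, and the use of semitone offsets and tempo factors as gesture outputs are all assumptions made for this example.

```python
import math


def apply_pitch_gesture(f0_hz, semitone_offsets):
    """Shift each voiced frame of an f0 contour by a per-frame offset.

    Hypothetical helper: offsets are assumed to come from a stylus
    gesture, expressed in semitones. Frames with f0 <= 0 are treated
    as unvoiced and left at 0.0.
    """
    if len(f0_hz) != len(semitone_offsets):
        raise ValueError("contour and offsets must have the same length")
    shifted = []
    for f0, offset in zip(f0_hz, semitone_offsets):
        if f0 <= 0:
            shifted.append(0.0)  # unvoiced frame: no pitch to modify
        else:
            # one semitone corresponds to a frequency ratio of 2**(1/12)
            shifted.append(f0 * 2.0 ** (offset / 12.0))
    return shifted


def apply_tempo_gesture(durations_s, tempo_factors):
    """Rescale segment durations by per-segment tempo factors.

    Hypothetical helper: a factor above 1 means faster speech,
    so the segment duration shrinks.
    """
    if len(durations_s) != len(tempo_factors):
        raise ValueError("durations and factors must have the same length")
    if any(factor <= 0 for factor in tempo_factors):
        raise ValueError("tempo factors must be positive")
    return [d / factor for d, factor in zip(durations_s, tempo_factors)]


if __name__ == "__main__":
    contour = [100.0, 0.0, 200.0]      # Hz, middle frame unvoiced
    gesture = [12.0, 5.0, -12.0]       # semitones, one value per frame
    print(apply_pitch_gesture(contour, gesture))
    print(apply_tempo_gesture([0.2, 0.1], [2.0, 0.5]))
```

Working in semitones rather than hertz keeps a given gesture amplitude perceptually comparable across low and high voices, which is why the sketch uses a multiplicative ratio instead of an additive shift.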
