Vocal effort in situation

Document record

Date

31 August 2011

Collection

Open archives



Related subjects

Speaking

Cite this document

Jean-Sylvain Liénard, « Vocal effort in situation », HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID : 10670/1.x8uhrr



Abstract

Oral communication is an interactive and situated activity. Vocal effort is strongly related to the situation of both interlocutors. In "natural" situations, when no microphone or loudspeaker is used, the interlocutors are immersed in the same environment. They are aware of their mutual distance and location as well as of the ambient noise and reverberation. They can react either orally or visually to inform the other person about the success of the oral transmission. They may be able to move closer to each other. Some implicit rules apply: the talker routinely raises his or her voice when the interlocutor stands at a distance exceeding a few meters, or when the noise level is significant, or when the interlocutor is known to have hearing problems. Conversely, the talker lowers his or her voice when the interlocutor is close by, or in a silent environment, or in a reverberant room. In order to reach a distant listener, the talker has to articulate more distinctly or to speak louder, or both. Speaking more clearly requires cognitive effort; speaking louder causes signal distortion, fatigue, and loss of privacy. The talker's actual vocal effort is a tradeoff, specific to each situation, between the effort expended and the communication efficiency achieved. In a given social group this tradeoff is known by everybody, as demonstrated by the fact that, when confronted with a new situation, one instantly adopts the appropriate level of voice to address the targeted listener. One may hypothesize the existence of a half-conscious norm for vocal effort, which defines, in most typical communication situations, the notion of a "normal" voice in terms of muscular commands, auditory voice control, and acoustic features to be received by the interlocutor. The purpose of oral communication may not be limited to the transmission of linguistic information.
As with attitude, gesture or clothing, slight deviations from the norm may be used to transmit non-linguistic information, for instance to assert a dominant position, to convey acceptance or refusal, or to express anger or satisfaction. While the dynamic range of the human voice extends up to some 60 dB, its timbre changes widely from the whispered voice, barely audible at a distance of 10 cm, to the shout, which can be heard over a distance of several hundred meters. Even after level equalization, the timbre variations retain enough information to distinguish at least three voice categories: whispered, conversational, and shouted. Within each of those categories, it is not clear whether level perception is categorical or continuous, although the second category may easily be divided into soft, medium and loud. The acoustic cues associated with vocal effort have been widely investigated. They include the sound level, the frequencies of F0 and F1, the average level of the middle and high parts of the spectrum, the level discrepancies between voiced and noisy parts of speech, vowel lengthening, the fragmentation of the discourse into shorter utterances, and the tendency toward hyperarticulation, all of which contribute to ensuring the correct reception of the oral information by the interlocutor. However, the way in which these cues are combined to yield the perception of a given degree of vocal effort has not yet been established. In the last century, systems for recording, amplifying, modifying and preserving sound have invaded our lives. In some cases they break the contextual link and the interaction capability between speaker and listener. The implicit norm of voice strength evoked above has evolved. New communication situations have appeared, which do not destroy the traditional ones but add to them.
Current speech synthesis and recognition systems, conceived as purely passive devices, do not take into account the contextual and interactive aspects of oral communication. This results in severe limitations for the user. This approach may change in the future, under the pressure of users requiring more efficient speech/voice human-machine communication tools.
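
As a rough illustration of the "level equalization" mentioned in the abstract, the sketch below scales two signals to a common RMS level so that only their spectral shape (timbre) differs. The abstract does not say how equalization was performed; RMS matching, the function names, the target level, and the synthetic test tones are all assumptions made for this example.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a non-silent sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, ref=1.0):
    """RMS level in dB relative to `ref` (hypothetical reference amplitude)."""
    return 20.0 * math.log10(rms(samples) / ref)

def equalize_level(samples, target_rms=0.1):
    """Apply one constant gain so the RMS equals `target_rms`.

    A constant gain changes the level only; the relative spectral
    content of the signal is left untouched.
    """
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

# Synthetic stand-ins for a very soft and a very loud utterance:
# the same 200 Hz tone at amplitudes differing by a factor of 1000.
RATE = 16000
soft = [0.001 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]
loud = [1.0 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]

# An amplitude ratio of 1000 corresponds to 20 * log10(1000) = 60 dB.
difference_db = level_db(loud) - level_db(soft)

# After equalization both signals sit at the same RMS level.
soft_eq = equalize_level(soft)
loud_eq = equalize_level(loud)
```

With real recordings, a whispered and a shouted utterance equalized this way would have the same overall level yet still differ in spectral balance, which is the residual information the abstract refers to. The sketch does not guard against silent input, where the RMS is zero.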
