Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition?

Document record

Date

2008

Collection

Open archives

Licences

http://hal.archives-ouvertes.fr/licences/copyright/ , info:eu-repo/semantics/OpenAccess


Keywords (English)

Speech ASR Khmer


Cite this document

Sopheap Seng et al., "Which unit for acoustic and language modeling for Khmer Automatic Speech Recognition?", HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID: 10670/1.ad9gx0


Abstract (English)

In this paper we present an overview of the development of a large-vocabulary continuous speech recognition system for the Khmer language. We present methods and tools for quickly collecting language resources when developing an ASR system for a new under-resourced language. Faced with a lack of text data and with word segmentation errors in language modeling, we investigate how different views of the text data (word and sub-word units) can be exploited for Khmer language modeling. We propose to work both at the model level (by building hybrid vocabularies with both word and sub-word units) and at the ASR output level (by using a simple N-best list voting mechanism). For acoustic modeling, we use basic linguistic rules to automatically generate grapheme-based and phoneme-based pronunciation dictionaries. An experimental framework is set up to evaluate the performance of each modeling unit. Index Terms: ASR, Khmer, word and sub-word units, acoustic modeling, language modeling.
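
The output-level combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the voting rule, so everything here is an assumption made for illustration: each recognizer (for example one word-based and one sub-word-based system) returns an N-best list of hypotheses, spaces mark only the unit boundaries chosen by that system, and each system casts one vote for every distinct hypothesis in its list once those boundaries are removed. The names `normalize` and `vote` are hypothetical, and the Latin strings stand in for Khmer text.

```python
from collections import Counter


def normalize(hypothesis):
    """Remove unit boundaries so differently segmented outputs compare equal.

    Assumption: whitespace marks segmentation only and carries no other meaning.
    """
    return "".join(hypothesis.split())


def vote(nbest_lists):
    """Return the normalized hypothesis supported by the most systems.

    Each system votes at most once per distinct normalized hypothesis.
    Ties go to the hypothesis that was seen first.
    """
    votes = Counter()
    for nbest in nbest_lists:
        # dict.fromkeys removes duplicates while keeping the N-best order.
        for text in dict.fromkeys(normalize(h) for h in nbest):
            votes[text] += 1
    if not votes:
        raise ValueError("no hypotheses to vote on")
    winner, _count = votes.most_common(1)[0]
    return winner


# Placeholder N-best lists from three hypothetical systems.
word_system = ["ab cd", "ab ce"]
subword_system = ["a b c e", "a b c d"]
hybrid_system = ["abc e", "ab d"]

print(vote([word_system, subword_system, hybrid_system]))
```

A real system would more likely weight votes by rank or by recognizer score. Unweighted counting is used here only to keep the idea visible.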
