A First LVCSR System for Luxembourgish, a Low-Resourced European Language

Document record

Date

2014

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-319-08958-4_39

Collection

Open archives




Cite this document

Martine Adda-Decker et al., "A First LVCSR System for Luxembourgish, a Low-Resourced European Language", HAL-SHS : linguistique, ID: 10.1007/978-3-319-08958-4_39



Abstract (English)

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s low-resourced languages. We describe our efforts in building a large vocabulary ASR system for such a “minority” language without resorting to any prior transcribed audio training data. Instead, acoustic models are derived from major European languages. Furthermore, most Luxembourgish written sources include significant parts in other languages. This poses specific challenges to Language Model estimation. Some scientific and technological issues addressed include: (i) how to build acoustic models if no labeled acoustic training data are available for the under-resourced target language? (ii) how to make use of the new system to accelerate resource production for the target language? (iii) how to build a vocabulary and a language model with multilingual written texts? (iv) how to determine the “best” phonemic inventory for ASR? First ASR results illustrate the accuracy of the various sets of monolingual and multilingual acoustic models and what these suggest concerning language typology issues.
