Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning

Iris Eshkol-Taravella et al., "Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning", HAL-SHS: Linguistics, ID: 10670/1.409spi

Abstract

This paper describes the development of a chunker for spoken data, trained by supervised machine learning with conditional random fields (CRFs) on a small reference corpus composed of two kinds of discourse: prepared monologue and spontaneous talk in interaction. The methodology takes into account the specific characteristics of spoken data. The learning process takes the output of several available taggers as input, without correcting that output manually. Experiments show that the discourse type (monologue vs. free talk), the nature of the speech (spontaneous vs. prepared), and the corpus size can influence the results of the machine learning process, and that these factors must be taken into account when interpreting the results.
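The abstract does not specify the feature set, chunk encoding, or CRF toolkit. As a minimal sketch only, assuming a BIO chunk encoding and a one-token context window (both assumptions, not the authors' documented setup), the following Python shows how the uncorrected output of several taggers could be combined into per-token features for a linear-chain CRF. The tagger names `taggerA` and `taggerB`, their tag sets, the example sentence, and the helper functions are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: feature templates and BIO encoding are illustrative
# assumptions, not the paper's documented configuration.

def token_features(
    words: Sequence[str],
    tagger_outputs: Dict[str, Sequence[str]],
    i: int,
) -> Dict[str, str]:
    """Build a feature dict for token i from the word and the raw,
    uncorrected tags that each tagger assigned around it."""
    feats = {"word": words[i].lower()}
    for name, tags in tagger_outputs.items():
        feats[f"{name}:tag"] = tags[i]
        # One-token context window, with boundary markers at sentence edges.
        feats[f"{name}:tag-1"] = tags[i - 1] if i > 0 else "<BOS>"
        feats[f"{name}:tag+1"] = tags[i + 1] if i + 1 < len(tags) else "<EOS>"
    return feats


def sentence_features(
    words: Sequence[str], tagger_outputs: Dict[str, Sequence[str]]
) -> List[Dict[str, str]]:
    """One feature dict per token: the observation sequence for the CRF."""
    return [token_features(words, tagger_outputs, i) for i in range(len(words))]


def chunks_to_bio(n_tokens: int, chunks: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) chunk spans into B-/I-/O labels;
    tokens outside any chunk (e.g. a hesitation) are labelled O."""
    labels = ["O"] * n_tokens
    for start, end, label in chunks:
        labels[start] = f"B-{label}"
        for j in range(start + 1, end):
            labels[j] = f"I-{label}"
    return labels


# Hypothetical spoken-French example with a hesitation marker ("euh").
words = ["euh", "le", "petit", "chat", "dort"]
tagger_outputs = {
    "taggerA": ["INT", "DET", "ADJ", "NC", "V"],
    "taggerB": ["ADV", "DET", "ADJ", "NOM", "VER"],
}
X = sentence_features(words, tagger_outputs)
y = chunks_to_bio(len(words), [(1, 4, "NP"), (4, 5, "VP")])
print(y)  # ['O', 'B-NP', 'I-NP', 'I-NP', 'B-VP']
```

Pairs of feature sequences and label sequences like `X` and `y` are the kind of input linear-chain CRF toolkits are typically trained on. Keeping each tagger's tags as separate, uncorrected features is one way to let the CRF weight the taggers differently, rather than committing to a single, possibly wrong, tagging.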
