Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning

Document record

Date

13 May 2020

Collection

Open archives

Licence

info:eu-repo/semantics/OpenAccess



Related subjects

Learning, Machine

Cite this document

Iris Eshkol-Taravella et al., "Chunk Different Kind of Spoken Discourse: Challenges for Machine Learning", HAL-SHS : linguistique, ID: 10670/1.409spi



Abstract

This paper describes the development of a chunker for spoken data, trained by supervised machine learning with conditional random fields (CRFs) on a small reference corpus covering two kinds of discourse: prepared monologue and spontaneous talk in interaction. The methodology takes the specific characteristics of spoken data into account. The learner takes as input the output of several available taggers, without manual correction. Experiments show that the discourse type (monologue vs. free talk), the nature of the speech (spontaneous vs. prepared), and the corpus size can influence the results of the machine learning process and must be considered when interpreting them.
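As a rough illustration of the setup the abstract describes, the sketch below prepares one utterance for a CRF sequence labeller: chunk spans become per-token BIO labels, and each token gets a feature dictionary built from the uncorrected output of several taggers. Everything specific here is an assumption made for illustration and is not taken from the paper: the example utterance, the tagger names `tagger_a` and `tagger_b`, the part-of-speech tags, the chunk labels, and the feature templates.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, chunk_label)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert chunk spans into one BIO label per token."""
    labels = ["O"] * n_tokens
    for start, end, chunk in spans:
        labels[start] = f"B-{chunk}"
        for i in range(start + 1, end):
            labels[i] = f"I-{chunk}"
    return labels


def token_features(tokens: List[str],
                   tagger_outputs: Dict[str, List[str]],
                   i: int) -> Dict[str, object]:
    """Build features for token i from the word and every tagger's output."""
    feats: Dict[str, object] = {"bias": 1.0, "word.lower": tokens[i].lower()}
    for name, tags in tagger_outputs.items():
        feats[f"{name}.pos"] = tags[i]
        feats[f"{name}.prev_pos"] = tags[i - 1] if i > 0 else "BOS"
        feats[f"{name}.next_pos"] = tags[i + 1] if i < len(tokens) - 1 else "EOS"
    return feats


def sentence_features(tokens: List[str],
                      tagger_outputs: Dict[str, List[str]]) -> List[Dict[str, object]]:
    """Feature dictionaries for a whole utterance, one per token."""
    return [token_features(tokens, tagger_outputs, i) for i in range(len(tokens))]


# Hypothetical utterance with a filled pause, as is common in spoken data.
tokens = ["uh", "I", "go", "to", "the", "station"]

# Hypothetical, uncorrected outputs of two taggers (they disagree on "uh").
tagger_outputs = {
    "tagger_a": ["INTJ", "PRON", "VERB", "ADP", "DET", "NOUN"],
    "tagger_b": ["NOUN", "PRON", "VERB", "ADP", "DET", "NOUN"],
}

# Hypothetical reference chunks: the filled pause is left outside any chunk.
spans = [(1, 2, "NP"), (2, 3, "VP"), (3, 6, "PP")]

labels = spans_to_bio(len(tokens), spans)
features = sentence_features(tokens, tagger_outputs)
```

Feature dictionaries and label sequences of this shape are the usual input to CRF toolkits. Keeping each tagger's output as a separate feature lets the learner weigh disagreeing taggers against each other instead of relying on manual correction.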
