Combining Natural and Artificial Examples to Improve Implicit Discourse Relation Identification

Document record

Date

23 August 2014

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess



Cite this document

Chloé Braud et al., "Combining Natural and Artificial Examples to Improve Implicit Discourse Relation Identification", HAL-SHS : linguistique, ID: 10670/1.iyzg6j



Abstract

This paper presents the first experiments on identifying implicit discourse relations (i.e., relations lacking an overt discourse connective) in French. Given the small amount of annotated data available for this task, our system resorts to additional data automatically labeled using unambiguous connectives, a method introduced by Marcu and Echihabi (2002). We first show that a system trained solely on these artificial data does not generalize well to natural implicit examples, echoing the conclusion reached by Sporleder and Lascarides (2008) for English. We then explain these initial results by analyzing the different types of distributional differences between natural and artificial implicit data. This leads us to propose a number of very simple methods, all inspired by work on domain adaptation, for combining the two types of data. Through various experiments on the French ANNODIS corpus, we show that our best system achieves an accuracy of 41.7%, corresponding to a significant 4.4% gain over a system trained solely on manually labeled data.
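
The idea of labeling artificial data with unambiguous connectives can be illustrated with a minimal Python sketch. It is not the authors' implementation: the connective lexicon, the relation names, and the function `make_artificial_example` are hypothetical, and the example sentences are in English for readability. The sketch finds a connective assumed to be unambiguous, uses it to pick the relation label, and removes it so that the resulting argument pair resembles an implicit example.

```python
import re

# Hypothetical lexicon: connectives assumed (for illustration only) to signal
# a single relation. This is not the lexicon used in the paper.
CONNECTIVE_TO_RELATION = {
    "because": "explanation",
    "for example": "elaboration",
    "whereas": "contrast",
}


def make_artificial_example(sentence):
    """Turn an explicit example into an artificial implicit one.

    Returns (arg1, arg2, relation) when a listed connective occurs between
    two non-empty spans, with the connective itself removed; returns None
    when no listed connective is found.
    """
    text = sentence.strip()
    for connective, relation in CONNECTIVE_TO_RELATION.items():
        # arg1, optional comma, connective surrounded by whitespace, arg2
        pattern = re.compile(
            r"^(.+?),?\s+" + re.escape(connective) + r"\s+(.+)$",
            re.IGNORECASE,
        )
        match = pattern.match(text)
        if match:
            return match.group(1).strip(), match.group(2).strip(), relation
    return None


if __name__ == "__main__":
    for s in [
        "She stayed home because it was raining.",
        "He likes tea, whereas she prefers coffee.",
        "The meeting ended early.",
    ]:
        print(make_artificial_example(s))
```

A real pipeline would rely on a curated connective lexicon and on syntactic or discourse segmentation rather than a regular expression; the sketch only shows why such examples are cheap to produce and why their distribution may differ from that of naturally implicit ones.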
