A Survey on evaluation of summarization methods

Document record

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.ipm.2019.04.001

Collection

Archives ouvertes

Licenses

http://creativecommons.org/licenses/by-nc/ , info:eu-repo/semantics/OpenAccess



Related subjects (En)

Assessment

Cite this document

Liana Ermakova et al., « A Survey on evaluation of summarization methods », HAL-SHS : littérature, ID : 10.1016/j.ipm.2019.04.001



Abstract (En)

The increasing volume of textual information on any topic requires its compression to allow humans to digest it. This implies detecting the most important information and condensing it. These challenges have led to new developments in the area of Natural Language Processing (NLP) and Information Retrieval (IR) such as narrative summarization and evaluation methodologies for narrative extraction. Despite some progress over recent years with several solutions for information extraction and text summarization, the problems of generating consistent narrative summaries and evaluating them are still unresolved. With regard to evaluation, manual assessment is expensive, subjective and not applicable in real time or to large collections. Moreover, it does not provide re-usable benchmarks. Nevertheless, commonly used metrics for summary evaluation still imply substantial human effort since they require a comparison of candidate summaries with a set of reference summaries. The contributions of this paper are three-fold. First, we provide a comprehensive overview of existing metrics for summary evaluation. We discuss several limitations of existing frameworks for summary evaluation. Second, we introduce an automatic framework for the evaluation of metrics that does not require any human annotation. Finally, we evaluate the existing assessment metrics on a Wikipedia data set and a collection of scientific articles using this framework. Our findings show that the majority of existing metrics based on vocabulary overlap are not suitable for assessment based on comparison with a full text and we discuss this outcome.
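For illustration only, and not the evaluation framework introduced in the paper: the sketch below shows a ROUGE-1-style unigram-overlap recall, the kind of vocabulary-overlap metric the abstract refers to when discussing comparison of candidate summaries against reference texts. The function name, whitespace tokenization, and example strings are assumptions made for this sketch.

# Minimal sketch of a vocabulary-overlap metric (ROUGE-1-style recall).
# Not the paper's framework; names and tokenization are illustrative assumptions.
from collections import Counter

def unigram_overlap_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams covered by the candidate (clipped counts)."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

if __name__ == "__main__":
    reference = "The survey reviews metrics for evaluating automatic text summarization."
    candidate = "The survey reviews summarization evaluation metrics."
    print(f"ROUGE-1-style recall: {unigram_overlap_recall(candidate, reference):.3f}")

When the reference is a full source text rather than a short human-written summary, recall of this kind is dominated by the length mismatch, which is consistent with the abstract's finding that most vocabulary-overlap metrics are poorly suited to full-text comparison.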
