Impact of textual data augmentation on linguistic pattern extraction to improve the idiomaticity of extractive summaries

Document record

Date

2021

Collection

Open archives

Licence

info:eu-repo/semantics/OpenAccess



Related subjects (En)

Pattern Model

Cite this document

Abdelghani Laifa et al., "Impact of textual data augmentation on linguistic pattern extraction to improve the idiomaticity of extractive summaries", HAL-SHS : linguistique, ID: 10670/1.ueexgb



Abstract (En)

The present work aims to develop a text summarisation system for financial texts with a focus on the fluidity of the target language. Linguistic analysis shows that the process of writing summaries should take into account not only terminological and collocational extraction, but also a range of linguistic material referred to here as the "support lexicon", which plays an important role in the cognitive organisation of the field. On this basis, this paper highlights the relevance of pre-training the CamemBERT model on a French financial dataset to extend its domain-specific vocabulary and then fine-tuning it on extractive summarisation. We then evaluate the impact of textual data augmentation, which improves the performance of our extractive text summarisation model by up to 6%-11%.
