rmassidda @ DaDoEval: Document Dating Using Sentence Embeddings at EVALITA 2020

Document record

Date

2020

Discipline
Scope
Language
Identifier
  • 20.500.13089/1djh
Relations

This document is related to:
https://hdl.handle.net/20.500.13089/1cho

This document is related to:
https://doi.org/10.4000/books.aaccademia

This document is related to:
info:eu-repo/semantics/altIdentifier/isbn/979-12-80136-32-9

Collection

OpenEdition Books

Organization

OpenEdition

Licenses

info:eu-repo/semantics/openAccess , https://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

This report describes an approach to the DaDoEval document dating subtasks of the EVALITA 2020 competition. Dating is treated as a classification problem, and the considerable length of the documents in the provided dataset is handled by using sentence embeddings in a hierarchical architecture. Three pre-trained models for generating sentence embeddings were evaluated and compared: USE, LaBSE and SBERT. In addition to sentence embeddings, the classifier exploits a bag-of-entities representation of the document, generated with a pre-trained named entity recognizer. The final model produces the required date for every subtask simultaneously.
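
As an illustration of the feature construction the abstract describes, the following is a minimal sketch and not the report's actual implementation. It assumes that sentence vectors are combined by simple mean pooling (the abstract does not say how the hierarchical architecture aggregates them) and that the bag-of-entities is a count vector over entity labels. The names `mean_pool`, `bag_of_entities`, `document_features` and `toy_embed` are hypothetical, and `toy_embed` is only a stand-in for a real sentence encoder such as USE, LaBSE or SBERT.

```python
from collections import Counter
from typing import Callable, List, Sequence


def mean_pool(sentence_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a list of equally sized sentence vectors into one vector.

    Assumption: mean pooling stands in for whatever aggregation the
    hierarchical architecture in the report actually uses.
    """
    if not sentence_vectors:
        raise ValueError("document has no sentences")
    count = len(sentence_vectors)
    dim = len(sentence_vectors[0])
    return [sum(vec[i] for vec in sentence_vectors) / count for i in range(dim)]


def bag_of_entities(entity_labels: Sequence[str],
                    vocabulary: Sequence[str]) -> List[float]:
    """Count how often each entity label in `vocabulary` occurs."""
    counts = Counter(entity_labels)
    return [float(counts[label]) for label in vocabulary]


def document_features(sentences: Sequence[str],
                      entity_labels: Sequence[str],
                      embed: Callable[[str], List[float]],
                      vocabulary: Sequence[str]) -> List[float]:
    """Concatenate the pooled sentence embedding with the entity counts."""
    pooled = mean_pool([embed(sentence) for sentence in sentences])
    return pooled + bag_of_entities(entity_labels, vocabulary)


def toy_embed(sentence: str) -> List[float]:
    """Hypothetical 2-d 'embedding': [word count, character count].

    A real system would call a pre-trained sentence encoder here.
    """
    return [float(len(sentence.split())), float(len(sentence))]


features = document_features(
    sentences=["a b", "c d e f"],
    entity_labels=["PER", "LOC", "PER"],  # as if produced by an NER model
    embed=toy_embed,
    vocabulary=["PER", "LOC", "ORG"],
)
```

A classifier with one output per subtask could then be trained on vectors like `features`; that part is omitted here because the abstract gives no detail about it.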
