2020
This document is related to:
https://hdl.handle.net/20.500.13089/1cho
This document is related to:
https://doi.org/10.4000/books.aaccademia
This document is related to:
info:eu-repo/semantics/altIdentifier/isbn/979-12-80136-32-9
info:eu-repo/semantics/openAccess, https://creativecommons.org/licenses/by-nc-nd/4.0/
Riccardo Massidda, "rmassidda @ DaDoEval: Document Dating Using Sentence Embeddings at EVALITA 2020", Accademia University Press
This report describes an approach to the DaDoEval document dating subtasks of the EVALITA 2020 competition. Document dating is treated as a classification problem, and the considerable length of the documents in the provided dataset is handled by using sentence embeddings in a hierarchical architecture. Three pre-trained sentence-embedding models were evaluated and compared: USE, LaBSE, and SBERT. In addition to sentence embeddings, the classifier exploits a bag-of-entities representation of the document, generated with a pre-trained named entity recognizer. The final model produces the required date for each subtask simultaneously.
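To make the feature pipeline in the abstract more concrete, the following is a minimal sketch of how a long document might be reduced to a single fixed-size vector: each sentence is embedded, the sentence vectors are pooled, and the result is concatenated with a bag-of-entities count vector. It is an illustration under stated assumptions, not the report's implementation. The hash-based `toy_sentence_embedding` is a hypothetical stand-in for a pre-trained encoder such as USE, LaBSE, or SBERT; mean pooling is assumed here only for simplicity, since the abstract does not say how the hierarchical architecture aggregates sentences; and the entity list is assumed to come from a named entity recognizer run beforehand. All function names, the vocabulary, and the embedding size are invented for the example.

```python
from __future__ import annotations

import hashlib
from collections import Counter

EMBED_DIM = 8  # toy size (assumption); real sentence encoders use hundreds of dimensions


def toy_sentence_embedding(sentence: str) -> list[float]:
    """Hypothetical stand-in for a pre-trained sentence encoder.

    Derives a deterministic pseudo-embedding from a SHA-256 hash so the
    sketch runs without downloading any model.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:EMBED_DIM]]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average sentence vectors into one document vector (assumed pooling)."""
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def bag_of_entities(entities: list[str], vocabulary: list[str]) -> list[float]:
    """Count how often each vocabulary entity occurs in the document."""
    counts = Counter(entities)
    return [float(counts[entity]) for entity in vocabulary]


def document_features(
    sentences: list[str], entities: list[str], vocabulary: list[str]
) -> list[float]:
    """Concatenate the pooled sentence embeddings with the entity counts."""
    sentence_vectors = [toy_sentence_embedding(s) for s in sentences]
    return mean_pool(sentence_vectors) + bag_of_entities(entities, vocabulary)


# Example: two sentences plus entities that an NER step is assumed to have found.
vocabulary = ["Roma", "Europa", "Parlamento"]
features = document_features(
    ["Prima frase.", "Seconda frase."],
    ["Roma", "Roma", "Europa"],
    vocabulary,
)
print(len(features))  # 11 (8 embedding dimensions + 3 entity counts)
```

A vector of this kind could then feed a classifier with one output head per subtask, which is one way a single model might produce all required dates at once; the abstract does not specify the actual design.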