Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

Document record

Date

27 May 2022

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/arxiv/2204.05211

Collection

Open archives


Related subjects

Foreign languages, Languages

Cite this document

Francesco de Toni et al., "Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0", HAL-SHS : linguistique, ID: 10670/1.0cme4u



Abstract

In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages that lack labeled datasets. We also find that T0-like models can be probed to predict the publication date and language of a document, which could be highly relevant to the study of historical texts.
