5 juin 2021
HALSHS : archive ouverte en Sciences de l’Homme et de la Société - notices sans texte intégral
info:eu-repo/semantics/OpenAccess
Johanna Cordova et al., « Toward Creation of Ancash Quechua Lexical Resources from OCR », HALSHS : archive ouverte en Sciences de l’Homme et de la Société - notices sans texte intégral, ID : 10670/1.a9613e...
The Quechua linguistic family has a limited number of NLP resources, most of them being dedicated to Southern Quechua, whereas the varieties of Central Quechua have, to the best of our knowledge, no specific resources (software, lexicon or corpus). Our work addresses this issue by producing two resources for the Ancash Quechua: a full digital version of a dictionary, and an OCR model adapted to the considered variety. In this paper, we describe the steps towards this goal: we first measure performances of existing models for the task of digitising a Quechua dictionary, then adapt a model for the Ancash variety, and finally create a reliable resource for NLP in XML-TEI format. We hope that this work will be a basis for initiating NLP projects for Central Quechua, and that it will encourage digitisation initiatives for under-resourced languages.