Toward Creation of Ancash Quechua Lexical Resources from OCR

Fiche du document

Date

5 juin 2021

Discipline
Type de document
Périmètre
Langue
Identifiants
Collection

Archives ouvertes

Licence

info:eu-repo/semantics/OpenAccess



Citer ce document

Johanna Cordova et al., « Toward Creation of Ancash Quechua Lexical Resources from OCR », HALSHS : archive ouverte en Sciences de l’Homme et de la Société - notices sans texte intégral, ID : 10670/1.a9613e...


Métriques


Partage / Export

Résumé En

The Quechua linguistic family has a limited number of NLP resources, most of them being dedicated to Southern Quechua, whereas the varieties of Central Quechua have, to the best of our knowledge, no specific resources (software, lexicon or corpus). Our work addresses this issue by producing two resources for the Ancash Quechua: a full digital version of a dictionary, and an OCR model adapted to the considered variety. In this paper, we describe the steps towards this goal: we first measure performances of existing models for the task of digitising a Quechua dictionary, then adapt a model for the Ancash variety, and finally create a reliable resource for NLP in XML-TEI format. We hope that this work will be a basis for initiating NLP projects for Central Quechua, and that it will encourage digitisation initiatives for under-resourced languages.

document thumbnail

Par les mêmes auteurs

Sur les mêmes sujets

Sur les mêmes disciplines