Linguistic corpora of understudied languages: do they make sense?

Document record

Date

1 June 2016

Source

Káñina

Relations

This document is related to:
10.15517/rk.v40i1.24143

Organization

SciELO

License

info:eu-repo/semantics/openAccess




Cite this document

Igor Vinogradov, "Linguistic corpora of understudied languages: do they make sense?", Káñina, ID: 10670/1.xzo92c



Abstract

A corpus of an understudied language usually has a documentary-linguistic nature and comprises all the text material available in that language. However, without text selection, it is impossible to obtain a representative and balanced sample of language use. The lack of these two characteristics makes such a corpus almost useless for any kind of quantitative research. Nevertheless, corpora of understudied languages serve a wide range of language documentation objectives. Furthermore, they can provide evidence that particular word forms or grammatical features occur in texts that meet specific search criteria. If such corpora have well-developed linguistic annotation, they can complement grammatical descriptions and dictionaries, and their digital format sets them apart from ordinary text collections. They are especially suitable for typological research, which requires handling large amounts of data from different, unrelated languages.
