Preserving Endangered European Cultural Heritage and Languages Through Translated Literary Texts

Document record

Date

4 December 2019

Document type
Scope
Language
Identifiers
Collection

Open archives

Licence

info:eu-repo/semantics/OpenAccess



Related subjects (English)

Foreign languages; Languages

Cite this document

Amel Fraisse et al., "Preserving Endangered European Cultural Heritage and Languages Through Translated Literary Texts", HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID: 10670/1.i1hkzp



Abstract (English)

We present the interdisciplinary ROSETTA project, which collects all the worldwide translations of a single fictional text in order to build multilingual parallel corpora for a large number of under-resourced languages. Building such corpora is vital to help preserve and expand the diversity of languages and traditional knowledge. These corpora will support work on under-resourced languages in a number of interconnected research fields, such as computational linguistics, translation studies and corpus linguistics. Our project taps into a wealth of translated versions of a single fictional text spanning a period of over a century. It consists of collecting, digitizing, transcribing and aligning translations of this text. Our data collection process relies on volunteer work from the scientific and scholarly communities, the power of the crowd, and national libraries and archives. Our first experiment was conducted on the world-famous and well-traveled novel "Adventures of Huckleberry Finn" by the American author Mark Twain. This paper reports on the resulting parallel corpus, which is now sentence-aligned and pairs English with Basque.
