ParCoLab, a Parallel Corpus of French, Serbian and English

Document record

Date

2015

Collection

Open archives

Licence

http://hal.archives-ouvertes.fr/licences/copyright/



Related subjects

Frenchmen (French people)

Cite this document

Dejan Stosic et al., "ParCoLab, a Parallel Corpus of French, Serbian and English", HAL-SHS: linguistics, ID: 10670/1.pzv052



Abstract

ParCoLab is a 12-million-word parallel corpus containing original and translated texts in three European languages: Serbian, French, and English. Each language functions both as a source and as a target language. The texts included in the corpus, which are mainly literary, are aligned at the paragraph and sentence levels. The alignments have been manually validated, which guarantees their quality. ParCoLab also stands out because it follows current standards for corpus creation and distribution: it is stored in a TEI-compliant XML format. The ParCoLab parallel corpus can be queried online free of charge. A search engine allows users to formulate queries and extract the sentences containing the target expression, together with the corresponding sentences in one or both of the other languages. As a work in progress, the corpus is under continuous qualitative, quantitative, and technical development.
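The query mechanism described above (find sentences containing a target expression, then return their aligned counterparts in other languages) can be illustrated with a minimal sketch. This is not ParCoLab's search engine: its data model, TEI encoding, and query syntax are not described here. The names `AlignedUnit` and `search`, the language codes, and the toy sentences are hypothetical, and plain case-insensitive substring matching stands in for whatever matching the real engine performs.

```python
from dataclasses import dataclass


@dataclass
class AlignedUnit:
    """Hypothetical alignment unit: one sentence and its translations."""
    sentences: dict[str, str]  # language code ("fr", "sr", "en") -> sentence text


def search(units, query, source_lang, target_langs):
    """Return (source sentence, {lang: aligned sentence}) pairs for every unit
    whose source-language sentence contains `query` (case-insensitive substring)."""
    needle = query.casefold()
    results = []
    for unit in units:
        source = unit.sentences.get(source_lang)
        if source is None or needle not in source.casefold():
            continue
        # Keep only the requested target languages that this unit actually has.
        aligned = {lang: unit.sentences[lang]
                   for lang in target_langs if lang in unit.sentences}
        results.append((source, aligned))
    return results


# Toy, made-up aligned data for illustration only.
corpus = [
    AlignedUnit({"fr": "Il pleut.", "en": "It is raining.", "sr": "Pada kiša."}),
    AlignedUnit({"fr": "Le chat dort.", "en": "The cat is sleeping."}),
]

print(search(corpus, "chat", "fr", ["en", "sr"]))
# [('Le chat dort.', {'en': 'The cat is sleeping.'})]
```

Because each unit stores all of its language versions together, any language can serve as the query side and any subset of the others as the output side, mirroring the abstract's point that each language works as both source and target.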
