2015
http://hal.archives-ouvertes.fr/licences/copyright/
Dejan Stosic et al., « ParCoLab, a Parallel Corpus of French, Serbian and English », HAL-SHS : linguistique, ID : 10670/1.pzv052
ParCoLab is a 12-million-word parallel corpus containing original and translated texts in three European languages: Serbian, French, and English. Each of the languages functions both as a source and as a target language. The texts included in the corpus, which are mainly literary, are paragraph- and sentence-aligned. The alignments have been manually validated, which guarantees their quality. ParCoLab is also distinguished by the fact that it follows the current standards of corpus creation and distribution (it is stored in a TEI-compliant XML format). The ParCoLab parallel corpus can be queried online for free. A search engine allows users to formulate queries and extract sentences containing the target expression, as well as the corresponding sentences in one or both other languages. As a work in progress, the corpus is in continuous qualitative, quantitative, and technical development.