Uncovering Machine Translationese: an experiment on 4 MT systems for English-French translations

Document record

Date

31 January 2020

Collection

Open archives




Cite this document

Orphée de Clercq et al., "Uncovering Machine Translationese: an experiment on 4 MT systems for English-French translations", HAL-SHS: linguistics, ID: 10670/1.o0k8cf

Abstract

This presentation discusses the linguistic features of machine-translated texts in comparison with original texts in order to uncover what has been called “machine translationese” (e.g. Daems et al. 2017). Using a corpus-based statistical approach, namely Principal Component Analysis, four MT systems were investigated for English-to-French translations of press texts: one statistical MT (SMT) system and three neural MT (NMT) systems, namely DeepL, Google Translate, and the European Commission’s eTranslation tool, the latter in both its SMT and NMT versions. In particular, to complement a previous study on language-specific features (e.g. derived adverbs, existential constructions, the coordinator et, the preposition avec; see Loock 2018), a series of language-independent linguistic features was extracted for each text, ranging from superficial text characteristics such as average word and sentence length to frequencies of closed-class lexical categories and measures of lexical diversity. The final aim is to identify linguistic features in MT texts that clearly deviate from the expected norms of original French.
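To make the kind of language-independent features mentioned in the abstract more concrete, here is a minimal Python sketch that computes a few of them for a single text. It is an illustration only, not the authors' pipeline: the function name `text_features`, the tiny `CLOSED_CLASS` word list, the naive tokenisation, and the use of a type-token ratio as the lexical diversity measure are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny list of French closed-class words;
# a real study would rely on a POS tagger or a full function-word list.
CLOSED_CLASS = {
    "le", "la", "les", "un", "une", "des", "de", "du",
    "et", "à", "avec", "en", "dans", "que", "qui",
}


def text_features(text):
    """Return a dict of simple surface features for one text."""
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Word tokens: runs of Unicode letters, lowercased.
    words = [w.lower() for w in re.findall(r"[^\W\d_]+", text)]
    n_words = len(words)
    if n_words == 0 or not sentences:
        raise ValueError("text must contain at least one word")
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": n_words / len(sentences),
        "closed_class_ratio": sum(w in CLOSED_CLASS for w in words) / n_words,
        # Type-token ratio: one simple (length-sensitive) diversity measure.
        "type_token_ratio": len(set(words)) / n_words,
    }


features = text_features("Le chat dort. Le chien mange et le chat dort.")
```

Each text yields one feature vector of this kind; stacking the vectors for original and machine-translated texts would give the matrix on which a Principal Component Analysis could then be run.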
