Facing the facts of fake: a distributional semantics and corpus annotation approach

Document record

Date

28 November 2018

Collection

Open archives

Licence

info:eu-repo/semantics/OpenAccess



Related subjects (English)

forged

Cite this document

Bert Cappelle et al., "Facing the facts of fake: a distributional semantics and corpus annotation approach", HAL-SHS : sciences de l'information, de la communication et des bibliothèques, ID: 10670/1.magteq



Abstract

Fake is often considered the textbook example of a so-called 'privative' adjective, one which, in other words, allows the proposition that '(a) fake x is not (an) x'. This study tests the hypothesis that the contexts of an adjective-noun combination are more different from the contexts of the noun when the adjective is such a 'privative' one than when it is an ordinary (subsective) one. We here use 'embeddings', that is, dense vector representations based on word co-occurrences in a large corpus, which in our study is the entire English Wikipedia as it was in 2013. Comparing the cosine distance between the adjective-noun bigram and single noun embeddings across two sets of adjectives, privative and ordinary ones, we fail to find a noticeable difference. However, we contest that fake is an across-the-board privative adjective, since a fake article, for instance, is most definitely still an article. We extend a recent proposal involving the noun's qualia roles (how an entity is made, what it consists of, what it is used for, etc.) and propose several interpretational types of fake-noun combinations, some but not all of which are privative. These interpretations, which we assign manually to the 100 most frequent fake-noun combinations in the Wikipedia corpus, depend to a large extent on the meaning of the noun, as combinations with similar interpretations tend to involve nouns that are linked in a distributions-based network. When we restrict our focus to the privative uses of fake only, we do detect a slightly enlarged difference between fake + noun bigram and noun distributions compared to the previously obtained average difference between adjective + noun bigram and noun distributions. This result contrasts with negative or even opposite findings reported in the literature.
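The comparison described in the abstract, cosine distance between an adjective-noun bigram embedding and the embedding of the bare noun, averaged over a set of adjectives, can be illustrated with a minimal sketch. This is not the authors' code. The function names, the `adjective_noun` key convention, and the three-dimensional toy vectors are all assumptions made for illustration, and the numbers say nothing about the study's actual findings.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_bigram_noun_distance(embeddings, pairs):
    """Average the bigram-to-noun cosine distance over (adjective, noun) pairs.

    Assumes (hypothetically) that each bigram is stored under the key
    "adjective_noun" and each noun under its own form.
    """
    distances = [
        cosine_distance(embeddings[f"{adj}_{noun}"], embeddings[noun])
        for adj, noun in pairs
    ]
    return sum(distances) / len(distances)


# Made-up toy vectors, NOT taken from the Wikipedia-based embeddings.
toy = {
    "gun": [1.0, 0.0, 0.0],
    "fake_gun": [0.0, 1.0, 0.0],      # orthogonal to "gun"
    "article": [0.0, 0.0, 1.0],
    "long_article": [0.0, 0.0, 2.0],  # same direction as "article"
}

privative = mean_bigram_noun_distance(toy, [("fake", "gun")])
ordinary = mean_bigram_noun_distance(toy, [("long", "article")])
print(privative, ordinary)
```

With real embeddings, the two averages would be computed over the full privative and ordinary adjective sets and then compared. The toy vectors here are chosen only so that the two extreme cases (orthogonal and parallel vectors) are easy to check by hand.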
