Learning to Quantify by Looking (Imparare a quantificare guardando)

In this paper, we focus on linguistic questions about images that can be answered with a quantifier (e.g. "How many dogs are black?" "Some/most/all of them."). We show that, to learn to quantify, a multimodal model must acquire a genuine understanding of the linguistic and visual inputs and of their interaction. We propose a model that extracts a fuzzy representation of the set of queried objects (e.g. dogs) and of the queried property in relation to that set (e.g. black with respect to dogs), and outputs the appropriate quantifier for that relation.
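
To make the idea of a fuzzy set and a quantifier over it concrete, the following is a minimal hand-written sketch, not the model proposed in the paper. It assumes each object in an image already has a membership score in [0, 1] for the queried category and for the queried property. The function names, the product-based fuzzy intersection, and the threshold values are all illustrative assumptions.

```python
from typing import Sequence


def fuzzy_proportion(target: Sequence[float], prop: Sequence[float]) -> float:
    """Return the share of the fuzzy target set that also has the property.

    target[i] is the (assumed) degree to which object i belongs to the
    queried set, e.g. "dog"; prop[i] is the degree to which it has the
    queried property, e.g. "black".
    """
    if len(target) != len(prop):
        raise ValueError("target and prop must score the same objects")
    target_mass = sum(target)  # fuzzy cardinality of the queried set
    if target_mass == 0:
        raise ValueError("no object belongs to the queried set")
    # Fuzzy intersection via the product of memberships (an assumption;
    # min() is another common choice).
    joint_mass = sum(t * p for t, p in zip(target, prop))
    return joint_mass / target_mass


def choose_quantifier(ratio: float) -> str:
    """Map a proportion to a quantifier using hypothetical thresholds."""
    if ratio <= 0.05:
        return "no"
    if ratio < 0.5:
        return "some"
    if ratio < 0.95:
        return "most"
    return "all"


# Five objects: four dogs and one non-dog; three of the dogs are black.
dog_scores = [1.0, 1.0, 1.0, 1.0, 0.0]
black_scores = [1.0, 1.0, 1.0, 0.0, 1.0]
answer = choose_quantifier(fuzzy_proportion(dog_scores, black_scores))
```

In this sketch the black non-dog contributes nothing, because the property is only counted relative to the queried set, which mirrors the "black with respect to dogs" relation described above.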
