26 July 2017
https://creativecommons.org/licenses/by-nc-nd/4.0/ , info:eu-repo/semantics/openAccess
Sandro Pezzelle et al., "Imparare a quantificare guardando" [Learning to Quantify by Looking], Accademia University Press, ID: 10.4000/books.aaccademia.1823
In this paper, we focus on natural-language questions about images that can be answered with a quantifier (e.g. How many dogs are black? Some/most/all of them, etc.). We show that, to learn to quantify, a multimodal model must genuinely understand both the linguistic and the visual input, as well as how they interact. We propose a model that builds a fuzzy representation of the set of queried objects (e.g. dogs) and of the queried property relative to that set (e.g. black with respect to dogs), and outputs the quantifier appropriate to that relation.
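The abstract does not describe the model's internals. To illustrate what a "fuzzy representation" of a queried set and property could mean, here is a minimal Python sketch. It is not the authors' model. It assumes hypothetical per-object membership scores in [0, 1] (for example, from an object detector) for the queried noun and for the queried property. It measures the size of the fuzzy set as the sum of its memberships (the sigma-count), intersects the two sets with the product t-norm, and maps the resulting proportion to a quantifier using cut-offs that were chosen only for illustration.

```python
from typing import Sequence


def fuzzy_proportion(object_scores: Sequence[float],
                     property_scores: Sequence[float]) -> float:
    """Return |property ∩ objects| / |objects| for fuzzy sets.

    Cardinality is the sigma-count (sum of memberships), and the
    intersection uses the product t-norm. Both score lists are
    hypothetical inputs, one entry per candidate object in the image.
    """
    if len(object_scores) != len(property_scores):
        raise ValueError("score lists must have the same length")
    set_size = sum(object_scores)
    if set_size == 0:
        raise ValueError("the queried set is empty")
    overlap = sum(o * p for o, p in zip(object_scores, property_scores))
    return overlap / set_size


def choose_quantifier(proportion: float) -> str:
    """Map a proportion to a quantifier (hypothetical cut-offs)."""
    if proportion < 0.05:
        return "none"
    if proportion < 0.5:
        return "some"
    if proportion < 0.95:
        return "most"
    return "all"


# "How many dogs are black?" for four candidate objects (made-up scores)
dog = [0.9, 0.8, 0.95, 0.1]
black = [0.9, 0.7, 0.1, 0.9]
p = fuzzy_proportion(dog, black)  # 1.555 / 2.75, about 0.565
print(choose_quantifier(p))  # most
```

Because the memberships are soft, an object that is probably not a dog (the fourth one) adds very little to the count, even though it is clearly black. A learned model would replace both the hand-set scores and the fixed cut-offs.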