Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese

Document record

Date

May 12, 2019

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/arxiv/1902.03052

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.1109/ICASSP.2019.8683069

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess



Cite this document

William N. Havard et al., « Models of Visually Grounded Speech Signal Pay Attention to Nouns: A Bilingual Experiment on English and Japanese », HAL-SHS: linguistique, ID: 10.1109/ICASSP.2019.8683069





Abstract (English)

We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns, and that this behaviour holds true for two typologically very different languages. We also draw parallels between artificial neural attention and human attention and show that neural attention focuses on word endings, as has been theorised for human attention. Finally, we investigate how two visually grounded monolingual models can be used to perform cross-lingual speech-to-speech retrieval. For both languages, the bilingual (speech-image) corpora enriched with part-of-speech tags and forced alignments are distributed to the community for reproducible research.

Index Terms: grounded language learning, attention mechanism, cross-lingual speech retrieval, recurrent neural networks.
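The attention mechanism the abstract refers to can be illustrated with a minimal sketch: single-head additive attention that pools the hidden states of a recurrent speech encoder into one utterance embedding, whose weights can then be inspected per time step (e.g. to see whether they peak on nouns or word endings). The function name, parameter shapes, and the additive scoring form are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def attention_pool(states, w, b, u):
    """Single-head additive attention over encoder time steps.

    states: (T, d) array of RNN hidden states for T speech frames
    w, b, u: attention parameters with assumed shapes (d, d), (d,), (d,)
    Returns the attention-weighted utterance embedding (d,) and the
    per-frame attention weights (T,), which sum to 1.
    """
    scores = np.tanh(states @ w + b) @ u      # (T,) unnormalised scores
    weights = np.exp(scores - scores.max())   # stable softmax over time
    weights /= weights.sum()
    return weights @ states, weights

# Toy example: 5 speech frames with 4-dimensional hidden states
rng = np.random.default_rng(0)
T, d = 5, 4
states = rng.normal(size=(T, d))
emb, alpha = attention_pool(states, rng.normal(size=(d, d)),
                            np.zeros(d), rng.normal(size=d))
```

Inspecting `alpha` against forced-alignment boundaries is the kind of analysis the paper's noun-focus finding relies on; here the weights are over random states and carry no linguistic meaning.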

