Towards phonetic interpretability in deep learning applied to voice comparison

Document record

Date

15 August 2019

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess



Related subjects (En)

Speaking Pattern Model

Cite this document

Emmanuel Ferragne et al., « Towards phonetic interpretability in deep learning applied to voice comparison », HAL-SHS : linguistique, ID : 10670/1.audhv1


Abstract (En)

A deep convolutional neural network was trained to classify 45 speakers based on spectrograms of their productions of the French vowel /ɑ̃/. Although the model achieved fairly high accuracy (over 85%), our primary focus here was phonetic interpretability rather than sheer performance. To better understand what kind of representations the model learned, (i) several versions of the model were trained and tested on low-pass filtered spectrograms with a varying cut-off frequency, and (ii) classification was also performed with masked frequency bands. The resulting decline in accuracy was used to identify frequencies relevant to speaker classification and voice comparison, and to produce phonetically interpretable visualizations.
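To make the band-masking idea concrete, here is a minimal NumPy sketch of the occlusion procedure: classify spectrograms with one frequency band zeroed out at a time, and take the drop in accuracy as a relevance score for that band. The `predict` function, data shapes, and band width are hypothetical stand-ins, not the authors' actual setup; in the paper the classifier is a trained deep convolutional network, and the low-pass experiments additionally retrain the model for each cut-off frequency.

```python
# Illustrative sketch of frequency-band occlusion for speaker classification.
# All shapes, the band width, and the classifier are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: spectrograms of shape (freq_bins, time_frames) for 45 speakers.
n_samples, n_freq, n_time, n_speakers = 200, 128, 64, 45
spectrograms = rng.random((n_samples, n_freq, n_time)).astype(np.float32)
labels = rng.integers(0, n_speakers, size=n_samples)

def predict(batch):
    """Hypothetical trained classifier returning one speaker label per input.
    In the paper this would be the trained deep convolutional network."""
    return rng.integers(0, n_speakers, size=len(batch))

def accuracy(preds, labels):
    return float(np.mean(preds == labels))

baseline = accuracy(predict(spectrograms), labels)

# (ii) Mask one frequency band at a time and record the accuracy drop.
band_width = 8  # frequency bins per band (assumed)
drops = []
for start in range(0, n_freq, band_width):
    masked = spectrograms.copy()
    masked[:, start:start + band_width, :] = 0.0  # zero out this band
    drops.append(baseline - accuracy(predict(masked), labels))

# Bands whose removal causes the largest accuracy drop carry the most
# speaker-discriminant information.
top = int(np.argmax(drops))
print(f"most informative band: bins {top * band_width}"
      f"-{top * band_width + band_width - 1}")

# (i) Low-pass variant: zero everything above a cut-off bin instead, e.g.
#     filtered = spectrograms.copy(); filtered[:, cutoff:, :] = 0.0
# (the paper retrains a model per cut-off rather than masking at test time).
```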
