Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech

Document record

Relations

This document is related to:
info:eu-repo/semantics/altIdentifier/doi/10.18653/v1/2021.gebnlp-1.10

Collection

Open archives

License

info:eu-repo/semantics/OpenAccess


Related subjects (En)

Skills training

Cite this document

Mahault Garnerin et al., "Investigating the Impact of Gender Representation in ASR Training Data: a Case Study on Librispeech", HAL-SHS : linguistique, ID: 10.18653/v1/2021.gebnlp-1.10


Abstract (En)

In this paper, we investigate the impact of gender representation in training data on the performance of an end-to-end ASR system. We design an experiment based on the Librispeech corpus and build three training corpora that vary only in the proportion of data produced by each gender category. We observe that, although our system is overall robust to gender balance or imbalance in the training data, it nonetheless depends on the match between the individuals present in the training and test sets.
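
As a rough illustration of the kind of corpus construction the abstract describes, the sketch below draws a fixed-size set of speakers with a chosen share of one gender category. It is not the authors' procedure: the function name `build_training_set`, the `"F"`/`"M"` labels, the example shares and the choice to balance by speaker count (rather than by amount of audio) are all assumptions made for this example.

```python
import random


def build_training_set(speakers, female_share, n_speakers, seed=0):
    """Pick n_speakers speaker ids with the requested share of "F" speakers.

    speakers: dict mapping a speaker id to a gender label, "F" or "M"
              (hypothetical labels, assumed for this sketch).
    female_share: target proportion of "F" speakers, between 0 and 1.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    females = sorted(s for s, g in speakers.items() if g == "F")
    males = sorted(s for s, g in speakers.items() if g == "M")

    n_female = round(n_speakers * female_share)
    n_male = n_speakers - n_female
    if n_female > len(females) or n_male > len(males):
        raise ValueError("not enough speakers in one gender category")

    # Sample without replacement inside each category, then merge.
    return rng.sample(females, n_female) + rng.sample(males, n_male)


# Toy speaker pool: 10 speakers per category (made-up ids).
pool = {f"f{i}": "F" for i in range(10)}
pool.update({f"m{i}": "M" for i in range(10)})

# Three training sets of equal size that differ only in the gender share.
# The shares below are illustrative values, not taken from the paper.
corpora = {share: build_training_set(pool, share, 10) for share in (0.3, 0.5, 0.7)}
```

Keeping the total number of speakers constant across the three sets means that only the gender proportion changes between them, which is the property the abstract emphasises.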
