4 novembre 2019
info:eu-repo/semantics/OpenAccess
Jean-Francois Ducher et al., « FOLDED CQT RCNN FOR REAL-TIME RECOGNITION OF INSTRUMENT PLAYING TECHNIQUES », HAL SHS (Sciences de l’Homme et de la Société), ID : 10670/1.168911...
In the past years, deep learning has produced state-of-the-art performance in timbre and instrument classification. However, only a few models currently deal with the recognition of advanced Instrument Playing Techniques (IPT). None of them have a real-time approach of this problem. Furthermore, most studies rely on a single sound bank for training and testing. Their methodology provides no assurance as to the generalization of their results to other sounds. In this article, we extend state-of-the-art convolutional neural networks to the classification of IPTs. We build the first IPT corpus from independent sound banks, annotate it with the JAMS standard and make it freely available. Our models yield consistently high accuracies on a homogeneous subset of this corpus. However, only a proper taxonomy of IPTs and specifically defined input transforms offer proper resilience when addressing the "minus-1db" methodology, which assesses the ability of the models to generalize. In particular, we introduce a novel Folded Constant Q-Transform adjusted to the requirements of IPT classification. Finally we discuss the use of our classifier in real-time.