“I sing the body algorithmic”: Machine Learning and Embodiment in Human-Machine Collective Music-Making

Document record

Date

15 December 2023

Collection

Archives ouvertes



Cite this document

Pierre Saint-Germier, « “I sing the body algorithmic” Machine Learning and Embodiment in Human-Machine collective music-making », HAL-SHS : philosophie, ID : 10670/1.0xiavi


Abstract (English)

In spite of recent progress in Computer Vision, Image Generation, Natural Language Processing, and (arguably) General Artificial Intelligence, AI research based on the application of deep learning techniques to digital or digitized data seems confined to the digital realm. Several decades of work in the field of Embodied Cognition (Shapiro 2014) suggest, however, that there is a limit to the kind of capacities that can be conferred on artificial agents by algorithmic means. In cases where machine learning exploits data containing information about embodied states or processes (e.g., motion capture data, vocal signature data, recorded musical performances), it seems plausible that some sort of embodiment may be preserved through machine learning. What remains to be clarified is the sense in which data may be said to be embodied, and whether machine learning from such embodied data is sufficient to confer at least some of the advantages of embodiment on algorithmic agents. The present paper proposes to contribute to this clarification through a combination of conceptual analysis and experimental study, focusing on the case of human-machine co-improvisation in musical AI.

Research in the field of Embodied Music Cognition (Lesaffre et al. 2017) has shown that embodiment is essential to the expressive and interactional properties of collective music performance. Furthermore, collective musical improvisation instantiates the sort of Continuous Reciprocal Causation (Clark 2008) for which Embodied Cognition approaches are particularly suited. Finally, the use of machine learning techniques has recently led to important progress in the design of algorithmic agents for collective improvisation. For instance, the SoMax2 application, designed at IRCAM, outputs stylistically coherent improvisations, based on a generative model constructed by machine learning, while interacting with a human improviser (Borg 2021). The musical agency of SoMax2 is essentially algorithmic in the sense that the physical properties of its sensors (microphones) and effectors (loudspeakers) play no role in the generation of the musical output. This makes human-machine co-improvisation a particularly relevant case study.

On the conceptual side, we argue for a distinction between two orthogonal dimensions of embodiment. On the one hand, the musicians’ embodiment qua multimodal resource provides visual as well as auditory cues that facilitate coordination between musicians (Moran 2015). On the other hand, the contingencies of the musician’s body (e.g., the fact that a pianist has two hands with five fingers each) limit and shape the sort of musical signals that may be produced. This is in particular the source of instrumental idiomaticity and gestural expressivity in music (Souza 2017). In virtue of this limiting and shaping effect on the space of all possible musical signals, embodiment qua generative constraint allows listeners and co-improvisers to exploit low-level perceptual expectations, which are the basis for the perception and appreciation of musical expressivity (Meyer 1956; Gingras et al. 2016), as well as for coordination within collective improvisation (Vesper et al. 2010). While embodiment qua multimodal resource is not easily reflected in audio data or Musical Instrument Digital Interface (MIDI) data, these data plausibly bear the marks of the shaping effect of embodiment qua generative constraint.

We submit that embodiment qua generative constraint may be transferred from the corpus data to the musical behavior of an algorithmic agent. Isolating the concept of embodiment qua generative constraint explains the sense in which algorithmic musical agents may nevertheless be embodied via machine learning.

The argument presented so far relies on the empirical assumption that the marks of embodiment in the corpus data can provide the agent with some of the benefits of embodiment. To test this assumption, we conducted a series of experimental studies. We collected a corpus of MIDI data by having an expert pianist record 7 miniature solo improvisations (min = 2’03; max = 3’13) and one long solo improvisation (10’30). We applied three manipulations to the resulting MIDI data, each designed to selectively erase the marks of the bodily constraint: (a) randomization of the velocity of all notes (i.e., the MIDI encoding of their loudness), erasing all dynamic shapes while keeping all harmonic and melodic information; (b) application of random octave jumps to all notes, erasing melodic shapes while keeping all dynamic and harmonic information; (c) the combination of (a) and (b). We then isolated randomly chosen 15-second excerpts from each track of the corpus, applied the aforementioned manipulations to all of them, presented the resulting 32 excerpts (8 tracks × 4 conditions, including the unmanipulated originals) to a group of (self-declared) musician subjects (n = 29), and asked them to judge, on a scale from 0 to 10, the pianistic plausibility of each excerpt. A one-way ANOVA revealed a significant effect of Manipulation (F = 12.601, p …).
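For concreteness, manipulations (a) and (b) are straightforward to reproduce on standard MIDI files. Below is a minimal sketch in Python, assuming the third-party `mido` library; the function names, the ±3-octave jump range, and the file names are illustrative assumptions, as the abstract does not describe the actual implementation.

```python
# Minimal sketch of manipulations (a) and (b) on a standard MIDI file.
# Assumes the third-party `mido` library; names and parameters here are
# illustrative, not the authors' actual code.
import random

import mido


def randomize_velocities(midi: mido.MidiFile) -> mido.MidiFile:
    # Manipulation (a): replace every note-on velocity with a random value,
    # erasing the dynamic shape while keeping pitches and timing intact.
    for track in midi.tracks:
        for msg in track:
            if msg.type == 'note_on' and msg.velocity > 0:
                msg.velocity = random.randint(1, 127)
    return midi


def randomize_octaves(midi: mido.MidiFile, max_jump: int = 3) -> mido.MidiFile:
    # Manipulation (b): transpose each note by a random whole number of
    # octaves, erasing the melodic shape while keeping pitch class, dynamics,
    # and timing. (The +/-3 octave range is an assumption.)
    for track in midi.tracks:
        jumps = {}  # original pitch -> jump in semitones
        for msg in track:
            if msg.type not in ('note_on', 'note_off'):
                continue
            if msg.type == 'note_on' and msg.velocity > 0:
                jump = 12 * random.randint(-max_jump, max_jump)
                jumps[msg.note] = jump  # key by the original pitch
            else:  # note-off, or note-on with velocity 0
                jump = jumps.pop(msg.note, 0)
            msg.note = min(max(msg.note + jump, 0), 127)  # clamp to MIDI range
    return midi


# Manipulation (c) is simply the composition of (a) and (b):
midi = mido.MidiFile('improvisation_01.mid')  # hypothetical file name
randomize_octaves(randomize_velocities(midi)).save('improvisation_01_c.mid')
```

Keying each octave jump by the note's original pitch ensures that a note-off is transposed by the same amount as its matching note-on, so durations and pitch classes survive the manipulation exactly as described above.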
