Interview with Alexander Schubert and Philippe Esling
On June 11, as part of the 2022 edition of the ManiFeste festival, Alexander Schubert will present Anima, a work that uses tools based on artificial intelligence while questioning their very nature. Trained as a scientist, the German composer has joined forces with Philippe Esling's group of researchers in the Musical Representations team at IRCAM. A rich and unexpected collaboration...
Alexander, before you became involved with it in your work as a composer, what image did you have of so-called artificial intelligence technologies?
Alexander Schubert: I studied computer science, with a particular interest in that field - so my first exposure to the subject was more of a purely mathematical approach. The rest came later, and was then largely driven by recent advances in the field of artificial intelligence - advances that opened up processes that were no longer merely symbolic.
Philippe Esling: I suspect that when he came to us, Alexander already had an idea in mind, even if nothing was set in stone: he wanted a computer-generated visual and sonic work. What is interesting, compared to other composers we have worked with, is that he was enthusiastic from the start about understanding the models we presented to him, not just their control and use, but their internal mechanics. This is unusual and important: it is as if, before driving a car, a driver wanted to understand its mechanics.
His approach was very empirical. He did not try to impose his vision or his fantasies of artificial intelligence. He didn't have any preconceived notions, only the desire to build a solid scientific base to understand where to take the tool.
Alexander, what was your first idea when you wanted to use it in Anima?
A.S.: The generation of raw audio and video was a major motivation for me, as well as the possibility of real-time interaction with the machine, which is finally becoming accessible today in terms of computing power. From there, I started to look for the right ways to make this happen. The different aspects of the subject can be divided into two categories: synthesis and sound processing on the one hand, and the generation of symbolic scores for gesture sequences on the other hand.
P.E.: Concerning the first domain, we worked a lot on our sound synthesis and timbre transfer models. With timbre transfer, or morphological transfer, we generate—from a source sound signal—a new sound that follows the same profile (or at least some selected parameters of this profile), but in another timbre (for example: at the input, a violin melody, at the output, a voice). Some expressivity parameters, specific to the playing modes of the source sample (dynamics, vibrato, etc.), can even be restored by adapting them to the specific expressivity parameters of the target timbre. For example, for the voice, by choosing the vocalization (vowels and consonants). Any sound can be transformed into any other sound, in real time - giving rise to a form of dissociation between real and virtual.
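The control-transfer idea Philippe describes can be illustrated with a minimal sketch. This is not IRCAM's actual model; it is a hypothetical toy in which the "profile" kept from the source is simply its frame-wise loudness, which then drives a different synthetic "timbre" (here, a plain sine voice). All function names and parameters are invented for illustration.

```python
import numpy as np

SR = 16000  # sample rate, in Hz

def extract_controls(signal, frame=512):
    """Frame-wise loudness (RMS): the expressive profile kept in the transfer."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    return np.sqrt((frames ** 2).mean(axis=1))

def render_target(loudness, f0=220.0, frame=512):
    """Re-synthesize the same profile in a new 'timbre' (a bare sine tone)."""
    env = np.repeat(loudness, frame)           # upsample loudness to audio rate
    t = np.arange(len(env)) / SR
    return env * np.sin(2 * np.pi * f0 * t)

# Source: a decaying 440 Hz tone standing in for a violin phrase.
t = np.arange(SR) / SR
source = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)

loud = extract_controls(source)
out = render_target(loud)  # same dynamic profile, different timbre
```

In the real system a neural model would replace both functions, learning richer controls (pitch, vibrato, vocalization) and a far more convincing target timbre; the sketch only shows the separation between the transferred profile and the timbre that renders it.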
Anima™ © Lucas Gutierrez
What was Alexander's process for making the tool his own?
P.E.: The first thing he wanted to do was to "destroy" it. That's a little annoying for us, but it can be very interesting. We explained the limitations of our models to him. For example, our sound synthesis models work best when they are pre-trained on sound banks with homogeneous distributions and distinctive timbres. They don't work as well with anything involving noise. We can model a violin sound very well, but not its atypical playing modes. At least for the moment: the sound quality obtained is still unsatisfactory. The model is purely observational, trained on the corpus it is given, so if a playing mode is not in that corpus, the machine cannot imagine it.
It is, however, in these noisy territories that Alexander wanted to work: the noise of drills, cracking, screams... In any case, it's a feeling we get from his previous works: he is not drawn to the symbolic.
For us researchers, this approach made us realize that our models were not working as well as we had thought! It was an opportunity to improve them. So, as the project progressed, we developed new approaches to meet Alexander's demands. Like a game of cat and mouse, each new model came with its own new failure modes, and he wanted to explore them.
What are the main ways in which you have used artificial intelligence tools? What are their roles in the composition process?
A.S.: In the audio realm, we focused on three areas of exploration: spoken-voice synthesis, spoken-voice transformation, and autonomous sound synthesis. We pre-trained neural networks to autonomously generate either music or spoken voice from a given sound bank, ranging from musical material to language spoken by the members of the ensemble. All this allows us to transform, in real time, one sound source into another: for example, to make a musician's voice say what a computerized voice says. The models have also been used to continuously generate new sounds autonomously. Both processes create sound material for the electronic composition while also intervening live during the performance, in interaction, processing, and random generation.
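The real-time transformation Alexander describes can be pictured as a block-by-block loop around a pretrained encode/decode model. The sketch below is an assumption-laden stand-in: the "model" is a pair of random linear maps rather than a trained neural network, and the class and function names are invented. What it shows is only the structure of such a loop, in which the latent representation can be nudged live before decoding.

```python
import numpy as np

BLOCK = 2048  # samples per real-time audio buffer

class LatentModel:
    """Stand-in for a pretrained neural codec (encode -> latent -> decode).
    A real model, trained on a target corpus, would replace these linear maps."""
    def __init__(self, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.standard_normal((dim, BLOCK)) / BLOCK
        self.dec = rng.standard_normal((BLOCK, dim)) / dim

    def encode(self, block):
        return self.enc @ block

    def decode(self, z):
        return self.dec @ z

def process_stream(blocks, model, bias=None):
    """Process audio buffer by buffer; the latent can be steered live
    (e.g. toward another voice's region) before decoding."""
    out = []
    for block in blocks:
        z = model.encode(block)
        if bias is not None:
            z = z + bias  # the live 'transformation' step
        out.append(model.decode(z))
    return np.concatenate(out)
```

The design point is that only one buffer is held at a time, which is what makes live performance use feasible once the model itself runs faster than real time.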
P.E.: Artificial intelligence does not produce writing, it generates timbres that the composer can explore, relate to other signals, or control by coupling the synthesis with other signals.
A.S.: The machine also generates symbolic movement instructions, in real time and in textual form, which the performers on stage embody as choreographic patterns. These instructions are transmitted to the musicians and performers in the moment, creating a choreography in continuous evolution, while establishing an interactive relationship between the machine, the stage device, and the human beings who occupy it.
Anima™ © Lucas Gutierrez
Alexander, is your ambition to question the very principle of artificial intelligence, as a tool and a product of our society?
A.S.: In Anima, artificial intelligence is treated as both a tool and a metaphor. The piece starts from the premise that a system driven by artificial intelligence could serve as a group-therapy device, and realizes this idea both as a concrete application and as a metaphor for our constructivist view of the world, in the shaping of our inner selves as well as our outer world. Anima thus questions technology as a tool potentially capable of creating and analyzing complex systems. Artificial intelligence is also questioned for its opacity, that is, the fact that it is a black box whose internal mechanisms are hidden from us. From this point of view, artificial intelligence is also the metaphor of a system of which we can only decipher and work on the result it produces. And I want to question this abandonment, this capitulation, this faith placed in a system we only partially understand.
Listen to: Alexander Schubert
- Serious Smile by Alexander Schubert (recorded at Cité de la musique et de la danse, 2015)