Hi, my name is Ryan Groves and I'm here to talk about new creative opportunities with AI in music.
By way of background, I've been creating AI-driven music apps since about 2010. I built the first text-to-singing music app in 2015.
It was a mobile app that got integrated into Facebook Messenger's tray, so you could write your own lyrics to Lady Gaga or Queen songs and render out a lyric video, sung in this sort of robotic voice.
I'm personally a composer and a multi-instrumentalist, and did my master's in machine learning and music. And for the last seven years, I've been focused on generating music with AI.
So I have two roles right now. I'm the CTO and co-founder of Infinite Album, where we generate music for the video game use case. So it's an infinite adaptive stream of music.
I also run the AI Song Contest, which is a nonprofit that runs a yearly contest for the ethical use of AI in the song making process. So we're really trying to explore the new creative opportunities that artists have with these AI tools.
So I wanted to show you a little example of what people often think of when they think AI music. And so here is a robot that plays marimba.
Part of the reason I like to show that is that there are a lot of preconceptions and misconceptions about what it means to use AI to create music. In that particular case, there are many different systems at work, quite apart from the robotics involved in having an actual robot play the marimba: there's detection of the tempo, detection of the pitches being played, and some understanding of what the other players are doing. So quite a lot of effort goes into that.
A lot of people think there's just one AI model acting as some sort of general-intelligence being, and it's usually not like that.
And so my question is, what does it mean to be creative as humans? Does anybody have any thoughts on that?
What's that? [Inaudible audience response.] Yeah, which is exactly that.
He really makes the distinction, which legally translates into the difference between what you have in your mind and how you formalize it. The creation is actually the moment when you formalize it, not before.
What's interesting is that with Stefan Zweig, and with the law as in the previous presentation, they only considered text, music, and visual arts. All the other senses, like perfume and odors, are not thought of as creations, which is interesting.
That's super interesting. Do you expect an actual answer? No, no, and I wasn't going to give an answer either.
No, that's fantastic. Music is a very interesting case because audio gets processed in a different way in your brain, and you actually sort of recreate the frequencies with your brainwaves a little bit.
We are multimodal. Yeah, exactly. And then also the learning of music is very physical as well.
So there's physical memory in instrument performance. I think music has a few different dynamics, some of which were replicated by that robot.
But there's a big question right now with ChatGPT and these LLMs, which is: are they actually reasoning, or are they just recalling a lot of information and doing sophisticated pattern matching? I think that's a similar question to whether an AI agent can be creative in the way we think of humans being creative. And I think the evidence generally suggests that it's just very sophisticated pattern matching and recall, which is why all of these models are trained on massive amounts of training data.
What is the evidence that suggests we reason, rather than doing extremely sophisticated recall? That's a good question.
We don't need the same amount of data, right? We do, actually. Current language models are trained on approximately the same amount of data as a seven-year-old has absorbed.
But there's no way we could ever read that amount of text. It comes into our brains through all the multimodal stimuli that our bodies absorb.
The seven-year figure comes from Yann LeCun. I don't want to dig into it too much, but that was his current estimate of how much data we're working with. So does an LLM reason better than a seven-year-old?
It's a very good question. It definitely gets me better results. Yeah.
But it's definitely not perfectly reliable. Definitely not perfect. No, I agree.
The argument is basically also about efficiency, right? Given how much data it has, is it better or worse than the seven-year-old?
It's just a question of how well the seven-year-old learns; it's the reasoning bit. The interesting thing about recall versus reasoning, for me, is that we never define what reasoning really is.
Every time people try to define reasoning, you go to an LLM and the AI actually demonstrates that thing. It's similar to the question of what AGI is: you reach the test you set, and now passing it no longer counts as AGI, so you have to determine another test.
So how would you define reasoning in order to say that the evidence suggests this is not reasoning? I think it's extrapolating concepts across topics. I think that's a large part of it.
That's why I say it suggests recall: it's not a binary yes-or-no question, and I think these models are probably somewhere in between recall and reasoning at the moment.
But yeah, I think we could talk about that for a lot longer. One author argues that these models are basically a stochastic parrot: they sort of randomly regurgitate information. And now we're at a point where we essentially have a musical stochastic parrot, in companies called Suno and Udio. Stochastic?
Stochastic refers to a random process, where outputs are sampled from a probability distribution. But it basically means random.
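To make "stochastic parrot" concrete, here is a toy sketch in Python, purely illustrative and nothing like how Suno or Udio actually work: it memorizes which words follow which in its training text, then "parrots" new text by randomly sampling continuations.

```python
import random
from collections import defaultdict

# Tiny training corpus; a real model would ingest vastly more data.
corpus = "the cat sat on the mat and the dog sat on the cat".split()

# Record which words follow which (a bigram model).
continuations = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    continuations[prev].append(nxt)

def parrot(start: str, length: int = 8) -> str:
    """Regurgitate text by sampling each next word at random."""
    word, output = start, [start]
    for _ in range(length):
        options = continuations.get(word)
        if not options:
            break  # dead end: this word never appeared mid-corpus
        word = random.choice(options)  # the stochastic step
        output.append(word)
    return " ".join(output)

print(parrot("the"))  # e.g. "the cat sat on the mat and the dog"
```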
We're at a point now with music where we've essentially solved music generation, in a way. In the same way you expect ChatGPT to hold a conversation with you and answer questions, we can now generate songs.
And it's done largely in the same way, by allegedly scraping all of this copyrighted music. Now there are lawsuits from the major record labels, which are suing these state-of-the-art companies.
That has reinforced this perception of AI as a sort of text-prompted god machine. And people have shown that it's not just a problem with the training data: the output of these systems can also infringe on the copyright of existing songs.
They essentially create sound-alikes of famous Green Day or Queen songs, and a lot of analysis, both through the lawsuits and by others, has shown that.
So the question is: can we use AI ethically in this creative process? I run the AI Song Contest, where we're exploring exactly that question and highlighting the people who do use these tools ethically.
We have them submit a song and a process document that describes everything they did to create the song, including the AI models they used, the data they trained on, and the artistic vision behind the song.
What we found is that transcultural exploration has been a recurring theme. People have been leveraging these systems to translate one culture's performance into another culture's sonic palette, for example. One of our winners was a Thai artist who translated traditional Thai instruments into Western electronic music. Another example is this year's winner, who was from South America and asked an AI to generate South American music. It came out with all of these very stereotypical sounds, even some caveman-like grunting, so the results were quite problematic. They then collaged that output into their own song.
So I guess the gist is that there is an opportunity here to extend creativity. And so we did an analysis of how these artists are using AI in their song making process.
And we found that they're mainly applying AI to two categories of tasks: the song process and the song elements themselves.
The song process means they'll use AI to generate ideas; to compose, which is actually writing the notes; to arrange, which is placing multiple parts together; to evaluate the quality of the work they've been doing; to mix and master, which is very much studio work; and for performance, making music sound lifelike when it wasn't performed by a human. For the song elements, people use particular tools to generate particular pieces of music that become part of their song. That can include essentially any of the tracks that might be in a song, but also things like the structure of the song, the lyrics, or a synthesized voice.
And all of that is to say that there is so much more than just this text prompting paradigm where you just write a text and generate a full track.
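As a quick illustration, the taxonomy above could be written down as a simple data structure; the category and task names follow the analysis, but the representation itself is just a sketch.

```python
# The two task categories from our analysis of contest entries.
# Names follow the talk; the structure itself is just illustrative.
AI_TASKS = {
    "song_process": [
        "ideation",          # generating ideas
        "composing",         # actually writing the notes
        "arranging",         # placing multiple parts together
        "evaluating",        # judging work in progress
        "mixing_mastering",  # studio work
        "performance",       # making un-performed music sound lifelike
    ],
    "song_elements": [
        "tracks",            # any individual part in the song
        "structure",         # the form of the song
        "lyrics",
        "voice_synthesis",   # synthesizing a new voice
    ],
}
```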
And so that sort of leads me to my work in the video game space. So we specifically tried to build an AI that would do something that humans can't do. And what we built is something that essentially sits next to a game that you're playing and watches what you are doing in the game.
And it composes music according to your gameplay. The way we did that is we created an AI that generates music in real time: at any moment, the emotion of that music can change, and the music will adapt to that emotion. We then take game events, map them to emotions, and create emotionally appropriate music based on the game. So we're generating all of the music that's happening, and we're synthesizing the audio.
It's unique to every individual? Yeah, it is, and we have ways of handling that. This slide shows our paradigm for how you create music in our system. If you want to personalize and create your own song, you can select a style, select a starting emotion, and then also select the mapping. So if you're playing Fortnite, you can say: when I die, I want the music to get sad; if I kill someone, the music gets happy. Yeah, and headshots are excited.
So you can generate the song, and we encapsulate that in what we call a vibe, which is a repeatable musical scenario. When you play this vibe and start playing the game, it's an infinite stream of music that changes based on what's happening. Technically you get a completely unique song every time, but at least you have this reference point, so you know in general what it will sound like.
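As a rough sketch of how a vibe and its event mapping fit together conceptually (the class, field, and event names here are hypothetical, not our actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Vibe:
    """A repeatable musical scenario: a style, a starting emotion,
    and a mapping from game events to target emotions."""
    style: str
    emotion: str
    event_map: dict[str, str] = field(default_factory=dict)

    def on_game_event(self, event: str) -> None:
        # When a mapped event fires, retarget the emotion; the
        # generator then adapts while the infinite stream keeps playing.
        if event in self.event_map:
            self.emotion = self.event_map[event]

# Example mapping for Fortnite: die -> sad, kill -> happy, headshot -> excited.
vibe = Vibe(style="synthwave", emotion="happy",
            event_map={"death": "sad", "kill": "happy", "headshot": "excited"})
vibe.on_game_event("death")
print(vibe.emotion)  # -> sad
```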
We actually work with a platform that allows people to build apps on top of games. So they've exposed all of the game information. It's called Overwolf.
So a lot of the apps are like how accurate is your shooting or what are your stats on grinding for experience points or whatever. So we were one of the first creative apps on the marketplace. So this is Infinite Album.
And like I said, this runs alongside the video game that you're playing and essentially is listening to game events that are happening. And so we've built our own set of styles. So I can show you, I'll start with this artist pack.
So we've started working with artists and we actually have them create content for us and bundle that into a style in our system that enables people to generate new songs. But then we also release vibes, pre-canned vibes from this artist.
Yeah, we don't believe it's ethically right to scrape data, and we knew that the music labels are, and have historically been, very litigious. As a side note, it's been interesting to watch how, as AI has moved from text to images to video and music, the strength of the legal frameworks and the power of the institutions behind them has grown.
With text, nobody was really worried about bloggers' copyrighted data being scraped, as with ChatGPT, or Reddit posts. But as you move to images, now you have Shutterstock and all these stock libraries, who are obviously very concerned, but they're still sort of individual players. With the music industry, though, the labels have really centralized the ownership of all of this content, and therefore they have quite a lot of legal power. So now we're seeing a really strong legal force suing these companies. That has been really interesting to watch, because it feels like music is the place where things might actually change, for better or worse, because these labels have so much power.
Yeah, so this is the app. And like I said, we work sort of directly with artists.
And so this actually provides artists a way of essentially releasing their music into a gaming context.
This is a synthwave artist we worked with, based out of France. That's one of the songs he gave us; we can essentially reproduce it, but I'll show you how you can translate it.
So we have this emotion wheel here, with essentially all the emotions you can think of. If you move it down to sad, the music will immediately adapt to that context and start generating sad music instead of happy music.
Then you can move it down to calm, for example, or up to angry.
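Emotion wheels like this are often laid out along valence (pleasant vs. unpleasant) and arousal (calm vs. energetic). Here's a sketch of how a wheel position might translate into musical parameters; the angles and the tempo/mode mapping are illustrative assumptions, not our actual implementation.

```python
import math

# Hypothetical wheel positions (degrees) on a valence/arousal circumplex:
# x-axis = valence (pleasant ->), y-axis = arousal (energetic ^).
EMOTION_ANGLES = {"happy": 30, "angry": 150, "sad": 210, "calm": 330}

def wheel_to_params(emotion: str, intensity: float = 1.0) -> dict:
    """Translate a wheel position into rough musical parameters.
    One plausible mapping, not the system's real one."""
    angle = math.radians(EMOTION_ANGLES[emotion])
    valence = math.cos(angle) * intensity  # positive = pleasant
    arousal = math.sin(angle) * intensity  # positive = energetic
    return {
        "tempo_bpm": round(100 + 40 * arousal),   # more energy -> faster
        "mode": "major" if valence >= 0 else "minor",
    }

print(wheel_to_params("sad"))    # {'tempo_bpm': 80, 'mode': 'minor'}
print(wheel_to_params("angry"))  # {'tempo_bpm': 120, 'mode': 'minor'}
```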
So that's an example of a style we released with an artist, but we also... Yes? So how do you kind of handle the cross-cultural variation across emotions? Because it might be that, well, I don't know, in another culture, the way they kind of perceive angry music is not the same as here. And so how do you factor that in?
I mean, it's very much representational, I would say. In this case, the artist actually gives us the training data for the emotions they're trying to elicit.
And your question actually leads sort of to another question, which is, are people actually experiencing that emotion successfully? Which we've never actually, you know, tried to strap people up to sensors and figure that out. So it really, it still is sort of like an artistic representation of that emotion, I would say.
How many people in this room have tried Suno? No one? Wow.
Okay, I know you might have some ethical problems with it, but we should try it. Yeah, I'm happy to; we'll finish with that. Sure.
There's a pretty great movie that somebody made a Suno soundtrack with, and it's called Titanic with Guns. It's like this rock ballad. It's pretty great actually.
Okay, so we've also created our own genres, with our own concept of emotion for each genre. For example, there's electro house, so you can generate a new angry electro house song. Similarly, if you go to calm, the music will become calm, or you can move it to happy. We also have things like classical ambient, for example.
Let me generate a new song. So this should be happy. Sorry the speakers aren't the best.
Each of these, when you regenerate, samples from that style and whatever emotion you gave it and creates a new song. If you like one, you can name it, say, Sad Classical, and save it, and now you have essentially a new song, which we call a vibe.
So if you want to play Fortnite, for example, you can come down to these mappings. All of the supported games are listed on these mappings, and the description explains which emotions will change based on what actions happen in the game.
We also have sound effects, so you can trigger those as well, and you can even create your own custom ones. I'll show you here: if you know you're going to play Counter-Strike, for example, and you want a sound effect when you get a headshot, you can set that up, like a DJ air horn.
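Conceptually, a custom trigger like that is just a mapping from a game event to a sound. Here's a minimal sketch; the event names, file paths, and playback stub are all hypothetical, not the app's real format.

```python
# Hypothetical custom sound-effect triggers: game -> event -> sound file.
custom_triggers = {
    "counter_strike": {
        "headshot": "sfx/dj_airhorn.wav",
        "bomb_planted": "sfx/alarm.wav",
    },
}

def play(path: str) -> None:
    print(f"playing {path}")  # stand-in for the real audio engine

def on_game_event(game: str, event: str) -> None:
    """Fire the user's custom sound effect if one is mapped."""
    sound = custom_triggers.get(game, {}).get(event)
    if sound:
        play(sound)

on_game_event("counter_strike", "headshot")  # -> playing sfx/dj_airhorn.wav
```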
So we actually let gamers customize their musical and audio experience as they're playing the game.
Do you see any analytics showing improvement, gamers who stayed longer or played longer because of the music? Did you get any feedback?
Yeah, definitely. We did a study early on that tried to measure immersion and length of gameplay in an environment with either no music, looping music, or this adaptive music. We found around a 40% increase in reported immersion and, I think, 60% in time played.
Not only for gaming, but for a lot of things. Yeah. Thank you.