15, 20 minutes.
I'm sort of mildly surprised to kick you off.
But I was asked to come and share some systems stuff that might be interesting both to ACCU C++ folks and to AI folks.
So the easiest thing I could think of with a very short kind of warm up is just telling a bit about the kind of stuff that we work on.
So yes, why a banana?
No reason. This talk could go hideously wrong if I slip over on it.
So, yeah, I'm Pete.
I've written a bunch of books, software development books.
I've been a regular magazine columnist.
I think I've had the longest running magazine column in software development.
I've missed a few now, but it's been running for 24, 25-odd years.
So yeah.
I should find better things to do with my time.
But yeah, I'm an author, a columnist, I'm a musician, which obviously helps, and I'm CTO of a company called inMusic, which I'll talk about in a moment.
And one of the most important things is, also, I don't know it all.
Like anyone who stands up in front of anyone and starts having a conversation, you have to admit you don't know it all, and I'm always learning.
And one of the things I'm really excited about in being here is to learn from some of you guys, because I know bugger all about AI stuff.
And maybe after this presentation, some conversations afterwards as we have some pizza, you could tell me how we might apply some of this tech to what I'm doing.
So I'd love, love to hear about that.
Anyway, inMusic itself. How many folks are musicians in one form or another?
Hands up.
Cool.
I love all you people.
So hopefully one of these brands probably means something to you if you're a mildly geeky musician, which I am.
I'm a geeky musician.
I love this.
So AIR, Akai, Alesis, Alto, BFD, Moog, Numark.
These are all brands that my company represents.
And this is the kind of stuff that we make.
We make drum machines.
Most hip-hop music is made on this kind of nonsense.
Keyboards, guitar pedal boards.
So if you're a guitarist, we make some really cool amp modeling stuff.
DJ products, these wonderful Moog synthesizers, and various bits of software, as well as things like wind controllers and electronic drum kits.
So as I say, I'm a geek and I'm a musician.
I love making this kind of stuff and we make really fun stuff.
Now, my promise to you is that in a very short period of time, I'm going to make you experts.
No, I'm going to at least give you a bit of an understanding about some of the practicalities and issues of creating audio systems in C++ and why you might do it in C++.
It doesn't have to be C++.
This is obviously a 10,000-foot view.
And again, as I said, I'm still learning and maybe you can show me some better ways of doing this or throw some awesome AI stuff into here as well.
Off we go.
Sound.
What is sound?
Well, sound is basically a series of pressure waves in the air, and to represent it in a digital system we sample it.
Imagine the displacement of the diaphragm in a microphone: at regular time intervals, you take a snapshot of the position of that diaphragm, and you represent those snapshots as numbers.
That is how we represent sound in a digital system.
So here, some floating-point numbers.
It doesn't have to be floating-point numbers, but these days, it's probably how we're gonna do it.
And that is basically, there's a sound wave, and I'm gonna write some software that's gonna play this back to you in some interesting way.
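To make the idea of sampling concrete, here is a minimal sketch that takes regular snapshots of a sine wave and stores them as floating-point numbers. The function name `sample_sine` and its parameters are made up for illustration; they are not from any particular audio library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Take `num_samples` snapshots of a sine wave at regular time steps,
// much as a converter would snapshot a microphone diaphragm's position.
// (Hypothetical helper, for illustration only.)
std::vector<float> sample_sine(double frequency_hz, double sample_rate_hz,
                               std::size_t num_samples)
{
    assert(sample_rate_hz > 0.0);
    constexpr double two_pi = 6.283185307179586;
    std::vector<float> samples(num_samples);
    for (std::size_t n = 0; n < num_samples; ++n) {
        // Time, in seconds, at which snapshot n is taken.
        const double t = static_cast<double>(n) / sample_rate_hz;
        samples[n] = static_cast<float>(std::sin(two_pi * frequency_hz * t));
    }
    return samples;
}
```

Calling `sample_sine(441.0, 44100.0, 100)` would give exactly one cycle of the wave, because a 441 Hz tone sampled at 44.1 kHz repeats every 100 samples.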
Now, some systems, little embedded DSP things maybe, work sample by sample as they're generating sound.
But because CPUs have caches, and algorithms run much faster when the data they work on is pulled together, what we tend to do is block audio into chunks.
We generate a chunk of audio within some time frame to make some sound, and then we send that chunk to the audio hardware, where it is queued up to be played.
A typical rate, the CD sample rate, is 44.1 kilohertz: that's 44,100 samples per second.
If we're working at those blocks of 128 samples, I've got about 3 milliseconds to make each of those blocks of audio.
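The arithmetic behind that figure is just the block size divided by the sample rate. A small sketch, with a hypothetical helper name:

```cpp
#include <cassert>

// Time available to generate one block of audio, in milliseconds.
// (Illustrative helper; the name is an assumption, not a library API.)
inline double block_duration_ms(int block_size_samples, double sample_rate_hz)
{
    assert(block_size_samples > 0 && sample_rate_hz > 0.0);
    return 1000.0 * block_size_samples / sample_rate_hz;
}
```

For 128 samples at 44,100 Hz this works out to roughly 2.9 ms; a 1024-sample block at the same rate would give roughly 23 ms, which is more time to work in but a much longer wait before anything new can be heard.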
Hmm, that's easy, because CPUs are super fast, right?
This is kind of a real-time system.
So here we're balancing the CPU speeds with the buffer sizes that I've just described.
This is particularly important.
So, say I'm making the Honkatron.
I've just invented this.
So the user's gonna push the honk button, and at some point it has to go honk.
So the audio system is currently playing this block of audio, the user pushes the honk button, and I have to make a honk noise that's going to be in that block there.
So the block size is very important, because if I press the honk button, and my block size is very long, we'll wait, we'll wait, we'll wait, and then honk.
So the trick there is, obviously, the block size: the smaller the block size, the more responsive the application is going to be.
So we want honks as quickly as we possibly can get.
The smaller the block size, obviously the shorter time you have to generate this audio.
And if you're doing something like Netflix and you're just playing a WAV file in the background, time-synced to a video, that's relatively easy.
But for some of those products, like a DJ product that's scratching or a keyboard that has to play the voices instantly as you hit the keys, that's quite a challenge.
So, we have these regular heartbeats, and we have to deliver audio within those heartbeats.
It's really important that the time it takes to generate that audio is not longer than the time I have to give the audio back to the hardware to be played.
Otherwise, things fall spectacularly off the back of the bus.
Basically, as soon as your elapsed time takes longer than the playback period, you'll get clicks and pops, and your honk won't be a honk, it'll be a honk, honk, honk, honk kind of noise.
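As a rough sketch of that deadline, the check below times a render call and compares it with the playback period of the block it produced. The function `rendered_in_time` and the shape of the `render` callback are assumptions for illustration; a real product would normally get this kind of information from its audio driver or framework rather than from a wrapper like this.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::steady_clock;

// Returns true if `render` filled `block` in less time than the block takes
// to play back. A false result is the situation that produces clicks and pops.
// (Hypothetical helper, for illustration only.)
template <typename RenderFn>
bool rendered_in_time(RenderFn render, std::vector<float>& block,
                      double sample_rate_hz)
{
    // Playback period of this block, in seconds.
    const std::chrono::duration<double> budget(
        static_cast<double>(block.size()) / sample_rate_hz);

    const auto start = Clock::now();
    render(block);
    const std::chrono::duration<double> elapsed = Clock::now() - start;

    return elapsed <= budget;
}
```

A render callback that takes 10 ms to fill a 128-sample block at 44.1 kHz would fail this check, since the budget there is only about 2.9 ms.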
The psychology of why this matters is really interesting.
Audio perception is... Now, this was going to be a C++ talk, but now I'm talking psychology.
There's not much C++ in here at all.
But audio perception is significantly more acute than visual perception, especially when something goes wrong.
So, I mean, I'll do this very quickly, but this, I think, was the first study that went over response to sort of user interface feedback.
So let's call this video.
And basically, a response time within a tenth of a second is perceived as instantaneous.
Within one second or so, people aren't really going to wander off; at ten seconds you lose the user's attention.
We have ADHD these days, so we'll probably care more about that than that now.
But with audio, notice the difference.
Basically, if you put two sounds close together, within a range of about one to five milliseconds for clicks, they sound like the same sound.
Past five milliseconds, it sounds like a flam.
They're not the same sound.
Forty milliseconds for more complicated sounds.
But basically, the user does not want to hear any audio dropout. It's going to be very audible, and it's a disaster.
That's why this is a real-time system.
I mean, not hard real-time, no one's going to die, but we might break a speaker because the pop noise would actually be potentially very damaging to the speaker.
William Shakespeare was an awesome audio programmer.
So, what we have to do, basically, is avoid anything that blocks, where generating the audio might overrun the time the buffer gives us.
Of course, CPUs these days are multi-threaded, so that's easy.
We'll just push all the work we don't have to do on the audio thread off onto some other thread.
So obviously, if you were doing user interface polling or anything like that whilst you're trying to generate the audio, that just sounds like an absolute nightmare to me.
Put that on another thread.
And these days, with multi-core computers, we don't just have one audio thread.
We'll probably fan out into multiple audio threads to get lots of work done at once.
But of course, if all those threads don't scatter and gather together in the timeframe, we will have terrifying audio noises.
So this is the information that's gonna make you all expert audio programmers.
How do we make this work reliably?
Basically, it's all about good design and doing sensible things.
There is obviously no way I can go into any useful detail about this, but I'm stupidly happy to talk about these kinds of things later on if you want to go through them.
But basically we're trying to push off any work that isn't required to be done on that audio thread elsewhere.
If you can pre-render stuff, pre-render stuff on another thread and play it back on this audio thread.
If you need to calculate coefficients and it takes a long time to calculate them, push that off onto another thread, so that you leave as little work as possible to be run on the main audio thread.
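One way this can look in code is sketched below: a simple one-pole low-pass filter whose coefficient is calculated on a non-audio thread and handed over with a single atomic store. The class and its names are hypothetical, and the sketch assumes `std::atomic<float>` is lock-free on the target platform, which is common but not guaranteed by the standard.

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>

// One-pole low-pass filter. The expensive coefficient maths happens off the
// audio thread; the audio thread only ever does an atomic load.
// (Illustrative class, not taken from any framework.)
class OnePoleLowPass {
public:
    // Call from a NON-audio thread (for example, in response to a UI change).
    void set_cutoff(double cutoff_hz, double sample_rate_hz)
    {
        constexpr double two_pi = 6.283185307179586;
        const double a = std::exp(-two_pi * cutoff_hz / sample_rate_hz);
        coefficient_.store(static_cast<float>(a), std::memory_order_release);
    }

    // Call from the audio thread: no locks, no allocation, bounded work.
    void process(float* block, std::size_t num_samples)
    {
        // Read the coefficient once per block so it stays stable within it.
        const float a = coefficient_.load(std::memory_order_acquire);
        for (std::size_t n = 0; n < num_samples; ++n) {
            state_ = (1.0f - a) * block[n] + a * state_;
            block[n] = state_;
        }
    }

private:
    // Assumption: lock-free on the target; check is_lock_free() to be sure.
    std::atomic<float> coefficient_{0.0f}; // 0 means "pass audio through"
    float state_ = 0.0f;                   // touched by the audio thread only
};
```

A single float fits in one atomic. Larger sets of coefficients need a different handoff, such as a lock-free queue or swapping a pointer to a pre-built object.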
And one of the key things is, if you don't know how long it's gonna take, don't bloomin' do it.
And if you don't know how long it's gonna take, what's the definition of that?
Pretty much always, if anything involves taking an operating system lock or a mutex, you are doomed.
Don't do it.
And when you begin to realize this (and this is our mantra, our mindset), most everything you would probably do in any normal programming language is going to involve taking a blasted lock.
So, obviously, first of all, on your audio thread, don't take a mutex.
Next thing, don't allocate any memory, because in pretty much any language, when you get down to the underlying memory allocator, it can end up taking a global lock.
Don't do any logging to any kind of file system unless you've written it yourself and you know what the hell your logging system does because that will take a lock.
Don't do any disk IO. It might not take a lock, but if you're waiting for the data to come back, you'll be waiting a long time, because disks are much slower than your audio thread.
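A common way to get things like log messages off the audio thread without a lock is a fixed-size single-producer, single-consumer queue. The audio thread pushes and another thread pops, and that other thread is free to do the slow, lock-taking work. The sketch below is a minimal version with invented names; it assumes exactly one producer thread and one consumer thread, and a cheap, trivially copyable `T`.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Fixed-capacity lock-free queue for ONE producer and ONE consumer thread.
// It never allocates after construction and holds up to Capacity - 1 items.
// (Illustrative sketch, not a production implementation.)
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer side (the audio thread). Never waits: returns false when
    // full, so the caller drops the message instead of blocking.
    bool try_push(const T& value)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if (next == tail_.load(std::memory_order_acquire))
            return false;
        slots_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side (a worker thread). Returns false when empty.
    bool try_pop(T& out)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;
        out = slots_[tail];
        tail_.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0}; // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0}; // next slot to read (consumer-owned)
};
```

The design choice worth noticing is that a full queue loses a message instead of making the audio thread wait. Dropping a log line is an acceptable price for never missing an audio deadline.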
Basically, there's a whole bunch of stuff like that that we have to consider as we design these systems.
Again, basically, there's a level of distrust.
If you didn't write it and you don't know what it does, don't use it.
That's a little extreme, but it's a good mantra to live by.
And then, to bring this one in to land, as I said, I'll explain why we use C++.
To some extent, the reason that most audio systems (not all audio systems, but most) are still written in C++ these days is that, A, C is horrific, and B, C++ basically gives you that level of control.
It is still our assembly language for programming.
Cool, you could write something in Swift, or Java, or JavaScript, and people do do that. But they go wrong, because languages that are interpreted or garbage collected, or that do reference counting under the covers, or whose implementations take locks, basically cause audio glitches.
And so C++ is the one very expressive language that still has high-level design concepts, but allows you to write something that you can basically inspect and know that it's not going to glitch on the audio thread.
That still doesn't mean it won't hide some of these memory allocations from you.
C++ devs, you know std::vector.
It's our sort of basic memory block sort of data type.
But if you use it incorrectly, it will go and allocate for you in the background.
You didn't realize it.
However, we use those data structures on the audio thread all the time.
We just basically have to learn to use our brains.
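As a small sketch of using `std::vector` carefully: allocate its storage up front, away from the audio thread, and then refuse to grow it from the audio thread. The `VoiceList` class and its methods are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <vector>

// A list of playing notes that never allocates after construction.
// (Illustrative class, not taken from any framework.)
class VoiceList {
public:
    // Construct during setup, NOT on the audio thread: reserve() allocates.
    explicit VoiceList(std::size_t max_voices) { notes_.reserve(max_voices); }

    // Safe on the audio thread: push_back cannot reallocate while
    // size() < capacity(), so we refuse the note instead of growing.
    bool try_add(int midi_note)
    {
        if (notes_.size() == notes_.capacity())
            return false;
        notes_.push_back(midi_note);
        return true;
    }

    // clear() keeps the capacity, so it does not free or allocate memory.
    void clear() { notes_.clear(); }

    std::size_t size() const { return notes_.size(); }
    const int* data() const { return notes_.data(); }

private:
    std::vector<int> notes_;
};
```

The incorrect usage is the same code without the `reserve` call and the capacity check: an innocent-looking `push_back` then reallocates as soon as the vector is full.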
So yeah, it's a convention.
It's the language that gives us sharp tools.
But also, because it is that language, there are a large number of audio frameworks that are written in C++ that we use.
So JUCE is a very famous one, but there's a bunch of other ones.
And these are the things that basically we rely on.
And whenever you're building anything, you want to stand on the shoulders of giants and reuse those things.
So most of the things that we would use in high-quality audio products already exist, and are written in C++.
And that's basically my 10,000 foot view.
I'm stupidly happy to go into as much detail as anyone wants because I geek out on this stuff.
But also, you know, this is a 15-minute talk, and if you didn't find it interesting, I'm sure you're going to find the next one very interesting indeed.
In the meantime, basically, you need to know the rules so you know how to break the rules like a pro.
That is as much as I would like to share with you.
If you have any questions, please ask.