So I am going to try and show you how we can run some of these large language models on your own machines. And I'm doing that by basically borrowing a demo from one of the many meetups that we run around the world. I just thought it was so easy that I wanted everyone to see it for themselves.
At this point, I'm using these chatbots so much that when I'm not able to use them, it almost feels not worth doing any really complex task, because I know I'll have to redo it whenever I have access to one of these models again. It's time lost, almost. It's like doing an intense calculation without a calculator: yes, you can do it, but is it really worth going through all that time?
And so I fly a lot. Weirdly enough, I come here almost every month, I go to New York every month, and I do these things in Madrid and other places. So in many cases I don't have internet access, or at least not good internet access.
And that's without even touching on a massive problem that a lot of people have, I don't know how many in this room, but plenty of people around the world: privacy. Personally, I don't have a problem with sending stuff to OpenAI and ChatGPT, but lots of people do. And so the idea that you could run all of this on your own machine, that it never leaves the confines of your own machine, is something that I thought was really appealing.
So, there is an app called LM Studio. LM Studio, not LLM Studio. There we go: lmstudio.ai, very simple. Their website is as simple as you could get. You literally have three buttons to download, whether you're on Windows, on Mac, or on Linux.
And obviously, I already have that downloaded, so we don't have to go and do that. I'm just going to run that now.
So you can definitely see this is built by engineers. I think mostly for engineers, because most non-engineers are not really executing these on their own machines. But I think that's temporary. And I think that's temporary only because of how damn easy this is.
So the screen here is not the best that we have. But you can see an interface here.
This is the home page, so you can see I can search for different models that exist. This is a massive thing with everything open source: the number of models that you can literally just tap into, download on your own machine, and play with is pretty big. Here you can see there's Stable Code Instruct 3B, Starling LM 7B Beta, Google's Gemma 2B Instruct. Now, I could just launch the actual download here, but I'm on my hotspot. If I had a better internet connection, I would have probably just actually launched the download. So this is the one part of the demo that you're going to have to trust me on.
The only thing you have to do is literally click the Download button. Now, this entire demo is running off of a MacBook Air, so I want you to keep that in mind. I've got, I think, 8 gigs of RAM on this machine. It's basically the bottom of the range of what you can buy today, at least in the Mac lineup.
If you run this on a MacBook Pro, you can go for much bigger models and so on.
But in my case, I have pre-downloaded two models. I'm going to go into the AI chat interface, and I'm going to delete this. And you can see that at the top here I actually have three models: StableLM, Phi-2, and then Llama 7B.
Llama 7B I'm not going to load. When I load that one up on my MacBook Air, my MacBook Air grinds to a halt. The bigger the model you download, the harder it's going to be for your machine to compute it.
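As a rough sketch of why that is: the dominant cost is simply holding the model's weights in memory. A back-of-the-envelope calculation, ignoring the KV cache and runtime overhead (so real usage is somewhat higher):

```python
# Rough RAM needed just to hold a model's weights in memory.
# Ignores the KV cache and runtime overhead, so real usage is higher.
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    bytes_total = params_billions * 1e9 * (bits_per_weight / 8)
    return bytes_total / 1e9

print(weights_gb(7, 4))   # ~3.5 GB: a 4-bit quantized 7B model, tight on an 8 GB machine
print(weights_gb(7, 16))  # ~14 GB: the same model at 16-bit simply won't fit in 8 GB
```

Which is roughly why a small quantized model is comfortable on a machine like this, and a 7B model pushes it to the edge.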
You get some indications as you navigate through that interface I was talking about before. So if I go back to home here, you can see that this says it requires eight gigs of RAM or more. All of them require eight gigs of RAM or more, but because my machine only has eight gigs, it's not showing me the ones that actually require more. It's basically saying: buy a new machine.
But I'm going to load StableLM for the moment. And now it is loading up this model. You can see RAM usage at the moment is 1.7 gigabytes, CPU usage zero.
And what we're gonna try, I have prepared here... we could go through any normal task that we would want with an LLM, but we're gonna say: write me a song about a practical AI meetup, hosted by MindStone, in a church. Okay.
This is now 100% executing on my machine. And you see how fast this is coming through. This is absolutely crazy.
Now, I don't know what the quality of the actual result here is, to be clear. It's getting something out. So very, very simple.
Welcome to the world of AI, where minds come together to explore. In Toronto, a city that's grand, a mindset event, a chance to stand. Oh, well.
The other thing that they just came out with, which I thought was interesting, is that you can run this in the Playground. I haven't properly tried this yet.
Okay, so what do we have here? So, ah, that doesn't... okay, load model. Okay.
So now it's loading the second model. I haven't tried this on my machine before, so it might very well be that loading two models on my machine is going to make everything go crazy.
And then here I'm going to put the same prompt back in and hit send. This is now supposedly going to execute it on both models in parallel.
Where do we go? Okay. You can see that it's dramatically slower at this point.
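For reference, here is a minimal sketch of what that same-prompt-against-two-models setup could look like programmatically, assuming LM Studio's OpenAI-compatible local server is running on its default port, 1234, with both models loaded. The model ids below are placeholders; use whatever /v1/models reports on your machine.

```python
# Sketch: send one prompt to two locally served models in parallel.
# Assumes LM Studio's local server on its default port; model ids are placeholders.
import concurrent.futures
import requests

PROMPT = "Write me a song about a practical AI meetup hosted by MindStone in a church."

def ask(model_id: str) -> str:
    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={"model": model_id,
              "messages": [{"role": "user", "content": PROMPT}]},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

models = ["stablelm-placeholder", "phi-2-placeholder"]  # hypothetical ids
with concurrent.futures.ThreadPoolExecutor() as pool:
    for model_id, answer in zip(models, pool.map(ask, models)):
        print(f"=== {model_id} ===\n{answer}\n")
```

Keep in mind both requests end up competing for the same RAM and CPU, which is exactly what you're watching happen on stage here.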
Yes. I haven't at the moment. No. Yes.
Both, but yes. Sorry? Yes, that you can download directly, they are indeed.
Yep. It's not, no. I mean, I can show you right now, if my mouse... oh, see, what I said might happen is actually happening.
Let me just stop the generation for a second. My machine is grinding to a halt. Okay.
It is working, exactly. I think I might have lost my machine here. Come on, stop generating.
You're going to have to trust me that if I actually switched off the Wi-Fi, it would continue generating the response. This is a shame because I actually wanted to do the last bit of the demo as well.
We're not going to see the very last bit of the demo because of my machine. Now, you know what, I'm going to try and kill it. So I'm gonna go down the hard path instead, and hopefully that should be fast enough. There we go.
Let's see. Yeah, whilst we're doing that, we can go with the rest.
Yes, another question. Yeah. Yeah.
You can give it all those elements. It's the same thing as if it were executing on a server. When you send a request to GPT-4, or to ChatGPT, or to GPT-4 through the API, it's basically just executing on the server and then coming back. Here you have the same thing within the same environment: you can give it presets, you can give it a system prompt, you can do everything you want with it there as well.
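To make that concrete, a minimal sketch, assuming LM Studio's local server feature is switched on at its default port: the same OpenAI client library works unchanged, you just point it at localhost. The model id and api_key values here are placeholders.

```python
# Same client library you'd use for the OpenAI API, pointed at the local
# server instead. Nothing leaves the machine, so the api_key is a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  # placeholder; the server uses whatever model is loaded
    messages=[
        {"role": "system", "content": "You are a concise songwriter."},  # system prompts work locally too
        {"role": "user", "content": "Write me a song about a practical AI meetup."},
    ],
    temperature=0.7,
)
print(completion.choices[0].message.content)
```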
Yeah. Well, you do have the same context window problems. I don't think you have exactly the same cutoff, because that depends on the model, and it depends on the amount of compute you have available as you're going through. So the bigger your machine, the more you can handle.
Yes. Yeah. Sorry, one second.
There was some stuff that we're going to have to cut out of the video. Okay, here we go. Can you repeat your question, please?
Mm-hmm. That depends on the model. It depends on how big the model is. And it depends on... well, actually, those are the two parameters. It mainly comes down to the size of the model at that point.
It is not reaching out to the internet at any point, unless you build in some capability for the model to go and search, bring the context back, and feed it into its own answers. Yes.
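A sketch of that pattern, to make it concrete: your own code does the network call, and the model only ever sees the text you paste into its prompt. This assumes the same hypothetical local-server setup as before; the URL handling and truncation are purely illustrative.

```python
# The "go search, bring context back" pattern: the model never touches the
# network; this script fetches the page and pastes the text into the prompt.
import requests

def answer_with_context(question: str, url: str) -> str:
    page = requests.get(url, timeout=30).text[:4000]  # crude cut to fit the context window
    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",  # LM Studio local server, default port
        json={
            "model": "local-model",  # placeholder id
            "messages": [
                {"role": "system", "content": "Answer using only the provided context."},
                {"role": "user", "content": f"Context:\n{page}\n\nQuestion: {question}"},
            ],
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Example: answer_with_context("What is this page about?", "https://example.com")
```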
Okay, so here, and this is how I wanted to finish, actually, and this has nothing to do now with executing local models: how many of you are familiar with Suno? Okay, you've got a few people here, not that many. Suno just came out with song generation, so we're gonna take the lyrics we just created and say what style we want.
What type of, what style of music would we want this played in? Sorry? Okay. Hard rock, church style.
Okay, MindStone in a church. Okay, right, the audio is generating there.
Okay, so now we have a freshly created MindStone song. Now, Suno is in the cloud, to be clear. This is a website, so this part is not executing on my own machine. It's gonna take a few more seconds.
Okay, so let's see what comes out. Yes. A lot of you. Well, I'll stop there for a second.
To your question earlier, as you saw, I only told it that MindStone was hosting the meetup. In the lyrics, it put, what was it, "a platform for learning that's renowned". Somehow the actual model already knows that MindStone is about learning, which I'm surprised by, because this is a fairly small model, and we've not been around for that long.
So, sorry? No, there's nothing on the network, like literally. OK, I will do this again.
I will do this. I will switch off my Wi-Fi. There is no internet on this machine.
I'm going to delete this chat and start a new one. So I've got Phi-2 loaded, that's what we were using. So let's ask any question.
What do you want to ask? So, what time the solar eclipse is on Monday?
It won't know that, because it doesn't have the concept of time, nor does it have... well, it might know. I'm gonna ask: what time will the next solar eclipse be? Okay, the next solar eclipse occurring directly across the Earth's surface will be a total solar eclipse on December 10, 2021.
This is because the knowledge cutoff is there. But you can see none of this is using the internet. That's the beauty of what I was showing here.
The entire thing is executing on my machine. And it's absolutely crazy: it's actually spitting back things that make sense, and it's using just this tiny laptop.
Well, that's all that I wanted to show today. Hopefully, it shows you just how easy it is to get this done. If you ever wanted to use private data for any of these, now you can.
There you go. Thank you very much.