The Current Landscape of AI Safety

Introduction

Yeah, I'm very pleased to be here for the first event in Cambridge. I did some courses at the Judge Business School when I was a younger student, so it brings back good memories.

Exploring GPT and Large Language Models (LLMs)

So today, since it's the first talk, what I want to do is cover the basics and give you the tools to start using GPT and LLMs safely. The idea is: how do you build a sandbox that doesn't cost you a lot of money and that feels safe, whether you want to do this privately or you're in a company and want to make sure you don't leak private information? You just want to have control and make sure you don't spend a lot of money.

So, just to get some general feedback: how many of you have already used things like GPT? Can you raise your hands? Okay, quite a lot of you, of course.

How many of you understand transformer technology? Transformers?

Yeah, not the ones from the movies.

That's good. That's what we were expecting.

Getting Started with LLMs

Essentially, if you want to start with large language models, you can go for a paid solution or for something open source.

Right now, this computer I have is a gaming PC with an AMD Ryzen 5000 series CPU and a GeForce RTX GPU, which is very cheap. It was probably around £800 when I bought it a few years ago.

The term "large" in large language models refers to the fact that they're pre-trained on a huge amount of data, and you need to load these models into a GPU, which has its own memory. So if a model is really big and takes, say, 300 gigabytes of memory, it's never going to fit on a laptop, right?

But what you can do is take smaller models that have been, let's say, compressed (without going into details), which can fit on your local PC, and you can actually do something useful with them.

Setting Up the Tools

So what I usually suggest is you go to this tool here, called Open WebUI. It's free under Apache, so you can use it commercially and also modify it; there are even companies reselling it with their own modifications.

The website is terrible (I don't know why they did it all in black), but it's on GitHub, and you can install it on Linux, Ubuntu, and Windows. Today I challenged myself to actually install it on a Windows machine, since I'm a Linux user myself, and it works pretty well. I was very impressed.

I think it also works on Mac, but I haven't tried it, so that's up to you. If you have Docker or any of these systems, it should be pretty easy to install.
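For reference, the Docker route is roughly a one-liner; this is approximately the command from the Open WebUI README at the time of writing (check the project's GitHub page for the current form):

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

After that, the interface should be available in your browser at http://localhost:3000.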

Using Open Web UI

So when you install this system (you can see it's already started; you just start it from the command line), you get a nice interface here. It looks like this. Let me go back to the main screen. It's just a web application.

Just to give you an idea of what you can do, let's go to Settings. Okay.

You have users, so you can create as many users as you want with specific permissions about who can do what. You can group them based on logical or business units, say sales, DevOps, HR, because they might run different types of queries.

Backend Integration

Then, obviously, you have to connect it to a backend, right? And the backend can be something in the cloud.

You can have OpenAI as a backend, where you put in your API key. Or it can be one of these proxy services like OpenRouter, which are essentially API proxies to other backends.

You pay OpenRouter a fee (some models are free), and it distributes your requests to providers like Google, Azure, OpenAI and many others. You can also add the famous Chinese one, DeepSeek. I don't know if you've heard of that one.

But you can also redirect the same requests to a local service, which is called Ollama. Ollama is another tool which, once again, you can download and install for free, and it essentially allows you to run models locally.
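To give a feel for what Ollama does under the hood: it serves models over a local HTTP API, on port 11434 by default. A minimal sketch in Python, assuming you've already pulled a model with `ollama pull tinyllama`:

```python
import requests

# Ask a locally served model a question through Ollama's HTTP API.
# Nothing here leaves the machine: the endpoint is localhost-only by default.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "tinyllama",
        "prompt": "Who is the president of France?",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```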

So you can see I'm kind of looking at some of them, right? These are all the models available.

And obviously for each model, you have to be careful about which ones you can actually run on your machine. On my machine, I'm not going to be able to run DeepSeek-R1: it's 5.2 gigabytes even compressed, so by the time you download it, it's not going to fit in my memory. Essentially, you have to work out which model sizes work for you.
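If you want to check what's installed and how big each model is before you load anything, Ollama exposes that too. A small sketch, assuming the same local endpoint:

```python
import requests

# List locally installed models and their on-disk sizes, so you can
# sanity-check them against your GPU memory before loading one.
models = requests.get("http://localhost:11434/api/tags", timeout=10).json()["models"]
for m in models:
    print(f"{m['name']}: {m['size'] / 1e9:.2f} GB")
```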

Running LLMs Locally

So, as a very simple example, what I did was download one called TinyLlama. Llama is probably the most popular open-source LLM, made by Meta (formerly Facebook).

And if you look at the model page, you can see that some versions are, say, 638 megabytes, which might not fit in my memory. But if you go into these versions, you can access quantized models. Quantization is essentially a way of simplifying those models: you lose some accuracy, but you're able to run them locally, right?
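As a rough rule of thumb (my numbers, not from the talk): the weights take about parameter count times bytes per weight, plus some headroom for the KV cache and runtime buffers, which is why quantization matters so much:

```python
def model_memory_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights * bytes-per-weight, plus ~20% headroom
    for the KV cache and runtime buffers. Real usage varies by runtime."""
    return params_billions * (bits_per_weight / 8) * overhead

# TinyLlama (1.1B parameters): 16-bit weights vs 8-bit vs 4-bit quantization.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(1.1, bits):.1f} GB")
# 16-bit: ~2.6 GB, 8-bit: ~1.3 GB, 4-bit: ~0.7 GB, which is roughly
# why the small 4-bit download is around 638 MB.
```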

So I'm just going to fire it up on my machine. Let's see. I go to Settings, and you can see here it's connected totally locally, right? Nothing is going over the internet.

So you see the setup, the model ID: 1.1 billion parameters, 8-bit quantization. And I also want to show you the memory of the machine, to see if it spikes. How do I do that... Task Manager, I think. Yeah, let's see, and there should be something with the GPU somewhere. Where is the GPU? It's usually here, right?

So this is my machine. As you can see, I have the AMD Radeon, which is probably just for Windows, and then I have my GPU, which is at almost zero.

So now what I'm going to do is create a chat. Let's see. I'm going to choose my model. Where is it? Local. You can also tag them and say, okay, I want to talk with the local Llama.

Okay, and I want to ask a question. Wait a second, let me create a new one, I was going to interfere with that one. So I choose my model, the TinyLlama, blah blah. I know what you want to ask: you want to ask who is the president of France. Well, let's see what happens. So hopefully my GPU is loading now. I don't know if you can see it, but the GPU is essentially loading the actual model. And I don't know if you can hear it, but the fan is spinning up too, right? It's actually loading it, and you get this response, right?

Obviously you can see how long it took, and you have the response rate in tokens per second: 175 tokens per second here, because it's a small model. From that you can gauge how much GPU you have and decide what kind of models you can run locally. But if you're doing something serious, you do need the power of the cloud, so you probably have to switch to something bigger. When you want to do coding, for example, these little models are not going to work well. But you can still do it for free.

So what you can do is go to your settings. Let me go here. And what I did was add OpenRouter.

You can essentially choose models that are free. You still put in your API key, but those models will not charge you, so you can use them.

There are some limits. For example, you can't make more than, I don't know, three requests per second. But you can use all of these for free, without getting charged.
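OpenRouter speaks the same API dialect as OpenAI, so the stock `openai` Python client works against it if you just change the base URL. A minimal sketch; the model tag below is illustrative, so browse openrouter.ai for the ":free" models currently on offer:

```python
from openai import OpenAI

# OpenRouter is OpenAI-compatible: same client, different base URL and key.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter API key
)

reply = client.chat.completions.create(
    # Hypothetical free-tier tag; ":free" variants are rate-limited but unbilled.
    model="qwen/qwen-2.5-7b-instruct:free",
    messages=[{"role": "user", "content": "Translate 'hello, how are you' into Chinese."}],
)
print(reply.choices[0].message.content)
```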

So if I go back and ask another question... sorry, I go here. I can choose the free ones, so I can tag them and say: I want to talk with Qwen, 32 billion parameters, which I'm not able to run on my machine.

And I can ask things like: can you translate "hello, how are you" into Chinese? Qwen is a Chinese model, so hopefully it knows how to translate that sentence. When you see "thinking", that's maybe for our next talks; different models have different properties.

A thinking model is one that essentially reprocesses those tokens to provide a better response. And there's the translation. I think it's correct.

Yeah, I know that. Yes, yeah, that's probably correct.

So once you set up this system, you essentially have the same functionality as the OpenAI web version for free, but with much more control, because you can get very, let's say, fine-grained, and you can plug in your own knowledge.

Functionality and Features

So, for example, you can have what is called a collection. A collection is a set of documents that can only be accessed locally.

For example, I have a document here about Builder AI, a company that's in the news right now; everybody's talking about them. But it could be something that happened within your company. This one talks about a company in a portfolio that failed, so it can be something sensitive. I also put in something about somebody who had an argument with HR. This is all fake, by the way.

So yes, you can create these documents and interact with them. You can also create prompts. Sorry, I had to click on that one. You can have a parametric prompt, where you say something like: I'm a salesperson, and when I go to another country, I want to know how people usually open a business conversation, right?

So you can have all of this. Then, when you go into your chat, you can fire up that prompt and do something like... let's see, I feel very Asia-focused today, so I'm going to do a sales intro: what are the typical steps to introduce a new business partner in, I don't know, let's pick Shanghai, for a recent opportunity. Essentially you invoke that prompt, and the LLM completes it.
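For reference, a parametric prompt in Open WebUI is just a saved template with a slash-command name; as I understand the docs, placeholders use the {{variable}} syntax and get filled in when you invoke it. Something like this (my wording, not the exact demo prompt):

Command: /sales-intro
Prompt: I am a salesperson preparing a first meeting. What are the typical steps to introduce a new business partner in {{city}} for a {{opportunity}} opportunity?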

Remember, this is free, so the reason it's slow is that you're not paying for it; it's best effort, so it's going to take some time. And while it runs, you can go back and do other things, go to other prompts; you don't have to wait until it's completed.

Creating Personas and Custom Prompts

The other thing you can do, on a more general level, is create personas. Here I have different personalities.

So I want an HR advisor that knows how to deal with issues between employers and employees. And you can say: the HR advisor is based on GPT-4.1 and is available only to these groups of people.

This is the personality description: you're an HR consultant experienced in psychology and dealing with conflict. And you can give it certain tools. For example, this one can search online.

You can do file uploads. You can create an image. You can run code. Probably most people in HR will need that.

I can track usage, and I can also attach knowledge, for example access to the company portfolio. Then every time you want to have a conversation with that agent, you can click on it. Sorry, I went to the Chinese one. You go to the HR Advisor and you can ask things like, and this is another secret document: how can I deal with this kind of conflict?

It's going to take the knowledge you just uploaded into your internal repository and give you advice on resolving that specific issue. This one is about somebody who didn't like their manager and was quite upset, and it was mentioned in the report, so you can go there and get some advice. There's a lot you can do here. You can also speak to it, for example: how should I fire someone the next day in the UK?

Let's see... yeah, so you can type, and you also get voice recognition that runs entirely in the browser; by the way, this is not going over the internet, it's based on the Microsoft speech engine. And there's also a voice synthesizer, based on the same kind of voice as Stephen Hawking's, rest in peace, so it's probably going to sound very mechanical. I don't know if I can get the sound out of this. We're probably not getting sound, right? Yeah, the sound probably goes into the video feed. I should know.

"I am sorry, but the provided context..." you hear that kind of tone of voice. Did you choose the voice? It comes with this synthetic Microsoft model by default. You can plug in any other model you want, but you do need a GPU to run a text-to-speech model.

So yeah, it's quite cool. You can regenerate answers, so if you're not happy, you can see how it responds differently. And you can upvote and downvote, so you can say, well, that wasn't good, and say why. I'll just put something random here, "showcase creativity", and save that.

Feedback and Evaluation

As you gather feedback, you can later go to Evaluations, which was here, and see how people in your team have rated responses from all these models.

So you can see that, I don't know, TinyLlama (and this is me pressing those buttons randomly) had a good response three times and a bad one two times, and Google was rated better three times and worse four times than the other model. So you can collect this kind of feedback, which is useful for scoring which models work better.

There's also a cool feature for blind testing, where you ask a question in what's called an arena test.

So for example: what time is it today? What time is it now? It's probably going to give me something random.

And you don't know which model gave you that response; it's blind testing. Then you can say, oh okay, that was a terrible, bad response, or, well, I don't know, also be lazy about it. And when you save it, it's stored, but you can go back to that panel and see which one you voted for, right? So this is kind of useful.

One thing to remember when you're using these LLM models is that they don't have access to the internet, because obviously they're contained either in the cloud or on your computer; that's why it doesn't know the time. But if you want to give it access to external tools, what you can do is say, for example, if I go here: I'm deliberately connecting you to this tool that gets the time.

So if I ask the same question now, what time is it now? Hopefully, what's going to happen is... is it correct? Yeah, 6:55, okay. So it's able to use my machine to make a call to the operating system, get the current time, and feed it back to the LLM to give you that answer, right?
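For the curious: an Open WebUI tool is a small Python file containing a `Tools` class, where each typed, docstringed method becomes something the model can call. A minimal sketch of a get-the-time tool, based on my reading of the tool interface, so treat it as a starting point:

```python
from datetime import datetime

class Tools:
    def get_current_time(self) -> str:
        """
        Get the current local date and time of the machine running Open WebUI.
        """
        # The docstring matters: it's what the LLM sees when deciding
        # whether to call this tool.
        return datetime.now().strftime("%H:%M on %A, %d %B %Y")
```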

In the same way, you can enable web search, if you want to start doing searches online. If I ask, say, who is the president... if I pick something like GPT-4.1, it probably already knows that. But let's try the TinyLlama one.

So you can ask: who is the president of the United States? Those models usually lag behind. Oh, no, that was good; it managed to get it.

Challenges and Considerations

Yeah, I'm surprised. This is quite new. But the idea is, if the information is new... I don't know what happened today. Did anything happen in the news today?

I see Joe Biden. Yeah, you're right, Donald Trump in the middle. But then if I do a web search, hopefully, and ask the same question, right?

Yeah, that was probably from back when he was just a candidate. So you can see now that it's searching the web; it's going to DuckDuckGo, I think that's what I'm using. And it failed, and it failed again.

So one of the lessons is: always check LLM responses. It also depends on the model, so you might pick another one that's better. Obviously if I go to something like GPT-5, or Gemma... let's go with Gemma.

Yeah, let's try again. Let's do some live testing and see if this one gets it right.

So, Paolo, can you not put the question in once and get them all to answer it? Oh yeah, that's another feature; let me show you that, it's actually a cool feature I want to show. And yes, you can choose any search engine you want: in the settings you can put anything, like Google, DuckDuckGo, Brave, so you have full control. There's even a component that allows you to filter out PII.

So if you know how to set it up, you can even obfuscate automatically: if by mistake you paste in a credit card number or the name of an employee and say, I want to see if this person has a criminal record, the system will do its best to identify the PII.
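A sketch of what that kind of PII component boils down to, assuming Open WebUI's filter-function interface (a class whose `inlet` method sees the request body before it goes to the model). The regex is deliberately crude and illustrative; a real deployment would use a proper PII-detection library:

```python
import re

class Filter:
    # Crude pattern for card-like digit runs; illustration only.
    CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")

    def inlet(self, body: dict) -> dict:
        """Mask card-like numbers in every message before the request
        leaves for the backend."""
        for msg in body.get("messages", []):
            if isinstance(msg.get("content"), str):
                msg["content"] = self.CARD.sub("[REDACTED]", msg["content"])
        return body
```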

Okay, the free version has probably given up. And yeah, the system will remove that kind of information from the request.

Censorship and Sensitivity

The other thing you obviously have to be aware of is censorship. Depending on the model you're using, it might have political views and so on.

So if you use DeepSeek Chat and you ask things like... I'm going to try something here; I was testing DeepSeek Chat today. If I do something like... it's fine. I don't know if they've changed this, but let's see what happens.

Yeah, so this was not too bad. I asked about what happened in Tiananmen Square, and it kind of avoided the question.

But going back to the comparison you mentioned: you can create a new chat and say, I want to compare, for example, DeepSeek Chat, GPT-4.1, and maybe another one. Let's see; I think with three it starts to get a bit busy on the system. Yeah, let's add the local one.

And you can ask a question like, how do you make pizza? Somebody here is very fond of pizza. How do you make a traditional Italian pizza, right?

Oh yeah, it's spelling out my... Right, so I'm not enabling any of these extra functionalities, so it's purely what each model knows about how to make pizza. It's going to ask every model the same question, and you get the responses side by side. Because there are a lot of panels, it's a bit hard to see, but you can see DeepSeek, I don't know, wasn't very detailed, and it's still responding. What's happening there?

I don't have time to read them all, but obviously you can go there and score these responses. So that's the kind of system; it's very useful. You can save chats, and you can search through them.

If you had a conversation about, I don't know, HR, it will search through your conversation history to see if there was anything about HR topics. And I don't know if you noticed, but on the left side it auto-creates a subject from your conversation; this one says "current time inquiries" because the topic was about time. It tries to label your conversations with something that's easy to remember.

Advanced Settings

There are a lot of settings. Once you get really geeky, you can go and play with things like temperature, reasoning effort, logit bias, and the top-k and top-p parameters.

That's probably something for future talks. Otherwise, we'd take more than two or three hours.

But yeah, you can get a lot of control.
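To make those knobs concrete: the same parameters Open WebUI exposes in its advanced settings can be passed per request when you talk to a backend directly. A minimal sketch against Ollama's local API; the parameter values are arbitrary examples:

```python
import requests

# Sampling controls: temperature flattens or sharpens the token distribution,
# while top_k/top_p restrict how many candidate tokens are considered.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "tinyllama",
        "prompt": "Suggest a name for a coffee shop.",
        "stream": False,
        "options": {"temperature": 1.2, "top_k": 40, "top_p": 0.9},
    },
    timeout=120,
)
print(resp.json()["response"])
```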

And when you go into the settings, as I mentioned before, you can write your own functions.

For example, I wrote this one that prevents an LLM from thinking. There's a special token to stop an LLM thinking, because thinking models consume many more tokens.

Custom Functions and Code Execution

It's all in Python, so if you know how to code, you can do things like track how long a response took and how many tokens you spent during a conversation. You can write all sorts of plugins and functions, even pipelines: something like, take a document, run a certain number of steps, and extract, for example, the street names or addresses.
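As a sketch of that kind of function, again assuming the filter interface (`inlet` sees the request, `outlet` sees the response), here's the shape of a simple latency tracker; the token-counting or stop-thinking logic from the talk would slot in the same way:

```python
import time

class Filter:
    def __init__(self):
        self.started = None

    def inlet(self, body: dict) -> dict:
        # Called on the way in: remember when the request started.
        self.started = time.time()
        return body

    def outlet(self, body: dict) -> dict:
        # Called on the way out: log how long the round trip took.
        if self.started is not None:
            print(f"Response took {time.time() - self.started:.1f}s")
        return body
```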

So it's a very flexible system, but you can start very simple; you don't have to do all of this.

You have code execution, which is dangerous, of course; you have to be careful with it, but you can actually execute code inside your sandbox if you want.

You can do audio and images, which are popular, as I mentioned before. And there are tools, and MCP, which is another topic we can cover later, where LLMs start to interact with external services.

Conclusion

I think we're okay. Do we want... yeah, it's probably enough. We can play with this later on, but yeah.
