Welcome everybody to MindStone London, one of the biggest orgs here. Could you please raise your hand if you've been to a MindStone event before? Yeah, okay, that's an active community.
I don't know all of you yet, but I'm sure we've been to lots of these events together now.
You'll notice very quickly, I'm not Josh Wall. He is in Atlanta right now at a different MindStone event, because the community is growing and expanding, and you can't be in two places at once.
So I host MindStone's online events, the next of which will be on the 24th, if you want to attend that and do some networking with people around the world. But today we are going to do what we normally do.
So for people who are new here, this is a MindStone practical AI meetup. Practical AI means using AI, not just building it. And we are the biggest practical AI community in the world, specializing in non-technical people.
Josh founded this community because he initially wanted to get a sense of whether the US was ahead of Europe in AI, and actually found that it wasn't. But the problem was, when he came to the UK, anytime he would talk with enthusiasm about what you can do to move things forward, he'd get asked: have you thought about privacy, and have you thought about the end of the world?
And yes. Those things are tremendously important.
I have a background in the ethics of AI, so I absolutely believe that. But particularly in Britain and also in Europe, that's our default setting really is to be very risk averse and concerned about future technology.
And we should have a space that's a bit in the spirit of America to try and think what could go right and what could be really cool with this technology as well.
So that's what we're going to do here.
We really try to do as much demos as we can. And there's an understanding that the demo gods aren't always in our favor.
So if things go wrong on any given run of a model, understand that's part of the process. I'll be presenting a work in progress in a second.
This QR code up here should point you to the courses that MindStone runs to improve your AI competency, which cover prompt engineering and, I think, a limited amount of ethics as well.
You've also got here the Sandbox somewhere, and you've got the Pearson Accredited Certification.
And big thank you to sponsors for these events as well, who pay for the lovely drinks on the side, which I hope some of you are enjoying, and the pizzas later.
Pizza will arrive at the end of the talks, so in about an hour.
Terrific.
With that all out of the way, we can move to the first talk, which is me. After that, we're going to have Aaron and Victor.
Okay, so could you raise your hand if you have been thinking at all about super intelligent AI systems in the near future, or wondering whether AI is overhyped or a bubble? Okay, most people, basically.
So I've been thinking about this a lot, because there's this whole range of books, like Taming Silicon Valley, AI Snake Oil, and The AI Con, arguing that basically AI is a bubble and overrated, that these are stochastic parrots that aren't going to achieve recursive self-improvement.
But then you've got all these other people who are, you know, building the technology and very informed saying we could have super intelligent systems basically in the next year.
So I embarked on a project to try to turn these quite separate discourses into a single discourse, and to transform this into a debate.
And the way I went about it is I started reviewing books, in a series reviewing both the AI-hype and the anti-hype books.
And each time, I've reviewed a part of one book that argues maybe AI is overhyped, alongside a different book arguing that AI is underhyped. What I'm trying to do here is turn this into a debate. I have a background in competitive debating, so that's the way I see and understand the world.
And what I'm gonna show you today is where I'm at in that project and the generative AI tools I'm using to help me. The biggest one is Notebook LM.
Can I get a show of hands if you've used Notebook LM before?
Terrific. It's the best ever.
I'm also going to show Claude's latest artifact feature, and how the same prompt benchmarks Claude 4 against 3.7. I imagine lots of people have tried Claude before.
And raise your hand if you've used Gamma before, one of the sponsors of MindStone.
Okay.
Not that many people. Okay.
So if you see Gamma presentation for the first time, that's going to be really cool.
Okay, so the first thing is, I've taken all of these posts, copied and pasted them into a Word doc, and given all of that to Notebook LM along with a prompt. And that's this little source here.
Now the cool thing about Notebook LM is you can then connect it to other sources as well, and you can have a notebook of various things that you can discuss and think about. The other source I've given it is the Timelines Forecast.
Has anyone heard of this paper, AI 2027? Yes, lots of people.
OK, that's the last time I'm going to make you raise your hands before I start.
Well, these are serious people thinking very seriously about an intelligence explosion through recursive self-improvement. And the way they imagine this happening is that AI systems will get so good at coding that they will be able to edit themselves so quickly that there will be runaway intelligence in a very short span of time.
And there is a methodology for forecasting here that allows them to make a prediction about when in the future this might be most likely to happen.
What I've tried to do is match that with two of the papers they cite under it, one of which is a paper about different kinds of tasks in the economy that take different professionals different amounts of time, ranking them and charting how AI has progressed on those kinds of tasks. One of the key findings is that the time horizon, meaning the length of task (measured by how long it takes a human being) that an AI system can complete, is doubling every seven months.
So in 2019, a generative AI system could do a task that a human could do in two seconds. Seven months later, it was four seconds; seven months after that, eight seconds.
And now, with models like Claude 4, we're approaching coding tasks that take the average coder something like 30 minutes. And if this trend continues, then you can imagine where it goes.
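The arithmetic behind that trend is just exponential doubling. Here is a minimal sketch in Python; the two-second 2019 baseline and seven-month doubling period are the illustrative figures from the talk, and the function names are my own:

```python
import math

def time_horizon_seconds(months_since_2019, base_seconds=2.0, doubling_months=7.0):
    """Task length (in human-time seconds) an AI can complete,
    assuming a fixed doubling period."""
    return base_seconds * 2 ** (months_since_2019 / doubling_months)

def months_until(target_seconds, base_seconds=2.0, doubling_months=7.0):
    """Months after 2019 until the horizon reaches target_seconds."""
    return doubling_months * math.log2(target_seconds / base_seconds)

# 2019 baseline: 2 s; one doubling later: 4 s; two doublings: 8 s.
print(time_horizon_seconds(0))   # 2.0
print(time_horizon_seconds(7))   # 4.0
print(time_horizon_seconds(14))  # 8.0

# Under these assumptions, a 30-minute (1800 s) coding task is
# reached roughly 69 months after 2019, i.e. around late 2024,
# consistent with the talk's "now" for models like Claude 4.
print(months_until(30 * 60))
```

The point of the sketch is how steep this gets: each additional year multiplies the horizon by about 3.3x, which is why small disagreements about the doubling period lead to very different forecasts.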
So what I've done is put this into Notebook LM and given it a prompt. I think this was a second attempt, because initially the response conflated parts of the book reviews I'd given it, but eventually I got it to source only from the discussion. It gave five points of clash, which are points of real disagreement in the books. I'm just gonna read two of them.
So, point of clash two: accuracy and scope of AI automation forecasts. The anti-hype side, from Emily M. Bender and Alex Hanna, the authors of The AI Con: their analysis suggests claims of widespread job automation by generative AI are unrealistic and based on flawed methodology. Maybe I'll just skip to the bits in bold. We can't expect text synthesis machines to check if baking bread is done; that was one of the kinds of tasks predicted to be automatable by, I think, the METR paper. Actually, no, sorry, that was the O*NET-based paper. Or test electrical circuits, you know, how can you expect an LLM to be a kind of electrician? And they say these evaluations, in the form of exams, don't tell us much about real-world tasks. Basically, it's an argument that, for these reasons, the predicted impact of AI on the economy might be overblown.
But then, very helpfully, Notebook LM has also put the pro-hype case, from Eli Lifland, Nikola Jurkovic, and other researchers: they show the trend I was talking about, that the time horizon of human labor that AI can match seems to be doubling every seven months. And this represents an increase in the complexity of the work that different AI systems can do. And for certain tasks, they can generate and test solutions much faster than humans, which is highly relevant to frontier R&D.
Second point of clash; so hopefully you get the idea here. Essentially, I'm taking arguments from one part of the discourse, matching them with another, and trying to create some kind of evaluation in between. As I say, this is very much a work in progress, so I'm very, very keen to discuss what people think about some of the arguments here and how they shape up against each other, because lots of evaluation, and lots of improvement to the discourse, still needs to be done.
Then you've got another pro-hype point, from METR, which is again all about cost effectiveness and economic viability: one eight-hour unit of effort from an AI agent costs $123, only a small fraction of the approximately $1,855 we paid our human experts on average. I'll throw in here: this is true for some tasks, but there are lots of tasks where it can be very, very expensive for AI to complete them.
So I don't know if anyone's ever seen the ARC benchmark for AI performance. You've got some orange squares in the middle, with a little two-by-two orange square here, and in the next frame you can see it's made a P shape, like it's been extended. We can all see that's happened immediately. But to get LLMs to recognize that kind of pattern, it ended up costing OpenAI something like $300,000 worth of compute for o3 to recognize that very basic thing. So we have what we call a jagged frontier, where different AI systems will really, really struggle with lots of things we find really easy, but can also really, really excel at lots of things we find really hard, right?
Then we've got the other side, which is Gary Marcus saying that we shouldn't expect a smooth road to GPT-7, that this is going to be really difficult to predict, and that we actually need a new breakthrough in architecture. He talks about hybrid systems and symbolic AI, okay?
So that wasn't the big demo part of this talk; as I say, at MindStone we really like to show demos. But thank you for coming along with me on something that I think is really important and affects us all.
But now I want to show: if you wanted to visualize this concept as a debate, as a mapping, and you gave an AI system some leeway to be creative about how to visualize it, what might it come up with, and how might what it comes up with have improved over time? So initially I had Claude 3.7's artifact feature try to do this.
Now, Claude 3.7 was a remarkable model, very impressive for coding, but as you can see, it hasn't really got the arguments done very thoroughly, and these text boxes are no good at all, right? So I then had Claude 4 do it, and it's much, much nicer.
So I think this is one anecdotal, as these things often are, way to visualize the improvement in some of these systems.
If anyone's wondering why I'm not using DALL·E and the OpenAI image generation models to do this instead, since they're quite good at rendering text: it's because those models are diffusion models, which means they're much more energy intensive than Claude here. And you can actually switch views to see how Claude has coded up this whole solution: what it's producing is text, and that text, as a code solution, then creates this artifact, this really cool map.
So that's, I think, one interesting anecdote.
And you can see the arguments I went through are now represented here as speeches, like a prime minister versus a leader of the opposition, and a deputy prime minister versus a deputy leader of the opposition, on those points of clash that I thought were especially interesting.
So that's one cool way of visualizing this. But one of the sponsors of these events has an endlessly reliable demo set. With basically the same prompt, and similar creative license, we're now going to generate a presentation to a similar effect. It's always one of the fun ways to view this.
Maybe we'll go for the purple one on the side. Yeah, this is close to the MindStone brand.
I'd point out that I didn't dictate what it should put in which slide. It's got a pretty good sense for what the title of each topic should be.
Great. So now we have a different way of presenting that information.
Obviously the drawback of this version is that, unlike Notebook LM, I can't just click through to the source; we'll have to manually check it against that. And that's why Notebook LM is the main tool I'm using throughout this project.
And when I want to create interesting visualizations, I'll either use Claude artifacts or Gamma. I hope to continue with this and keep reading more books.