Hello, everybody. Welcome. Hello.
So first of all, I wanted to do a quick introduction to Undo and then we'll get on with the main billing. So Undo.
We make time travel debuggers for Linux.
They're mostly used by C and C++ developers. We also support Java, Rust, Kotlin, Scala, Go, et cetera, et cetera, et cetera.
And the key trick we do is that we can record everything a piece of software does as it does it, and then we can save that back into a portable recording file. So effectively, it's like CCTV for your code, and we've captured everything the program did.
And the result of that is that you can load it up later, after your program's exploded or whatever, and you can review what happened, just like with CCTV or a flight recorder.
The other thing you can do though is, you can go backwards.
So most of the time when we're debugging code, it's actually a process of trying to figure out, well, wait, why is this happening to me at all? Why is the program doing this? Why did we get here?
And time travel gives you the answer because effectively in software, the answer for why am I here is, well, because of the thing that happened just before now. That's what we're offering.
And in particular, recently, we have been building AI-enhanced debuggers. So I'm here to talk about agentic debugging, debugging via a coding agent, and how you can use AI to investigate and why time travel is an important part of that story.
So thank you all for coming. I'm going to dive into the details now.
Building the MCP Debugger, the good, the bad, and the weird.
It has been a really peculiar few months trying to get these AIs to do things we want.
To introduce myself in particular first: I'm Mark Williamson, I'm the CTO at Undo, and mostly I pretend to be a programmer. I don't actually get a chance to write much code these days, but I do get to swan around to places like this and talk about coding. Usually I ask who else is a programmer, but I think, well, we've got ACCU people here, I guess.
Hands up for programmers generally. You've played with programming or you're interested in it. Great.
Lots of my colleagues put their hands up, which is kind of good news.
I've introduced us and our time travel debugging and now AI, and I'm going to get into what that all means, but I also want to teach you a little bit about how modern AI is going and how coding agents, other agents, are using external tools to interact with the world and how you build those and what the benefits are.
So, first of all, I just want to come back to what is an LLM, a large language model, really? The things that we're all calling AI these days. I would say it's kind of a brain in a jar.
They can't do much on their own. What's going on in there is some kind of, well, we call it artificial intelligence, or inference capability.
But what I guess we do know is they've got something they can do with language. You can stick words in, and different, cleverer words come out. And they seem to be able to reason about some fairly complex problems.
So they're being used all over the world. They're being used for programming a lot in particular, which is how I came to be involved.
But fundamentally, it's a brain in a jar. Words go in, words come out. Not actually very useful on its own. We have to give it some ability to interact with the outside world.
So the first application where these really took off, with ChatGPT from OpenAI a few years ago, was chatbots, in which we can ask the LLM questions and it's structured like a conversation.
So what's going on here when you go to the ChatGPT website is: you type into your browser, or into some local client program, maybe even on your phone, and that gets sent off over the internet to a computer you don't own, but OpenAI or someone like that probably does, and it runs their large language model. The large language model's job is to figure out, given all of the conversation you've had up until now, what is a good response to what you said?
So you might, for instance, ask it for some career advice in your life, and it might tell you it's a great idea. Or if you've given it more context, it might give you a more nuanced answer.
And that's one of the key things here. LLMs are trained on basically a pile of data from the internet, so they know quite a lot of stuff. I saw an article online calling them the lossy encyclopedia today, and that's about right.
But about the problem that's in front of them right now, they only know what you've told them: the context you've been able to get into their working memory, which we also call the context window. The more relevant material you can get in there, the better it'll be.
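To make that concrete, here's roughly the shape of the chat loop just described, as a sketch in Python. The call_llm helper is a made-up stand-in for whichever hosted model you're using; the point is that the growing list of messages is the context window, and the model only ever knows what's in it.

```python
# A sketch of the chat loop described above. call_llm is a made-up stand-in
# for a hosted model; the growing list of messages *is* the context window.

def call_llm(messages):
    # Hypothetical: a real client would send the whole conversation to a
    # hosted LLM and return its reply.
    return f"(model reply, given {len(messages)} messages of context)"

def chat():
    messages = []                                     # the conversation so far
    while True:
        user_text = input("> ")
        messages.append({"role": "user", "content": user_text})
        reply = call_llm(messages)                    # context in, words out
        messages.append({"role": "assistant", "content": reply})
        print(reply)

if __name__ == "__main__":
    chat()
```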
Now, the brain in the jar isn't so useful. We've talked about how you can talk to the brain in the jar. That's quite useful, and that was the first big kind of success with ChatGPT.
But your robot's much better with a body, something that gives it the ability to interact with the outside world. Because that opens it up from a case where you are asking the LLM questions and it's telling you what to do, to a case where you ask the LLM to achieve things, and it can actually achieve some of it for itself.
So this is what we call tool use. And I've kind of summarized a little example here. I've riffed on it slightly, but it's from Anthropic's website, which has got loads of good information.
So say I'm a user and I've got an AI client running on my machine, and I ask it to write a poem about the current weather in Cambridge.
Back in the early ChatGPT days, what would then happen is it would go to the cloud and the LLM would hallucinate some weather you might be having, and then it would write a poem about that. And it wouldn't bear any resemblance to what was outside your window.
When we introduce tool use into the equation, the AI client can present to the LLM in the cloud some abilities it can offer. So in this case, say, well, I can compile code and I can tell you what the weather is. And it sends that up along with the request from the user. Write a poem about the current weather in Cambridge.
And actually some funny stuff goes on here because although you think this has all gone to the cloud and then an answer comes back, actually a little chat happens between the brain in the cloud and the agent on your machine, your AI client. So it says, what's the weather in Cambridge? Because it knows there's a tool for doing that.
The answer comes back: the weather in Cambridge is sunny. The LLM says, oh good, here's a poem for the user. And then it comes back with something like: there once was a sun over Cambridge so rare, the locals all froze in a confused stare.
They forgot how to act. It's a meteorological fact that sunshine in England's like finding a bear.
The way you get a limerick like that, by the way, is you ask claude.ai to write you a limerick. And then you decide it's not funny enough and ask it to be funnier. And then it forgets about syllables and just jams everything in.
So, tool use. Fundamentally, this gives the AI the ability to interact with outside things, which is really useful.
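Here's a sketch of that tool-use round trip written down. The names, get_weather and call_llm, are made up for illustration; real clients use a vendor SDK, but the shape of the conversation is the same: the model asks for a tool run, the client runs it locally, and the result goes back into the conversation until the model can answer.

```python
# A sketch of the tool-use round trip just described. The helpers are
# hypothetical; the shape of the conversation is the interesting part.

TOOLS = {
    "get_weather": lambda city: "sunny",          # runs locally, on your machine
}

def call_llm(messages, tool_names):
    # Hypothetical stand-in for the cloud LLM: first it asks for a tool call,
    # then, once a tool result is in the conversation, it answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weather", "args": {"city": "Cambridge"}}
    return {"answer": "There once was a sun over Cambridge so rare..."}

def run(user_request):
    messages = [{"role": "user", "content": user_request}]
    while True:
        reply = call_llm(messages, list(TOOLS))
        if "tool" in reply:                           # the model wants a tool run
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["answer"]                    # final answer for the user

print(run("Write a poem about the current weather in Cambridge"))
```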
Next, I'm going to talk about the model context protocol. So MCP, it was in the title of the slides. And this is a better way of getting tool use out into the world.
So what I'm talking about is not this guy. If you watched Disney's Tron film from the 80s, this is the MCP, the Master Control Program.
He's a very bad guy. We don't like him. We want someone more like this.
Model context protocol is basically a plug-in interface for giving tools to AI. There's lots of complicated analogies used. Anthropic likes to call it a USB-C port for AI, which I heard and I went, well, eh? But it's a plug-in interface.
It's like browser plug-ins or anything else. You're adding additional functionality in.
Now, this little guy came when I asked Google Gemini to give me a robot with some little plug-in tools for his robot arms. And what you can see is I've got a little robot.
He's got some cute little arms. And then a lot of the tools have cute little arms of their own.
But just to show how this differs, when you're using an MCP structured system, everything is kind of abstracted away. All the complicated things with, oh, we're going into the agent, we're going back up to the cloud, and then we're coming back, are abstracted away.
And people like us at Undo, who want to expose a clever thing we've built to an AI, just get to make something down here that plugs into your AI client, whatever it is, and it will just magically get given to the LLM. We don't need to worry about how, we don't need to worry about which LLM.
That's why it's been so popular. It lets people build AI applications very easily and it lets them be very powerful because you can interact with the rest of the world.
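To show how little code that is on the tool builder's side, here's a sketch of an MCP server exposing one tool. This assumes the official Python MCP SDK and its FastMCP helper, so check the SDK docs for the exact API; the weather tool itself is just a canned example.

```python
# A sketch of an MCP server with one tool. Assumes the official Python MCP SDK
# (pip install mcp) and its FastMCP helper; get_weather is a canned example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Report the current weather in a city (canned answer for this sketch)."""
    return f"The weather in {city} is sunny."

if __name__ == "__main__":
    mcp.run()    # speak the Model Context Protocol over stdio to the AI client
```

You point your AI client at that server in its configuration, the tool shows up in the model's vocabulary, and none of the round-trips are your problem.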
Now, when you're interacting with the rest of the world, and when you start giving these AIs the ability to act on your behalf, that is an agent. So an agent is, I mean technically, actually we're all agents as well. We're intelligent agents, we have thoughts, we go around in the world, we try and change things according to some kind of agenda.
When you're talking about an AI agent or an LLM-based agent, we're talking about basically one of these cloud brains, the LLM itself, hooked up to some code that runs on your computer or on a machine of interest that runs what we call the agentic loop. And that basically goes through thinking and acting.
So they think, they sense, somehow they get information from the world around them, they make changes and they're trying to accomplish a goal, which you've given them, generally.
So in Agent Smith's case, the machine intelligence has instructed him to eliminate the threat posed by Neo to the Matrix, and thus the ongoing harmony of the machine race. And Agent Smith might say something like, that sounds like a great idea, combining a futuristic dystopia with a fun plan, I'll investigate. But what he's actually then running is effectively this algorithm.
So is Neo alive still? If so, I haven't finished my mission yet. So I do some planning. That's my LLM call.
I use tools on Neo, such as bullet time, guns, and scowling. And then I report, like the result of the tool use gets reported back to Agent Smith. He gets to decide what to do next.
Then: is Neo still alive? Unfortunately, yes, still. So the agentic loop goes on.
Agentic Smith.
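Written down, that Smith loop is roughly the following shape. It's a sketch only: llm_plan and the tools are hypothetical placeholders for the real LLM call and the real actions.

```python
# The think / act / sense loop from the slide, as a sketch. llm_plan and the
# tools are hypothetical placeholders for the real LLM call and real actions.

def agentic_loop(goal, tools, goal_achieved, llm_plan, max_steps=50):
    history = []                                          # everything sensed and done so far
    for _ in range(max_steps):
        if goal_achieved():                               # Neo eliminated? then we're done
            return history
        action = llm_plan(goal, history, tools)           # think: one LLM call
        result = tools[action["tool"]](**action["args"])  # act: bullet time, guns, scowling
        history.append((action, result))                  # sense: feed the outcome back in
    return history                                        # gave up; Neo wins this round
```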
And by the way, the other reason I like this picture, sub-agents. I'm not gonna get into sub-agents in detail, you can ask me later, but effectively, you get a sub-agent when you make a tool, like we just discussed, out of another agent.
So then, like, your agent is a middle manager. You're asking for things to be achieved, and effectively it's got a pool of little AIs it can manage for you now.
It's cool stuff. And there's actually lots of technical reasons why sometimes they are the right tool for the job, sometimes they are not.
That brings me on to the topic of this talk, which is how we built an agentic debugger. And I don't have time to go into all the detail, but what I want to do is illustrate it to you.
So we started with...
Debugging, what a programmer does, is spend all your time on conspiracy theories, trying to figure out: why did my code do this? Oh, I don't know. It's lying to me now. Why is it doing that?
You can help with that by getting your programmer some time travel debugging, because time travel lets them understand the whys, not just the whats, and it lets them unwind and replay state until they understand what went wrong. And then you can also have an AI that's empowered by some kind of body, which in this case is going to look very like a software debugger.
And what you get is a time traveling cyber intelligence, a learning computer. So, I want to show you what this thing actually looks like.
Now, let's pop out of here for a moment, and I'm going to drop into a terminal. So, I'm doing a new demo that I've not shown before at any talk.
So you should be able to see a little screenshot of Doom on the top right, which I was playing earlier when I should have been preparing. And you can see a terminal interface behind it, which is a basic interface to our debugger. By the way, thank you to Magna and Bert in particular for putting this demo together and showing me how to use it. It's really cool.
What I've done is I've recorded the software, Doom, whilst I played very badly through part of the first level in the shareware version. And so what we've got here, this is a screen dump of the current screen now in that session that I recorded.
And if I play through, we should be able to see actually how the screen changed as I was playing. So I'm messing around in the menus and going around, I'm shooting some zombies.
Yeah, good. I'm not great at this to be honest, but I do try.
At this point I got confused and actually went in the map and I couldn't get out. I hit all the keys, I got out, I shot some more guys and then I got there and I decided I'd done enough.
What's actually happening here is our time travel debugger is reconstructing as we go every single state that the game went through as I was playing it so that we can analyze what happened. So what I can say now is I can use our explain command and I can say, tell me things about why this program happened. It then exposes this via MCP back up to an LLM and a coding agent, which can diagnose problems in the code without me having to think.
But also, I have given access to this to an LLM.
So what I'm going to say on this one is explain when was the second zombie killed? I'll say in this game, just in case it thinks there's one in the real world.
So this is firing up Claude Code in the background, and it's basically handing the controls of our debugger over to it. It's letting it remote control and do an investigation. And what we'll see is it time travels to the end of recorded history, and then it begins investigating, poking around in the code base.
It's allowed to read the source code itself. It's allowed to read our record of what happened, to try and understand. So there are quite a lot of interesting things that can go on here. This particular demo can go in all sorts of directions, which makes it quite interesting to show people.
So it's having a little poke around. One of the things you notice is it's first had to translate my idea of, I don't know, a zombie into something in the game that might correspond to a zombie. They're not called zombies in Doom, they're called possessed or shotgun man.
And now what it's doing is rewinding through time, and you can see, as it stops at different points in time to inspect the state, you're seeing progressively earlier stages in my Doom demo. So, you know, this is my playthrough from earlier today. It's poking around and, see, occasionally it's tinkering with things. Sometimes it uses the tools we've given it wrong, and it figures it out, reads the feedback and tries it differently.
Now what you can see is it's walking backwards in time and it's found this variable called kill count, which sounds like it's gonna help a lot. And it's used its inference to figure that out.
One of the things that's interesting, though, is that you see it taking wrong turns. And as a human, you want to go, no, don't do that, especially when you're demoing it. But actually, part of the trick to these agent systems is allowing them to, well, like children, I suppose, to make their own mistakes and self-correct.
So what we're seeing now is it's going off in different directions. It's trying things it thinks might work. It's trying to build a model, in its working context window, of how the program behaves. And in a minute, I think it might decide it's done the job.
While it has a chew on that, I'm going to pop back to the presentation, because there are some design thoughts that I'd like to tease out.
So when we were building this system, we discovered, well, we didn't discover, we researched online, we read other people's wisdom, and yeah, we found a few of these through our own misfortune.
But we came to a few rules of thumb. And this is kind of, this is the final conclusion of this talk, I think, in terms of how would you build a debugger using MCP.
You've got to make it possible for the AI to make mistakes because you cannot guarantee it's going to do the right thing in all cases. In fact, you don't want it to always do the same thing. You want it to explore. That's the benefit it's giving you over writing your own program.
The way we're doing it is, because we have this time travel facility, if it goes off track it can always rewind to an earlier state that it understood and try again. And you could actually see that happening earlier.
The next thing, which is kind of counterintuitive, is you have to provide as few tools as possible. Seems like giving it more would be better, it's more power, but I tell you what,
These guys are really enthusiastic users of tools, which is great, except that they are also quite good at confusing themselves. So ideally what you need to select is the smallest vocabulary you can that will allow it to solve the problem.
Otherwise they tend to get a bit hung up on their favourite tool and keep going bam bam bam bam bam, that's my favourite hammer, and trying to use that to solve everything.
The tools themselves, they've actually got to do one thing well. That's surprisingly important.
So a lot of tools that are designed for humans have state in them. So for instance, in our debugger, you can set a breakpoint and now the debugger will just behave differently. As long as that breakpoint's there, you'll stop at a particular line of code every time you hit it.
We found the LLMs really don't like that. They'd set a breakpoint, so they're basically telling the tool, stop every time this happens. And at some point later, they'd stop there again.
They go, why have I stopped here? It's gone wrong. I don't understand it.
It's much better to encapsulate these things and give it joined-up operations, like: run to the next line of code, or run to the last time this thing happened.
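For illustration only, since these aren't Undo's actual tool names, the difference in design looks something like this:

```python
# Illustrative shapes only -- not Undo's actual tool names. Each tool is one
# self-contained question or action, and nothing it does changes how any
# later call behaves.

def run_to_next_line() -> str:
    """Step forward to the next source line and describe where we stopped."""
    return "stopped at g_game.c:1124 (hypothetical output)"

def run_back_to_last_write(variable: str) -> str:
    """Travel back to the most recent write to `variable` and describe the state."""
    return f"{variable} last modified at game tick 755 (hypothetical output)"

# versus the stateful, human-oriented design the talk warns about:
#   set_breakpoint(file, line)    # silently changes what every later "continue" does...
#   continue_execution()          # ...which the model later finds baffling
```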
And then the last thing is sometimes the AI just has weird opinions. And actually, these things are temperamental. You have to work with the grain of the system.
So if it refuses to use the tool you told it to use, or refuses to use it right, in the end you've got to change the tool, though you can spend a lot of effort trying to make it do what you want first.
So if we pop back to this, what have we got? An MT_POSSESSED zombieman was killed at game tick 755. 21 seconds in, that sounds about right. Level time, yep.
And this is the screenshot of when I managed to kill the second zombie. So I know it's done. We'll have a quick scroll back.
But you can see it's printed out, it's queried a load of information to ground itself in the real world and understand what the program really did. It's jumped about in history, recovering from different errors in its investigation or following up leads. It's watched the changes to memory and it's rewound through them to isolate the parts of the code that are interesting.
And that is your MCP-enabled debugger.