Time Travel + AI: Agentic Debugging?

Introduction

Hello.

This is a just-in-time talk delivery.

I'm going to talk about time travel and AI and maybe agentic debugging.

Presenter's Background

So first of all, I'm going to talk about who the heck I am and why I'm here. So my name's Mark Williamson.

I'm CTO at Undo, which is a local company making time travel debuggers. I've been at Undo for about 11 years and every year or two the company feels completely different and exciting compared to last year, so that's why I'm still there.

The main part of my job is to pretend to be a programmer. I started off actually being one and gradually spent more time writing Excel sheets and presentations, until this is my job now.

Programming Background

Out of interest, who else in the room is some kind of programmer? Yeah, I thought there'd be a lot of you, although I brought some of them.

So my home turf, and I guess mostly our home turf as a company, is C and C++ programming, but we do stuff in Java, Go, Python.

We're quite broad, and actually a lot of the problems that programmers have are common to any language. They manifest different ways, but they're the same kind of thing.

The Concept of Time Travel Debugging

What we do is a thing called time travel debugging. And we've been trying to persuade the world that it's a great idea whose time has come for some years. And we are, I guess, saving one soul at a time.

Recently, we've come to a realization that what we've made for humans is actually really interesting for robots as well. So we've started doing R&D on AI products.

And now I've got to back up what that all means before I run out of time.

Daily Life of a Programmer

So what do programmers do all day? Well, kind of this, I'd say. There's an understanding of what a programmer does. They sit in a darkened room, they wear a hoodie and they type.

And when they've finished, there's lots of code and they run it and they've hacked the mainframe and they're in. That's broadly what they do. Except that bit is about 10 or 20% of your time as a programmer.

Ignoring the time you spend in meetings, et cetera, et cetera, actually a lot of your time is spent like this guy going, why did that happen? You write the code. Your colleagues write the code.

Everybody thinks they know what it's going to do and why. And then it does something completely unexpected. Nobody knows how you got there, why it happened.

So you spend your time in kind of conspiracy board mode, trying to go, well, okay, so all the things I expected happened, but something has crept in and perturbed the state of my system. And now that's snowballed out of control and it's not doing what I intended at all.

Focus on Productivity

So it's interesting, because we focus a lot on how productive programmers are in terms of lines of code. And if you notice, AI has moved heavily into the realm of development tooling and programmer assistance. And we've got these things called programming or coding agents now.

A coding agent is kind of like a programmer, but it's a robot. And in some ways it's very smart, in some ways it's not smart at all, but you can kind of see it as a software developer. And people are very excited because they can write code for you now.

But remember, that's not where most of your time goes as a programmer. So you see people saying, well, 50% of my code is written by AI. So what? Most of your programming time isn't spent writing code.

Time Travel for Humans and AI

First of all, I'm going to talk about what we do for the humans, and then I'll come on to the robots. Basically, what I'm saying is being a programmer is a terrible job and you spend all your time not understanding what you've just done.

Our solution to that is time travel. What is time travel? Well, for us, our product, our core kind of engine is what we call a time travel debugger.

And what that means is it helps you solve your bugs, understand your code by letting you time travel through the history of what your program did. So you're not limited now to what messages plopped out or how did it crash at the end. You can actually rewind and you can uncrash the code and see what was happening in the moments that led up to the crash.

And you can actually then trace all the way back, no matter what inputs came into the program, no matter what randomness happened, and you can diagnose where did things start to go wrong.
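To make the idea concrete, here is a minimal sketch of recording and rewinding, assuming a toy program that snapshots its own variables after each step. The `Recording` class and `run_and_record` function are hypothetical names invented for illustration. A real time travel debugger records the unmodified program from the outside and does not work this way internally.

```python
class Recording:
    """Toy execution history: one snapshot of the variables per recorded step."""

    def __init__(self):
        self.snapshots = []
        self.position = -1  # index of the snapshot we are currently "at"

    def record(self, **variables):
        # Store a copy so later changes cannot rewrite history.
        self.snapshots.append(dict(variables))
        self.position = len(self.snapshots) - 1

    def current(self):
        return self.snapshots[self.position]

    def step_back(self):
        # Move one step towards the start of history, stopping at the beginning.
        self.position = max(0, self.position - 1)
        return self.current()


def run_and_record():
    """A tiny 'program' that records its state after every loop iteration."""
    rec = Recording()
    total = 0
    for i in range(4):
        total += i * i
        rec.record(i=i, total=total)
    return rec


rec = run_and_record()
print(rec.current())    # state at the end of history
print(rec.step_back())  # state one loop iteration earlier
```

The point of the sketch is only that once every step is stored, "going back" is a lookup rather than a re-run, so the answer does not depend on reproducing the same inputs or the same randomness.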

Demonstrating Time Travel Debugger

So I'm just going to briefly show you what this looks like. Apologies, I'm only set up to demo this on a terminal, so a terminal you will see.

I'm going to run a crashy program. Working at a debugger company, we are very good at writing crashy programs and then solving them.

So I'm going to just switch to a dashboard layout here. Now, if you're familiar with software debuggers at all, this is my command prompt. This is where the source would be if we had any. This is local variables. This is the stack trace. And this is our perception of where we are in recorded history.

So a time travel debugger is like a CCTV system. You get a recording of all the events that happened. And then if you don't know, say, why your bicycle was missing at the end of the CCTV footage, you can rewind and try and understand how that happened. So we're going to do the same thing here.

What I'm going to do first is if you look at the call stack, the backtrace here, I'm going to uncall functions. So we're going to run back to before each function was called. And you're going to see that stack trace get shorter as we unwind the error.

And eventually we end up in the actual code that's got the bug. So you can now see we're here. We're at an assertion failure in our code. We tried to do some maths and it's given us a bad value and everything's gone horribly wrong so the program's aborted.
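The "uncall" idea can be sketched on a toy event log, assuming the recording is just a list of call and return events. The function names in `events` are made up for illustration, and `stack_at` and `uncall` are hypothetical helpers, not commands from any real debugger.

```python
def stack_at(events, position):
    """Rebuild the call stack by replaying the first `position` events."""
    stack = []
    for kind, name in events[:position]:
        if kind == "call":
            stack.append(name)
        else:
            stack.pop()
    return stack


def uncall(events, position):
    """Return the position just before the innermost active function was called."""
    depth = 0  # number of already-finished calls we still have to skip over
    for index in range(position - 1, -1, -1):
        kind, _ = events[index]
        if kind == "return":
            depth += 1
        elif depth == 0:
            return index  # this call opened the frame we are currently in
        else:
            depth -= 1
    return 0


# Hypothetical history: main calls compute (which returns), then check, which aborts.
events = [
    ("call", "main"),
    ("call", "compute"),
    ("return", "compute"),
    ("call", "check"),
    ("call", "abort"),
]

position = len(events)
while position > 0:
    print(stack_at(events, position))
    position = uncall(events, position)
```

Each pass through the loop prints a shorter stack, which is the same effect as watching the backtrace shrink while running backwards out of each function.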

And keep an eye on these numbers in the top right. I am now going to step backwards in this code, and you'll see we are moving back one line at a time and all of the state in the program is reverting. So gradually you'll see these variable assignments undone as we pass back through this loop, and now we're in a previous iteration of the loop.

And ultimately you can see how you can follow this through as long as you need to until you find the bug.

Good. If that wasn't clear, I can show you in more detail later.

Integrating AI with Time Travel Debugging

But to keep pressing on, what happens when you combine time travel with AI? So it can be very effective is the answer.

As we alluded to earlier, and as you're probably all familiar with, AIs are amazing except for when they're really not. Modern AIs, the large language model-based ones, have a context window, which is kind of their working memory. If there's good stuff in there, stuff that's relevant to the problem, they can be crazy smart. If there's misleading stuff in there, or the stuff they need to understand the problem is missing, they hallucinate.

And it's interesting. The way I like to think of it is that the training process these LLMs go through teaches them what a good answer to your question looks like. A good answer is phrased respectfully, it's not overly sycophantic, it uses the right terminology for your domain of expertise, it's got lots of detail, it sounds really, really plausible, and, optionally, it's also correct. But they're happy to discard that last part if they don't know the right answer. It's still a good answer according to their training, and they're gonna give you that.

The thing with time travel debugging, though, is this. Well, for one thing, as a tooling provider: software development tools have never been cool before. Suddenly they are, because robots. The other thing, though, is that we've got this unique information, which is that we recorded everything the software did.

And AIs are smart if and only if you get the right context into them. So if you've got a recording of everything the program ever did, you should be able to get all the right context in. And if you've got a bug, the AI should be able to solve it for you.

AI-Driven Debugging Demonstration

What I'm gonna show you, then, is a very early, rough prototype of what we've added. So I'm gonna go back to console mode, and we are going to ask it to explain. We're gonna say, pretty generically, explain what went wrong in this program.

By the way, this is a live demo of AI, and people don't tend to do that because it can do something unexpected. We'll see what happens. But I'm also gonna say, please explain in the style of a treacherous Grand Vizier who is secretly plotting to replace me. And we'll see how this debugging session goes.

So in the background, we're firing up Claude Code, which is a coding agent, so one of the software-based developers. And we come seeking answers about this troubled program. So I'm gonna talk you through what it's doing. I'm afraid it's gonna be a bit quick.

But first of all, we jumped to the end of history because we know that the end of history is where the crash came out. And then we start tracing backwards to reason about the problem and ultimately get back to where things went wrong.

We've got back to the place where the program crashed in the main function and now we're going to see what vile values led to the catastrophe by querying the output. So I haven't told it before this anything about the nature of the crash. It's deriving all of this from the history that we recorded and were able to navigate. And what it's now doing is it said, okay, I see a bad value came out of this function called cache calculate.

It's just trying to calculate square roots efficiently by caching things. And as we know, caching is one of the hard problems in computer science. One of the two hard problems, along with naming and off-by-one errors. What it's doing is it's actually working backwards into that function, and it's found that a cached value was corrupt.

And that's kind of, that's a nasty discovery as a programmer, eyes narrow with suspicion, because if you've got a corrupt cache, then you know where your bad value came from, except you don't, because it actually came from whoever put it in the cache, which was sometime earlier. So what it does is it actually queries our engine. It says, OK, when did this value last change? Because if I know that, I know where the bug was, maybe. So it's done that. It's created this bookmark for me to look at later called corruption moment. I've sometimes seen it use quite spicy bookmark names for the actual bookmarks, depending on how in character it's feeling.
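The "when did this value last change?" query can be sketched as a backwards scan over recorded snapshots. This is a toy model under stated assumptions: `history`, its step descriptions, and the garbage value are all invented for illustration, and `last_change` is a hypothetical helper, not the engine's actual interface.

```python
def last_change(history, name):
    """Return the index of the step at which `name` took its final value.

    Scans backwards from the end of history, like running the program in
    reverse until the watched value changes.
    """
    final = history[-1][name]
    for step in range(len(history) - 1, 0, -1):
        if history[step - 1].get(name) != final:
            return step
    return 0  # the value never changed during the recording


# Hypothetical recording of one cache entry over the life of the program.
# The numbers are made up; only the shape of the history matters.
history = [
    {"what": "cache initialised", "entry": 0},
    {"what": "good value stored", "entry": 2},
    {"what": "bad value stored", "entry": -2147483648},
    {"what": "value read back from cache", "entry": -2147483648},
    {"what": "assertion fails", "entry": -2147483648},
]

corruption_moment = last_change(history, "entry")
print(corruption_moment, history[corruption_moment]["what"])
```

The crash site only shows the read, several steps after the damage was done. The backwards scan lands on the write, which is where the investigation needs to continue.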

We cackle with wicked delight. So the magnificent betrayal is where we have, there's a couple of things going on here. So we've calculated the square root of minus one, which is naughty. And then when crazy, crazy values came out, we also put them in a cache entry that was too small. So it's been integer truncated. So what you see in this program is that a lot of the information you needed in order to solve the bug has actually been destroyed in the course of getting to the crash. But with time travel and the help of our treacherous grand vizier, we've been able, I believe, to get back all the way.

So I have to always check the output when I do these presentations in case it's hallucinated. But the advantage of time travel debugging is it doesn't really need to. It's got the answers.

Challenges and Solutions

So this looks pretty good. It's an off by one error in a loop. It's square rooting minus one. And it's integer truncation. Smirks behind hand.

So.

Conclusion

This is what we're working on. There's actually lots more I could say, but I don't have time. Please do come and speak to me afterwards or any of us.

The Future of Debugging

I guess what we're aiming for in the world is this: there's this guy, Brian Kernighan, co-author of The C Programming Language and one of the early Unix people. He said debugging is twice as hard as writing the code in the first place. So if you write the code as cleverly as you possibly can, you're by definition not smart enough to debug it.

Now you are.

Thank you very much.
