How we built a working Legal AI product in 4 months

Introduction

Hi, everybody. I'm Daniel. I'm the CEO at Draftpilot.

It's a company my co-founder and I founded about three months ago. Before that, we ran another company you may have heard of called Lexu, also in the legal space, which we sold to Colexius in early March.

And Draftpilot is a tool for lawyers. Just a quick show of hands, are there any lawyers in the room? One. Okay, all right, that's good for me to know.

What is Draftpilot?

I'll first explain what problem we solve, then we'll go into a quick demo, then I'll show you what our internal testing setup looks like, and I'll share one insight we learned: a single change that improved our output quality about 3x and, at the same time, our speed about 2x.

The Problem We Solve

So what problem do we solve? We are helping in-house lawyers. So in-house lawyers are lawyers who work at a company. They're not at a law firm, they're working inside a company, a corporate, and they basically help the company with a whole bunch of legal stuff.

You know, the company launches a new product, the in-house lawyer will advise: are there any legal issues with that? The company wants to buy another company, the in-house lawyer will help hire external lawyers and manage that relationship. One component of their job, taking about a third of their time, is reviewing and negotiating commercial contracts. And it's particularly painful for them when they get a commercial contract that is not on their own template.

Challenges for In-House Lawyers

Let's think of a corporate. The marketing team wants to buy a digital whiteboarding tool. They're quite excited about using that.

Now, when a small business buys any SaaS tool, they just hit accept terms and get going. A corporate can't do that. So the lawyer has to open up the Word document and go through it with a fine-tooth comb, changing everything in the contract that doesn't align with the company's positions.

So they all have a bit of a checklist in their head as to what things they do accept and don't accept. And they do that in track changes in Word. Now that seems like probably a nonsense problem to you, but it's a big problem to them because it takes so much time.

It takes about two and a half to three hours to complete one of these markups. And it's not glamorous work. It's not the kind of work that gets you promoted. It's not the kind of work that gets you a better job.

It's kind of like, ah, I still have to review these two contracts. I'm going to be home late again. So it's a nice entry point for a new legal AI tool because it means that the customers are actually quite eager to solve it. And that's why we're not offering it to too many law firms because I think it would help them too, but they have to figure out their business model first because I guess removing two hours from a job isn't amazing news for their revenue.

The Competitive Edge

All right, so here's one question that I thought I'd just very quickly answer. So what's the moat? Every VC is probably thinking, well, what's the moat around this?

We're using LLMs on the backend. And what I've come to learn in these last three to four months is that it's very hard to build a moat around being very clever technically. If I had started this business two years ago, I would have needed to raise 10 to 20 million dollars and hire something like ten machine learning engineers to have a shot at creating what we have now. It would have taken a year or two to build, and it would probably be half as good. But that's not the world we live in anymore. My co-founder and I, just the two of us, no hires yet, were able to build what I'm about to show you in three to four months without being machine learning experts.

So if it's not technical prowess, what's next? A lot of our competitors in the space who have raised money said, and this was probably true about a year ago: we have a way to get lots of data, lots of contracts, so we'll fine-tune the LLM, and as a result our model will be higher quality. I don't know if that's true anymore. So you can't rely on that one either.

Think of the Bloomberg example, where they spent something like $10 million training their own finance-specific model, which outperformed GPT-3 on financial tasks, but then GPT-4 out of the box beat it. I think that pattern will keep repeating. So I feel I can't rely on a data argument.

So what's left is basically making it fit exactly into the workflow of the user. It's old-school startup skills: UX, UI, and doing lots of customer development, understanding exactly how the customers, the users, will want to use the tool, what they're ready to buy, and what they perhaps aren't ready to buy yet.

Demonstration of Draftpilot

So without further ado, I will now show you how it works.

Integration with OfficeJS API

So because we need to work where the lawyers work, we have the pleasurable task of figuring out the OfficeJS API, which is a pain because that wasn't designed for what we're doing. But the lawyers work in Word, so we have to work in Word.

So what a normal lawyer would do is they would put their little track changes on, and they would read this contract, which is about 13 pages, and they would go through and change all these little bits. What our tool does is it has enabled the lawyer to set a little checklist. And this just says things that they need to look out for.

Like, well, if we're buying something, then we should have at least 30-day payment terms. If the contract says seven days, we can't accept it.

And that checklist they can generate with AI and then tweak themselves. If we're happy with it, we can just go in and edit anything we like to make it align with what we as in-house lawyers typically look for.
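As a rough illustration, a checklist like this can be modelled as a list of playbook entries, each pairing a topic with the positions the lawyer will accept. This is only a minimal sketch; the field names are my assumptions, not Draftpilot's actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one playbook/checklist entry. The real schema
# isn't shown in the talk, so these fields are illustrative assumptions.
@dataclass
class PlaybookEntry:
    topic: str            # e.g. "Fees and payment"
    positions: list       # bullet-point positions the lawyer will accept
    editable: bool = True # entries can be AI-generated, then hand-tweaked

entry = PlaybookEntry(
    topic="Fees and payment",
    positions=[
        "Payment terms must be at least 30 days",
        "Late payment interest capped at 1% per month",
        "Vendor may not suspend services for unpaid invoices",
    ],
)
```

The point of keeping entries this granular is that each one can later be checked against the contract independently.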

The AI Review Process

And then all we do is hit AI review. And it's now doing a number of things behind the scenes. And I can show you what it exactly is doing.

But it's going through the contract. It's linking each of these entries, these checklist entries, to the right parts of the contract. And that's useful because as a lawyer, I might want to jump through all these things and very quickly navigate to the most relevant parts of the contract.

But it has also marked in red any entry that has an issue. So if we go back to that same fees-and-payment one, here are the positions, and it's basically saying there are three issues with this: the payment terms are seven days instead of 30, the late charges are 4% instead of 1%, and it allows the vendor to suspend the services if invoices aren't paid.

I can just jump to the right clause as well to double-check whether I agree with this, and then I can generate a solution. What it'll now do is grab the right part of the clause you're looking at, redraft it so that it aligns with the playbook position, and then show it as track changes so that I can see what it's suggesting to change. So it's nearly there.

There we go. So I can now jump to the clause, and there's its suggested change.

Empowering Lawyers with AI

And I can now edit this. So we've designed a tool for the lawyer to still be in control. And that's quite important.

When I started out with this, I thought we'd just upload the document and you'd get it back all marked up. No lawyer wants to buy that, because they're like, ooh, that feels super scary. Give me an Iron Man suit where I feel super powerful, rather than an autonomous lawyer robot who's going to do my job for me. So that was a big lesson: we need to enable the lawyer to make the critical decisions, even if in the end they don't change anything.

So you can see it came up with a nice drafting suggestion, and if I like it, I just hit apply markup and it'll go in and put it in as track changes. As a lawyer, I now no longer have to do any drafting. I didn't even need to find the issue, and I certainly didn't need to solve it.

What lawyers like to do is tell the other side: look, I needed to change this, I'm sorry, I hope you can accept it. So they put in these little comment bubbles. The AI has created those too. If they like the look of one, they just hit Insert; if not, they can tweak it first.

And that's how they can go through each change and just hit Apply: it inserts the drafting and the comment bubble. I've timed myself, and it took me about 10 minutes to do the contract to the same standard, instead of the two and a half hours it would have taken me manually.

So that's basically how the tool works.

Product Development Insights

A few other things we learned. Sometimes you're reviewing a contract and you're like, I don't like this clause, but it's not in my playbook. So it didn't come up.

Addressing Specific User Needs

So it's another example where we had to think, well, what does the user actually want? 1And because when you create a tool like this, you've built so many of the components and they're stuck in different prompts that we've sequenced, it's actually quite easy to lift one out and build another little tool around that. So if I show you as an example,

There's a liability cap in here somewhere. There we go. So this is one-sided. So it basically means the vendor's liability is capped, but my liability as a customer isn't.

If I don't like that, and I didn't have a playbook entry around that, I can just select it, hit redraft selection, and say make this mutual. And now it'll just go through, and it's basically using a lot of the same prompts that we had written for the main product to fix it in this way. And then you can see, OK, that's now mutual.
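That "lift one component out" idea can be sketched as a prompt template shared between the main review pipeline and an ad-hoc redraft tool. The template wording and function names below are illustrative assumptions, not Draftpilot's actual prompts.

```python
# Sketch: a prompt component from the main pipeline, reused as a
# standalone "redraft selection" feature. All text here is hypothetical.

REDRAFT_TEMPLATE = (
    "You are reviewing a commercial contract on behalf of the customer.\n"
    "Clause:\n{clause}\n\n"
    "Instruction: {instruction}\n"
    "Rewrite the clause to satisfy the instruction, changing as little "
    "as possible, and keep the original defined terms."
)

def build_redraft_prompt(clause: str, instruction: str) -> str:
    """Compose the shared redraft prompt for an ad-hoc user selection."""
    return REDRAFT_TEMPLATE.format(clause=clause, instruction=instruction)

prompt = build_redraft_prompt(
    clause="The Vendor's liability is capped at the fees paid.",
    instruction="Make this liability cap mutual.",
)
```

Because the template lives in one place, the same drafting behaviour shows up in both the playbook-driven review and the select-and-redraft tool.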

I just hit Apply and insert the comment. So we've got lots of little things like this, including I need a clause drafted from scratch, or I want to generate ways to push back on the other side. So it's lots of little bits.

Every annoying task that a lawyer has to do in this workflow, we basically map it and figure out how can we make use of the LLMs to help them do that. So that's how the product works. And I'm pretty amazed that it works as well as it does.

The Impact of GPT-4o

And I don't think I can take credit for it. I can take credit for designing it in a way that's quite user-friendly. But the real magic is actually happening in GPT-4o, because that's the model we're using. And that baffles me. And it also scares me a little bit.

When we started out, GPT-4o wasn't there. I had a meeting with my co-founder where we said: this needs to be faster, because a review took a minute and a half. So, like good founders do, we came up with lots of good ideas and planned a two-week sprint to nail them.

Next day, GPT-4o came out. Our goal was to cut the time in half; GPT-4o alone cut it in half. Now, we still built the other improvements as well, because we wanted it to be even quicker, but it's pretty crazy.

It's this moment of joy when the underlying model improves so much that your product improves, but it's also a little bit scary, because normally when you add a lot of value yourself as founders, you feel more secure. I don't think I'll rest easy with this business. I think we're going to have to keep moving really, really fast to adopt the latest and greatest models so that we're always up to date. Because that's the other thing: when a new model launches, a lot of your old prompts stop behaving the way you expected. So we had to do heavy, heavy testing with every model upgrade.

Learning from AI Peculiarities

What we learned is... that the AI is both genius and dumb in very unpredictable ways.

So how we used to build software in our previous business was in a more linear way. So we were always thinking, what does it need to do? Lots of if-then statements, lots of logic, and the AI doesn't work like that.

Building a Testing Playground

So we needed to almost brute-force experiment with small changes in the prompt to see whether that changed the output. I'm the non-technical person; my co-founder is the CTO. He needed to build me a playground where I could experiment without bugging him for every change, and without having to go to ChatGPT's website, because I needed access to some of our database. So this is what he built me.

We have a group with different bits of AI and supporting pieces. That big product I showed you, where it goes through, lists all the issues in red, and then fixes them, is these four steps: clause splitter, clause finder, issues generator, and solution generator. But we didn't start with these four steps.

We started out with one step, which was here's the whole contract, here's the whole checklist, tell me what's wrong. And we were well within the context window. So I thought this is going to be super easy peasy and scary because anybody can then do that. But even though we were well within the context window, it only got 20% correct. And it was a random 20%. So it would sometimes find these five issues and the other time these five issues, but not all of them.

So we ultimately needed to start splitting it up in ways that made sense. So our first step is to split the contract in relevant chunks. So if you open up a contract in Word, the paragraph spacing might be very arbitrary, very random. So we're asking the LLM to tell us what are the main themes in this contract? What are the main headings? And can we put a little circle around those? So we relabel the contract with these headings.
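The clause-splitter step can be sketched as: ask the model for the contract's main headings, then relabel every paragraph with the heading it falls under. Here `find_headings` is a deterministic stand-in for the real LLM call, which actually identifies the themes; the heuristic and names are my assumptions for illustration.

```python
# Sketch of the "clause splitter" step. In the real pipeline an LLM
# picks out the contract's main headings; this stub just treats short
# all-caps lines as headings so the example runs deterministically.

def find_headings(paragraphs):
    return [p for p in paragraphs if p.isupper() and len(p) < 40]

def relabel(paragraphs):
    """Tag each body paragraph with the heading it falls under."""
    headings = set(find_headings(paragraphs))
    labelled, current = [], "PREAMBLE"
    for p in paragraphs:
        if p in headings:
            current = p
        else:
            labelled.append((current, p))
    return labelled

doc = [
    "FEES AND PAYMENT", "Invoices are due within 7 days.",
    "LIABILITY", "Vendor liability is capped at fees paid.",
]
chunks = relabel(doc)
# chunks[0] == ("FEES AND PAYMENT", "Invoices are due within 7 days.")
```

The output is a set of theme-labelled chunks, which is exactly what the next step (matching checklist entries to relevant clauses) needs.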

Then the clause finder says, for each checklist entry we looked at: which parts of the contract are relevant? There might be five parts that are more or less relevant; tell us which five. Then in our database we have these nice groupings between one entry and the relevant parts of the contract. And then, because we hadn't thought it through, we used to send all of that out in one big prompt, and it would go through sequentially. The one major change we made was to do that split and then send it out in parallel. We were adding steps, but we more than made up for it by sending those 20 checklist items as separate API calls in parallel, and all of a sudden we gained a whole bunch of time. And the accuracy went through the roof.

So that's the thing I learned: the big context window is really useful because it enabled us to do that first step of chucking in a whole contract and getting GPT to see the whole thing and label the headings. But when we ask it to do something more clever, it's actually not so good. So then we need to go smaller again, even though the context window is big. And when we have just that one playbook entry, those three bullet points and two clauses, all of a sudden it's really good at finding out what's wrong and how to fix it. So that's what I spend my time doing.

You can see here, this is like the super prompt, which is not official GPT language, and then the normal prompt. I basically go through and edit this prompt and then hit go, and I can change the models here. So that's how I can very quickly check every single part of the entire prompt chain on different models. When I encounter a wrong output, I go in forensically and figure out: at which step did the AI go wrong? Which part of the prompt chain was that? Does a different model do it better? And can I improve the prompt to fix this issue? I would spend maybe 100 iterations a day on different steps to get it to work as it does now.
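The playground idea reduces to a simple sweep: run one pipeline step with every combination of prompt variant and model, then compare outputs side by side. `run_step` below is a stand-in for sending the prompt to the named model; all names and templates are hypothetical.

```python
# Sketch of the playground: sweep one pipeline step across prompt
# variants and models so outputs can be compared forensically.
from itertools import product

def run_step(prompt_template, model, fixture):
    # Stand-in for a real call to the named model with this prompt.
    return f"[{model}] {prompt_template.format(**fixture)}"

def sweep(prompt_variants, models, fixture):
    """Return {(variant_name, model): output} for every combination."""
    return {
        (name, model): run_step(template, model, fixture)
        for (name, template), model in product(prompt_variants.items(), models)
    }

outputs = sweep(
    {"v1": "Find issues in: {clause}"},
    ["model-a", "model-b"],
    {"clause": "Invoices due in 7 days."},
)
```

A non-technical founder can then iterate on the template text alone, without touching the rest of the pipeline.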

Which sounds like a lot of work, but on the other hand, three months later you have a product and you're two people, which again wouldn't have been possible two years ago. So that's how this works.

My co-founder also built a little staging app, also in Word, which is the other one here. The benefit is that I can test brand-new sequences end-to-end. So if I go real quick to this one: you can see it's the same playbook, but I'm testing something where it finds the issue and the solution in one go. We got some user feedback last week that it's kind of annoying: you see an issue, you want to fix it, and now you're waiting for it to do its job. You'd probably prefer to wait an extra 30 seconds up front and then go through everything in one go. So that's what it's doing now. The same fees-and-payment one, boom, we've got the markup immediately.

So that's the change I can test, and then I run it through something like 50 contracts to see whether it's better, and then we move the staging version to the production app. I think I'm probably nearly out of time. Let me check if there was anything else I wanted to cover. I covered this, I covered this.
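That "run it through 50 contracts" step is essentially a regression score: how many known issues does each pipeline variant catch across a batch of test contracts? A minimal sketch, with toy pipelines and entirely made-up data:

```python
# Sketch of the staging check: score a pipeline variant against contracts
# with known issues before promoting staging to production. All names
# and data here are illustrative assumptions.
def score(pipeline, contracts):
    """Fraction of known issues the pipeline finds across the batch."""
    found, expected = 0, 0
    for contract in contracts:
        expected += len(contract["known_issues"])
        found += len(set(pipeline(contract["text"]))
                     & set(contract["known_issues"]))
    return found / expected if expected else 0.0

# Toy variants: the "old" pipeline misses the late-charge issue.
old = lambda text: ["7-day payment terms"]
new = lambda text: ["7-day payment terms", "4% late charge"]
batch = [{"text": "...",
          "known_issues": ["7-day payment terms", "4% late charge"]}]
# score(new, batch) == 1.0, score(old, batch) == 0.5
```

Only when the candidate variant scores at least as well as production across the batch does it make sense to promote it.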

Conclusion and Invitations

And then, yeah, the final thing is: stay in touch. I share quite openly about what's working and what's not, so if you'd like to follow along, go to our website, draftpilot.ai, hit join waitlist, and you'll get my updates on what we're doing. Or come find me on LinkedIn. That's it, thanks everybody.
