Human in the Loop: Where AI Should Stop and People Should Lead

Amazing. Perfect. Thank you so much.

Introduction

Yeah, I just want to start off by wishing all of you a happy birthday.

I hope you get really, really drunk tonight. I think that's the main thing that we have to do.

What Purple Square Does

So my name is Balv, and I am the founder of a company called Purple Square.

And what we do is we help hospitality properties automate all the boring and mundane questions that guests have. So the simple way to describe it is probably a digital concierge for luxury independent hotels and resorts. So that's basically what I do.

How the Digital Concierge Works

So if you're a guest in a property, you scan a QR code that's in your room. If you need an answer to a question, rather than ringing a busy reception team or front of house, rather than going to a mostly outdated website or a brochure, you scan the QR code, it takes you straight to our system, and then you type in your question and within five seconds it will answer that question.

So it could be as simple as, what time is breakfast tomorrow? It could be, what time's the spa open? It could be anything along those lines, local tourist attractions, anything like that. We automate it, giving the staff time back so they can provide the wow moments that you have as a guest.

Who This Talk Is For

So this presentation is for people. If you own a business, you'll leave with a framework to protect your AI strategy and your AI policy. If you're building automations, as I know some of you are, I've spoken to a few of you, you will leave with a way to stop those systems going brittle and sort of breaking down. And for everyone else in the room, you'll leave with a lens for the next AI decision that you make. So a bit of everything, a little sprinkling for everyone in the room.

So this is more of a theoretical talk than Ollie's and Adrian's, and I'll tell you why in just a second. So where AI was a few years ago and where it is now, here in April 2026, it's fast, it's so different. It's improving so, so much. Like you said earlier, Alexa, Alexa Plus, was, for want of a better word, shit, and it was awful. But now it's getting better, and the technology is getting better, and the innovation is getting better as well.

Since we probably started this conversation, it's probably 10x. That's how quick we're moving in this space.

Why “Human in the Loop” Matters Now

The reason I chose Human in the Loop, there's a couple of reasons.

I was watching a documentary by a presenter, a really great presenter, called Dr. Hannah Fry.

If you haven't seen her stuff, please, please, please go and see it.

A Cautionary Story: When AI Encourages Harm

And she did a documentary where she told the story of a person who went to assassinate the Queen when she was alive. That's what the news broke at the time. What transpired after that is that, behind the story, this person had been talking to an AI girlfriend. And this video just shows Dr. Hannah talking about it.

He broke into Windsor Castle with a crossbow, trying to kill the Queen. And that bit made the headlines. But what people, I think, didn't know about, because it didn't come to light till later, was that there was an AI that he had been talking to in the months leading up to the attack that had encouraged him to act as an assassin and attempt to commit the sort of greatest act of treason possible.

Isn't that crazy? This was an AI girlfriend that he was having a relationship with, and it was agreeing with everything he was saying. The AI wasn't arguing back. The AI was like, yeah, go and assassinate the Queen. And he got caught in the grounds of Windsor Castle trying to do that. So that was absolutely incredible.

Automation vs. Accountability

Another reason is that autonomy and automation aren't automatically, try saying that really quickly, better. So it's not always better.

I know we always want to automate everything, but that's not always better. And the reason for that is because speed and scale go up. But risk and accountability go up as well.

Yes, 100%, that's brilliant. So we need to measure both in equal measure. And I've got a picture of Waymo there.

Autonomous Vehicles as a Risk Example

If you haven't heard of Waymo, Waymo is a company that has autonomous cars in a few cities around the U.S. They are expanding to Asia and Europe. I think in a couple of years it will be here in Bristol, maybe. London, definitely. And that's starting to come, sort of bleed through, now.

And there was another story where Uber were trialing self-driving cars. And they were trialing them, so it wasn't open to the public yet. And in March 2018, an Uber car ran over and killed a pedestrian it thought was a bicycle. And there's videos of that where you can see there's a person in the car, and they're trying to stop it from happening.

So that was a case that was really, really scary.

A Real Incident From the Field

And for my side, for us here at Purple Square, I woke up to an email about six months ago from a general manager of a hotel in California. So because of the time difference, I woke up in the morning and he said, have you seen the chat logs from last night?

And I was like, no, I haven't seen the chat logs. So I went into the back end, checked the system, and there was a guest in his property using our system.

And instead of asking for breakfast, instead of asking what time the spa was open, he was asking for, what's the best way to put this? A lady of the night. He was asking for a prostitute.

That's probably the best way to say it. And he was quite graphic in what he was saying. And I know it was a he, because he was talking about his penis.

And that's when I was like, bloody hell. But luckily, our AI systems refused all of it. It just kept sending him in a loop. It kept sending, can you clarify the question, which is probably the wrong question to ask, which is probably why he kept going on and on and on.

So, yeah, that was one of many cases. But luckily, our AI was safe. Before that event and after it, we take our security very, very seriously.

When Humans Should Re-Enter the Process

Humans should re-enter the process when certain signals show up, and there are five that we use in our company: material impact, uncertainty, novelty, ethical trade-offs and weak feedback loops. So they're the five things. If the system doesn't understand, it will hand off to a human.
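As a rough illustration only, the five signals could be expressed as flags that trigger a hand-off. This is a minimal sketch, not Purple Square's actual system: the names Signals, needs_human and triggered are hypothetical, and it assumes that any single signal is enough to hand off.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """The five hand-off signals from the talk, as simple flags (hypothetical shape)."""
    material_impact: bool = False
    uncertainty: bool = False
    novelty: bool = False
    ethical_tradeoff: bool = False
    weak_feedback_loop: bool = False


def triggered(signals: Signals) -> list[str]:
    """Return the names of the signals that are switched on, in field order."""
    return [name for name, on in vars(signals).items() if on]


def needs_human(signals: Signals) -> bool:
    """Assumption: one signal on its own is enough to hand off to a person."""
    return bool(triggered(signals))


if __name__ == "__main__":
    request = Signals(novelty=True, uncertainty=True)
    print(needs_human(request), triggered(request))
```

Returning the list of triggered signals, not just a yes or no, gives the person who picks the case up a reason for the hand-off.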

Who Controls AI—and What “Control” Means

Big question: who should control the AI, and should it be regulated? Should it be the AI, because they can now think for themselves, they've created forums, they've created chatbots, or should it be us as humans?

It's a good question to ask. And what does that control actually mean?

Guardrails, Permissions, Auditing, and Escalation Paths

For us here at Purple Square, it means guardrails, it means permissions, it means escalation paths that Ollie mentioned earlier, it means being able to audit, so auditability, and aligning with our policies and our procedures as well, which are very stringent. And we make sure that they are stringent and we update them as and when we need to.
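To make permissions and auditability a bit more concrete, here is a minimal sketch under stated assumptions. The action names, ALLOWED_ACTIONS, audit_log and attempt are all hypothetical and only illustrate the idea: the AI acts only on an allow-list, anything else is escalated, and every attempt is recorded either way.

```python
from datetime import datetime, timezone

# Hypothetical allow-list: the only actions the AI may carry out by itself.
ALLOWED_ACTIONS = {"answer_faq", "recommend_attraction"}

# In-memory audit trail for the sketch; a real system would persist this.
audit_log: list[dict] = []


def attempt(action: str, detail: str) -> str:
    """Run an allowed action or escalate it, and record the attempt either way."""
    outcome = "executed" if action in ALLOWED_ACTIONS else "escalated"
    audit_log.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "detail": detail,
            "outcome": outcome,
        }
    )
    return outcome


if __name__ == "__main__":
    print(attempt("answer_faq", "What time is breakfast?"))
    print(attempt("issue_refund", "Guest asks for money back"))
    print(len(audit_log), "entries logged")
```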

Customer Decisions: Thresholds and Exceptions

So an example of that, customer decisions. So refunds, credits, churn offers, that can be all AI and it can also recommend stuff as well.

Humans come in when it's approving thresholds and exceptions as well. So that is the control in a certain sense that we need to have.
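A threshold rule like this can be sketched in a few lines. The limit of 50.00 and the names AUTO_APPROVE_LIMIT and route_refund are assumptions made up for illustration, not figures from the talk.

```python
from decimal import Decimal

# Hypothetical limit: refunds up to this amount can be handled by the AI alone.
AUTO_APPROVE_LIMIT = Decimal("50.00")


def route_refund(amount: Decimal, is_exception: bool = False) -> str:
    """Send exceptions and over-threshold refunds to a human; automate the rest."""
    if is_exception or amount > AUTO_APPROVE_LIMIT:
        return "human_approval"
    return "ai_auto"


if __name__ == "__main__":
    print(route_refund(Decimal("20.00")))
    print(route_refund(Decimal("120.00")))
    print(route_refund(Decimal("20.00"), is_exception=True))
```

Keeping the limit as a single named value means the business owner, not the model, decides where the line sits.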

Safe, Trusted, Scalable (Not Slower)

So AI should amplify your decisions, but humans still need to own the outcomes. That's very, very important. And when I have done talks about humans in the loop, a lot of the comments and questions I get are that I'm trying to slow AI down. That is not the case at all. I want to put that to bed straight away. What we're trying to do is make it safe, make it trusted and make it scalable. They're the three things that we're trying to do. So I'm not trying to slow it down. And I'll go on to bottlenecks and stuff a little bit later on.

Where Full Autonomy Goes Wrong

An example of that is OpenClaw. So OpenClaw is a system that you can run on a Mac mini. And you give it all the permissions.

And we've had cases where it starts making fraudulent transactions on your bank account. It starts making reservations at restaurants in cities that you don't even live in.

So that's a really good example of it going too far. So OpenClaw, a great point there. And these are just some stats that I think back me up in that case.

Why AI Hallucinates (and Why Humans Catch It)

AI is optimized for efficiency. So when you input something, AI will try and get to the quickest way to give you that output. Sometimes it gives you the wrong answer, but super, super confidently. It hallucinates.

And this is a really good example. So this is an older version of ChatGPT, so I think it's been fixed now. But ask it how many R's there are in cranberry.

Strawberry is another example as well. So if you asked it how many R's there are in strawberry, it would say two. And that is the AI hallucinating, and that is a good case for having a human in the loop.

Full automation is not the end goal. When it's forced, it's usually a sign the system wasn't designed well in the first place, and that it wasn't designed around real-world variability as well.

So we need to make sure that's accounted for. So this is what we do.

The Three-Layer “Human in the Loop” Framework

This is the framework that we have that I mentioned earlier. So we've got layer one, AI executes. We've got layer two, human reviews. And the third and final layer is human-owned.

So I will explain each layer.

Layer 1: AI Executes (Throughput Engine)

So the first one is what we call the throughput engine. It's best for repetitive, high-volume questions that a person, or a guest in our case, has. The key point for that is it's optimised for speed and consistency and not authority. So that is layer one, where it's really good.

Layer 2: Human Reviews (Risk Filter)

Layer two, this is where the human reviews. So we describe this as a risk filter.

So it's best for edge cases, exceptions, low confidence, so when it doesn't quite know the answer, that's a good way for a human to come in, and policy-sensitive calls as well.

So the human intervenes when confidence drops or when the impact rises.

Layer 3: Human-Owned (Accountability Layer)

And the third and final layer is where the human owns. We describe this as the accountability layer. It's best for final decisions, trade-offs, ethical responsibility and legal responsibility as well, and anything that hugely impacts the customer. So the ownership never transfers. The people remain accountable for all the outcomes in layer three.

So for each one, layer one, where AI executes, example for us is we take care of all the boring and mundane questions that the guests might have in the property.

So that is probably the simplest way to do it. So that's level one.

Level two, we don't need humans in the loop at every step of the process. We need humans at the moments where mistakes are costly, irreversible and reputational as well.

So if it's going to cause reputational harm to your company, bring in layer two, where the human reviews.

Checkpoints and Escalation Design

And it's super important at this juncture to define what a checkpoint is. I think there's massive variability, there's massive ambiguity in what a checkpoint is.

What we call it is a deliberate pause where the system either asks for approval, routes to a reviewer, or requires a second validation step before it continues on its workflow or on its journey.
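That definition of a checkpoint maps quite naturally onto code. The sketch below is only one way to picture it: the Checkpoint enum and run_with_checkpoint are hypothetical names, and the cleared callback stands in for whatever approval, reviewer or second validation step a real workflow would use.

```python
from enum import Enum
from typing import Callable


class Checkpoint(Enum):
    """The three kinds of deliberate pause described in the talk."""
    ASK_APPROVAL = "ask_approval"
    ROUTE_TO_REVIEWER = "route_to_reviewer"
    SECOND_VALIDATION = "second_validation"


def run_with_checkpoint(
    step: Callable[[], str],
    kind: Checkpoint,
    cleared: Callable[[Checkpoint], bool],
) -> str:
    """Pause before the step and continue only if the checkpoint is cleared."""
    if not cleared(kind):
        return "halted"
    return step()


if __name__ == "__main__":
    # Stand-in for a real approval flow: approve everything except reviewer routing.
    def approve(kind: Checkpoint) -> bool:
        return kind is not Checkpoint.ROUTE_TO_REVIEWER

    print(run_with_checkpoint(lambda: "offer sent", Checkpoint.ASK_APPROVAL, approve))
    print(run_with_checkpoint(lambda: "offer sent", Checkpoint.ROUTE_TO_REVIEWER, approve))
```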

And those checkpoints don't have to be slow. So when people say that I'm trying to slow your AI down, those checkpoints don't have to be slow.

Tiered Review: Low, Medium, High Risk

You can use a tiered review. So for low risk, it could be auto approval. For medium risk, it could be fast review. And for high risk, it could be senior approval for the system.

And automate the routine, escalate the risky. So that's why we have that policy.
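A tiered review can be as small as a function that maps a risk score to a review level. This is a sketch under assumptions: the 0 to 1 risk scale, the 0.3 and 0.7 cut-offs and the name review_tier are invented for illustration and would need tuning for a real property or business.

```python
# Hypothetical cut-offs on a 0 to 1 risk scale.
LOW_RISK_BELOW = 0.3
HIGH_RISK_FROM = 0.7


def review_tier(risk_score: float) -> str:
    """Map a risk score to the tier of review it should receive."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score < LOW_RISK_BELOW:
        return "auto_approval"
    if risk_score < HIGH_RISK_FROM:
        return "fast_review"
    return "senior_approval"


if __name__ == "__main__":
    for score in (0.1, 0.5, 0.9):
        print(score, review_tier(score))
```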

Avoiding Loops: Defining When Escalation Happens

And the third and final one is everyone needs a shared definition of when the escalation happens, that's super important, not just how it happens. It protects the user from getting stuck in loops. And we've all used chatbots before where you just keep going round and round in circles and it's not giving you an answer. You just want to talk to a human.

So that's a good place for the escalation to a human to happen. And the triggers should be measurable. So either a confidence score, a risk score, a policy that comes into play, or required info that is missing.
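Measurable triggers like these can be written down as explicit checks. The sketch below is an assumption-heavy illustration: the Turn fields, the threshold values and the escalation_reasons name are hypothetical, and the cap on clarifying attempts is an added guard against the chatbot loop described above, not something specified in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; real values would come from testing and policy.
MIN_CONFIDENCE = 0.6
MAX_RISK = 0.7
MAX_CLARIFYING_ATTEMPTS = 2


@dataclass
class Turn:
    """One exchange with the guest, with the measurements the triggers need."""
    confidence: float
    risk_score: float
    policy_flagged: bool = False
    missing_info: list[str] = field(default_factory=list)
    clarifying_attempts: int = 0


def escalation_reasons(turn: Turn) -> list[str]:
    """Return every trigger that fired; an empty list means the AI carries on."""
    reasons = []
    if turn.confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    if turn.risk_score > MAX_RISK:
        reasons.append("high_risk")
    if turn.policy_flagged:
        reasons.append("policy")
    if turn.missing_info:
        reasons.append("missing_info")
    if turn.clarifying_attempts >= MAX_CLARIFYING_ATTEMPTS:
        reasons.append("stuck_in_loop")
    return reasons


if __name__ == "__main__":
    stuck = Turn(confidence=0.4, risk_score=0.2, clarifying_attempts=3)
    print(escalation_reasons(stuck))
```

Because each trigger is a named, numeric check, the team can agree on when escalation happens and adjust one threshold at a time.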

Full automation, as I mentioned earlier, just automate everything. Just do that. It's easy, isn't it, right?

Why Not to Fully Automate Everything

But I advocate for not doing that at all. I think going in steps, make sure the first layer is right before you go on to the next layer. It's like building a house. You wouldn't build a house on some sand.

You make sure the foundation is solid, you make sure it's sound, and then you build on top of that. It's exactly like that. And again, these stats show that as well. So 60%, 6 out of 10, that's a good stat there.

So what does it look like before? So automation without guardrails. And what it looks like after is we have automation with intentional human review moments.

So humans can catch the drift and clarify intent and correct errors early. That's super important, before they cascade. And this graph is brilliant for that, before and after you have escalations with your AI.

So just to wrap up, that's the framework from before that we use, and we find it very sturdy. And we rely on that.

What You Can Do Tomorrow (Three Questions)

But what can you do tomorrow? What can you do when you wake up tomorrow? You can ask yourself three questions. That's what we do.

So what boundaries are you proposing? Define the boundary types.

Second one is how humans stay accountable in the workflow. So ownership is super important. So who owns outcomes?

How are decisions logged? And what does escalation look like when the confidence is low?

And the third and final one is, what do you want the audience to do next?

So once that goes through humans, is there an email that gets sent out? Is there a rating?

What do you want the guest to do, in our case, after the human has gone into the loop and after you've used the AI system?

Conclusion

So like I said, less automation, but better boundaries around automation.

Thank you so much.