How to Build Reliable AI Systems

Introduction

Hi everyone, my name is Dani Fulla and I'm the CEO and co-founder of Flowhawk.

Who we are and what we do

Now, just to give you some context, Flowhawk is a technology partner that helps businesses scale by building customized solutions.

So we focus on scaling the operations of these businesses, and today I'm going to talk about some design principles that we follow to make sure that our programs, our systems, do not break whenever our clients are using them.

Now, how many of you are technical, have a bit of knowledge of code?

Nice, OK.

Talk goals and audience

So actually, I only have 15 minutes, and I have one main objective.

Actually, two.

Value for technical listeners

For those of you that are technical, hopefully by the end of the talk, you will get at least one key insight that helps you build a more reliable system and makes you feel confident whenever you are building these AI automations or products.

Value for non-technical listeners

And then for those of you that are non-technical, the idea is that you still get a better understanding of what it actually takes or what is actually behind the scenes of an AI automation.

Agenda

So this is the outline that we are going to follow today.

Live demo setup: AI-powered booking via SMS

First of all, I'm going to walk you through a live demo of an AI system that automates a booking process.

And then we're going to go through the different design principles that we follow at Flowhawk to build these systems, or at least some of them.

So now

Do you see it properly?

Maybe better like this?

Yeah, okay.

Live demo walk-through

Platform and scenario

So here what you see is a platform that is called JustCall.

And basically it provides you a phone number and an API.

Basically, an API is a program that allows you to process messages being sent to this phone number, do something, and then reply back.

And here what you're going to see is a simulated conversation between a client, that is going to be me, and a photography service that provides photographers for events.

And what I'm going to do is I'm going to request the service and then you'll see the interaction live.

And the system, what it's going to do is it's going to collect all the necessary details to proceed with the booking.

Once it has all the details, it will confirm the booking with me to make sure that all the details are correct.

And throughout the process, I will ask him a tricky question that he's not going to be able to answer.

And you'll see how this AI system will escalate the request to a human.

So it will pause the interaction.

It will send me a message as a human, and I'm going to be able to take care of this request.

So here, I'm going to have to keep reloading because the page doesn't update live.

Conversation starts

I just started the interaction, and what you see is the first message that I sent.

Hi, I want to book a service for next Saturday.

Okay, now the system automatically replied with this.

Okay, I'm this PhotoPro AI system.

Please provide all these details.

So you can imagine how the flow is going to go.

I will have to provide all these details until I have provided all of them.

Okay, so I'm going to do this now.

I have here, for example, let's do this.

And this is happening through SMS.

So I'm sending SMS to this phone number.

And it's actually, OK, now it replied.

Let's see.

So this is my second message.

So I'm saying, this is my name, an email.

This is the time of the event, the address, et cetera.

And now the system will realize that, OK, it's so cool that you have provided me this information, but you still need to send me all these details to proceed.

Exploring service options and completing details

So now I'm going to ask, OK, what kind of services do you offer?

Let's refresh.

This is my message.

Now it's taking some time to process.

It's here.

And now this AI system has sent me the different services that this imaginary business provides.

So for example, we're going to say that we want to book this one, the basic photography package.

Let's see.

OK, and I'm also going to provide the remaining information, like number of photographers, event duration.

OK, this is my message.

Now it's processing again.

OK, here we are.

Confirmation and system handoff

So now it's asking me, hey, I have all this information.

Is it good?

Do you want me to proceed?

And let's say, OK.

Okay, I'm going to confirm.

All right, and now the automation has replied that it has been confirmed.

And actually, if I go to my Telegram, so the system is connected to my Telegram, but you can imagine, it could be connected to any CRM or ERP, and it's going to

One second.

Yeah.

So here it has sent me some information about the booking details, the confirmation itself.

Edge case: Escalation to a human

Now, if we go back to the automation, now I'm going to ask him or it a question.

Like, OK, imagine, can I pay with cash?

It's a typical question that maybe a client might have.

And in this case, we want to make sure that the automation or the AI system doesn't answer with things that it doesn't know.

So now if I refresh, you will see, first of all, that the conversation is now escalated.

You see the tag.

So the system has been able to flag this conversation to show that something has happened.

So now a human can come back and take care of it.

And also, if I go back to Telegram, the system has informed me that something has gone wrong.

Okay, client is asking if payments can be made with cash.

So now I, as the human, well, the person behind the business, should go back to the conversation and take care of it.
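
The handoff shown in the demo can be sketched roughly as follows. This is only an illustration under assumptions: the `Conversation` record, the `handle_reply` function, and the `notify` callback (standing in for the Telegram message) are hypothetical names, not taken from the demo code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Conversation:
    phone: str
    escalated: bool = False                      # once True, the AI stops replying
    tags: list[str] = field(default_factory=list)


def handle_reply(
    conv: Conversation,
    question: str,
    draft_answer: Optional[str],                 # None means "the agent does not know"
    notify: Callable[[str], None],               # hypothetical stand-in for the Telegram alert
) -> Optional[str]:
    """Return the SMS to send, or None when a human must take over."""
    if conv.escalated:
        return None                              # conversation is paused for the AI
    if draft_answer is None:
        conv.escalated = True
        conv.tags.append("escalated")
        notify(f"Client {conv.phone} needs a human: {question}")
        return None
    return draft_answer


alerts: list[str] = []
conv = Conversation(phone="+10000000000")
first = handle_reply(conv, "Can I pay with cash?", None, alerts.append)
second = handle_reply(conv, "Hello?", "Hi again!", alerts.append)
```

The point of the design is that the pause lives in plain code: once the flag is set, no agent output reaches the client until a human clears it.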

Reality check: Reliability and limitations

Now, many of you might be thinking, okay, this is so cool, but I'm pretty sure that if I grab your phone and I start texting, I will break it.

And it's true, actually.

This automation, it's not reliable, at least for now.

And actually, if you paid close attention, it made a mistake.

Because when I started the conversation, I said that I wanted to book the service for Saturday, but it has confirmed for Tuesday.

Now, this is fine.

It's fine that the automation, as of now, it's not there.

Integration is only 40% of the job

But the thing is, building this system and integrating it with a CRM or a tool like Telegram is just 40% of the job.

The remaining 60% of the effort is about tweaking and gradually improving the automation so that it is able to handle all the potential edge cases of this business.

Why AI systems are hard to stabilize

And this is easier said than done, because AI, as you all know, is non-deterministic.

And if you have played a bit with LLMs, you know that if you change the prompt a little bit, the model can behave very differently from before.

Design principles for reliable AI systems

So now the question is, how can you build a system that you can iteratively improve without breaking whatever was working before?

And now I'm going to show you some principles that we follow at Flowhawk.

So here we have the infrastructure of the system.

For now, you only need to know that it has two types of parts.

We have a set of agents.

And then we have deterministic code, so normal code that behaves the same way every time it is executed.

Principle 1: Single-responsibility agents

So the first design principle that we follow is that we try to isolate each task and make one agent responsible for that specific task.

And we have found that this is way more powerful than trying to have one super powerful agent that tries to do everything.
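
As a rough sketch of what this isolation can look like, here are three narrow agents: one that filters messages, one that extracts details, and one that writes the reply. The function names, the prompts, and the `fake_llm` stand-in are assumptions made for illustration; they are not the actual Flowhawk code.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]


def filter_agent(llm: LLM, message: str) -> bool:
    """Only decides whether the message is something the system should handle."""
    answer = llm(f"Answer yes or no. Is this a booking-related message?\n{message}")
    return answer.strip().lower() == "yes"


def extraction_agent(llm: LLM, conversation: str) -> str:
    """Only extracts booking details from the conversation."""
    return llm(f"Extract the booking details as JSON.\n{conversation}")


def reply_agent(llm: LLM, instruction: str) -> str:
    """Only writes the SMS reply, following the instruction it is given."""
    return llm(f"Write a short SMS reply.\n{instruction}")


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if prompt.startswith("Answer yes or no"):
        return "yes"
    if prompt.startswith("Extract"):
        return '{"name": "Dani"}'
    return "Thanks! What is the event type?"


message = "Hi, I want to book a service for next Saturday."
if filter_agent(fake_llm, message):
    details = extraction_agent(fake_llm, message)
    reply = reply_agent(fake_llm, "Missing: event type. Ask for it.")
```

Because each agent has one prompt and one job, a change to the extraction prompt cannot alter how replies are worded.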

And there are two reasons for this.

First of all, by reducing the responsibility that each agent has, you are reducing the risk or you are reducing the probability that this agent makes a mistake.

Observability to isolate failures

And second, and most importantly, by having this isolation you can do very powerful things, because you can connect your system to an observability platform. Here I have an example of one.

This is called LogFire.

It's just an observability platform that allows me to see what each agent has done for each SMS that I've sent.
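
To make the idea concrete without relying on any particular platform's API, here is a minimal sketch that records each agent's input and output in an in-memory list. The `traced` decorator, the `TRACE` list, and the canned agent outputs are all hypothetical; a real setup would send these records to the observability platform instead.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory stand-in for an observability backend.
TRACE: list[dict[str, Any]] = []


def traced(agent_name: str) -> Callable:
    """Record the input and output of every call to the wrapped agent."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(fn)
        def wrapper(text: str) -> str:
            output = fn(text)
            TRACE.append({"agent": agent_name, "input": text, "output": output})
            return output
        return wrapper
    return decorator


@traced("extractor")
def extract(conversation: str) -> str:
    return "event_day=Tuesday"   # canned wrong answer, mimicking the demo mistake


@traced("replier")
def reply(extracted: str) -> str:
    return "Your booking is confirmed for Tuesday."


reply(extract("I want to book a service for next Saturday."))

# The first record whose output is already wrong points at the agent to fix.
first_bad = next(span for span in TRACE if "Tuesday" in span["output"])
print(first_bad["agent"])
```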

Debugging a real issue: Wrong booking date

So for example, let's try to fix the issue that we saw, where the automation tried to book for Tuesday instead of Saturday.

So how can I quickly identify what the issue is and fix it without breaking other things?

So here what I'm going to do, if I go to the start of the conversation, here in LogFire I can see the prompt of my agent and I can see the input, which in this case is the conversation.

So here we see that the agent that is responsible for extracting the information from the conversation actually thought that the event date was the 23rd, whereas Saturday is the 20th.

And actually here, I know that the problem is with this guy.

It's not with the replier, the one that creates the response.

It's not with the one that filters the messages.

This is the guilty guy.

So now we can do further investigation.

And actually, if we look at the prompt, this is the prompt that I'm using for this session to extract all information.

What I see is that I'm giving him the date.

Okay, and this date is correct actually.

But the agent doesn't know that Saturday is three days from today.

So maybe now, if I were the guy that has to fix this, I would say, okay, maybe what I have to do is tell him the day of the week as well, like today is Wednesday.

And maybe like this, he will be able to work out that Saturday is three days after the 17th, so it's the 20th.
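
The prompt tweak described here can be sketched as a small helper that renders the date together with its weekday. Two things are assumptions on my side: the concrete date (the talk only says Wednesday the 17th; 17 September 2025 is one such Wednesday) and the extra step of spelling out the next seven days, which goes slightly beyond what the talk proposes. Whether a given model then resolves "Saturday" correctly is a separate question.

```python
from datetime import date, timedelta

# Fixed English names, so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def date_context(today: date) -> str:
    """Render today's date with its weekday, plus the next seven days spelled out."""
    lines = [f"Today is {WEEKDAYS[today.weekday()]}, {today.isoformat()}."]
    for offset in range(1, 8):
        day = today + timedelta(days=offset)
        lines.append(f"{WEEKDAYS[day.weekday()]} is {day.isoformat()}.")
    return "\n".join(lines)


# Assumed date for illustration: a Wednesday that falls on the 17th.
context = date_context(date(2025, 9, 17))
print(context)
```

The resulting text would be appended to the extraction agent's prompt, so the model reads the mapping from weekday to date instead of having to compute it.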

Well, actually, it will not work anyway, because what happens is that the model is not smart enough.

When to upgrade the model vs. improve prompts

So here in this case, you have to change the model and try to find a model that is more capable.

But this has to be the last thing that you do.

First of all, you have to work with the prompts.

And only when you have hit this obstacle, this big wall that you cannot get past, this is when you try to upgrade the model.

And then let's go to our final design principle here.

Principle 2: Prefer deterministic code first

And it's deterministic code.

So at Flowhawk, we try to solve any problem with deterministic code.

And only when it's not possible to solve it with deterministic code, this is when we bring in AI.

Why this?

Because deterministic code always behaves the same way.

It's deterministic.

I know that for a given input, it's always going to produce the same output.

In our case, this is the most important part of the system.

So this code is actually responsible for taking the information that the AI agent, the second AI agent, has given us and putting it in our database: creating a client, creating a service, and also identifying whether there's missing information.

And if there's missing information, this code is going to prompt the third agent in a different way.

So if we are missing the name, this prompt is going to tell the agent, hey, we need the name, ask it.

On the other hand, if we have all the information in our database, the code is going to tell the agent, hey, we have all the information.

Confirm the details.
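
A minimal sketch of this deterministic step, assuming a plain dataclass in place of the real database. The `Booking` fields and the helper names are hypothetical; the talk only mentions details such as name, email, event type, and number of photographers.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Booking:
    # Hypothetical field names, standing in for the database record.
    name: Optional[str] = None
    email: Optional[str] = None
    event_date: Optional[str] = None
    event_type: Optional[str] = None
    photographers: Optional[int] = None


def merge(booking: Booking, extracted: dict) -> Booking:
    """Copy known, non-empty extracted values into the stored booking."""
    for f in fields(booking):
        value = extracted.get(f.name)
        if value not in (None, ""):
            setattr(booking, f.name, value)
    return booking


def missing_fields(booking: Booking) -> list[str]:
    """List the details that still have to be requested from the client."""
    return [f.name for f in fields(booking) if getattr(booking, f.name) is None]


# The extraction agent's output is treated as untrusted input:
# unknown keys are ignored and empty values never overwrite stored data.
booking = merge(Booking(), {"name": "Dani", "email": "dani@example.com", "unknown": 1})
still_needed = missing_fields(booking)
```

Here `still_needed` is what decides which instruction the reply agent receives; the agent itself never decides whether the booking is complete.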

And actually, I can show it to you quickly here.

Dynamic prompting driven by system state

So if we go to the first interactions, for example, here in this interaction, I still need to provide the event type, number of photographers, whatever.

So in the prompt, have a look at the end.

I'm dynamically providing this information to the agent.

I'm telling him, hey, I'm missing all this.

So you have to ask for it.

On the other hand, if you go to the end of the conversation, here is when the agent confirmed the request.

If we have a look at the prompt, we see that now this last part has changed a bit.

I'm telling the agent, okay, this service is now ready to post.

These are all the details, please confirm them with the user.

All right?
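
The changing last part of the prompt could be produced along these lines. This is a sketch under assumptions: the base prompt text, the `build_prompt` name, and the example details are invented for illustration and are not the prompts shown on screen.

```python
# Hypothetical fixed part of the reply agent's prompt.
BASE_PROMPT = (
    "You are the booking assistant for a photography service.\n"
    "Reply by SMS, briefly and politely.\n"
)


def build_prompt(missing: list[str], details: dict[str, str]) -> str:
    """Append a state-dependent final section to the fixed base prompt."""
    if missing:
        tail = "Still missing: " + ", ".join(missing) + ". Ask the client for them."
    else:
        summary = "; ".join(f"{key}: {value}" for key, value in details.items())
        tail = "The booking is complete. Confirm these details with the client: " + summary
    return BASE_PROMPT + "\n" + tail


early = build_prompt(["event type", "number of photographers"], {"name": "Dani"})
final = build_prompt([], {"name": "Dani", "event type": "wedding"})
```

Only the tail differs between `early` and `final`, which keeps the agent's general behavior stable while the code steers what it asks for next.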

Principle 3: Test-driven development for stability

And now, I will just give you one last insight.

For those of you that are in this grayish area, where you are not a professional coder but you are starting to code with AI, it's super powerful when you have this deterministic code and you have what is called test-driven development.

Basically, you describe multiple scenarios, and you make sure that once you have coded this functionality, it will always stay like that, even if you modify the code in the future.

And if in the future you modify code, and the scenario that you describe no longer works, you will immediately know.

So here, what you can do is you can build your unit tests manually, and then you can use AI to generate the code for you.

And this way, you're able to know if what the AI is doing is breaking your system or not.
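
As an illustration of this workflow, here are three hand-written scenarios for a small deterministic rule, using Python's built-in `unittest`. The `missing_fields` function and the field names are hypothetical; in practice the tests would be written first, and the function body is the part you could let an AI generate or modify.

```python
import unittest


def missing_fields(details: dict, required: list[str]) -> list[str]:
    """Deterministic rule under test: which required details are still empty?"""
    return [name for name in required if not details.get(name)]


REQUIRED = ["name", "email", "event_date"]


class MissingFieldsTest(unittest.TestCase):
    def test_everything_missing_at_start(self):
        self.assertEqual(missing_fields({}, REQUIRED), REQUIRED)

    def test_empty_string_counts_as_missing(self):
        details = {"name": "", "email": "a@b.c", "event_date": "2025-09-20"}
        self.assertEqual(missing_fields(details, REQUIRED), ["name"])

    def test_nothing_missing_when_complete(self):
        details = {"name": "Dani", "email": "a@b.c", "event_date": "2025-09-20"}
        self.assertEqual(missing_fields(details, REQUIRED), [])


# Run the scenarios programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MissingFieldsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change to `missing_fields` breaks one of these scenarios, `result.wasSuccessful()` turns false and the failing scenario is named in the report.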

And now just to finish,

Key takeaway and wrap-up

If I can give you one key insight, it is that the best AI systems use the least amount of AI.

So consider it whenever you are building or designing new systems.

Thank you.

Resources and how to connect

If you want to download the code, you can scan this QR.

You can run it on your terminal.

And actually, you don't need to be registered on the JustCall platform.

So you will find the guide.

And if you want to connect, feel free to connect with me on LinkedIn.

I'm always happy to chat and see your use cases.