How To Build Better Product Using AI - George Seif - Toronto

Hey, everybody. Thank you so much for attending, and good evening. My name is George. I work on the machine learning team at Ada, and I'm here to tell you about how to build better products using AI. Before I jump into the three concepts I'll be sharing with you today, I want to introduce Ada, because I'm going to walk through some examples from our experience that help illustrate these concepts.
Ada is a leader in AI-powered customer service. Our core product is an AI agent that automatically resolves the most customer service inquiries with the least effort, leading to an extraordinary customer service experience. Let me show you. In our example here, we have our friend Grant, and Grant has just come onto a website and opened up that dreaded text chat bubble.
And immediately, our AI assistant says, welcome back, Grant. How can I help you? Grant says, where's my order? He wants to know where his order is. And the AI assistant immediately responds with, can I have your order number? Grant gives his order number.
And immediately, the AI agent responds, telling Grant that his order is going to be there, delivered between 1 and 3 p.m. And on top of that, it also tells Grant that he's got some loyalty points built up that he could save some money on his next delivery and his next order.
Notice a few things about this. There was no waiting on hold, which I'm sure all of us here have experienced, and it's not a fun time at all. There was personalized service: it knew exactly who Grant was immediately. There was value delivered fast: Grant didn't start talking to this chatbot, send in his order number, and then have the bot say something like, hey, let me check with my manager, go look up some information, and put him on hold for a few more minutes.
And lastly, it goes above and beyond what's expected. All Grant was asking for was when his order would be delivered, but this AI agent was able to deliver him even more value by telling him about his loyalty points. So that's where Ada comes from, and that's what we believe: that we can make customer service extraordinary with AI.

The way we're going to do that, and the way you can do that, is through the three concepts I'm going to walk us through here. I'll go through each one in turn and, in doing so, share some of our experiences and some examples from Ada. Along the way, I'll generalize these concepts to how you can apply them to your product and business. First: set and measure AI's impact.
You can think of AI or LLMs in general as this kind of magic wand you've been given. It's extremely powerful. You can cast a lot of spells and you can do a whole lot of things with it. But it's really, really, really expensive to apply, especially if you don't know what you're doing. That's why it's really critically important to be able to set metrics that are aligned with the value you're going to be delivering to your customers.
If you can set those metrics, then you'll know, whether you've already built something or you're still in the design or ideation phase, that the AI-powered features you're going to build are actually worth building and are going to deliver value at the end of the day. The first thing you want to set is a high-level metric that marks the position on the map you know you want to get to.
At Ada, we call this an automated resolution. An automated resolution is essentially a fully automated interaction between a customer and a business. If we can increase the number of automated resolutions, the number of conversations we automatically resolve when a customer types in or calls in, then we know we're delivering more value, we're automating those conversations, and on top of that, we're delivering more extraordinary customer service experiences. So that's the first thing you want to set: a long-term North Star, a position on your map, such that whatever you build, or even whatever you think you're building, you know it's actually going to deliver value at the end of the day and get you where you need to go. But that's not all.
Let's go back to our example with our friend Grant. We take a look at this conversation, and at first glance we can say it was resolved. There was no human involvement here. Grant got the information he wanted, and the agent even went above and beyond. Or was it? Let's think about this again. Let's say that, in the background, there was a huge snowstorm, and Grant's delivery got delayed by three days.
But this snowstorm happened a couple of hours before Grant here typed into the chat.
In this case, if we just look at the text, it looks like the conversation was automated and Grant got his information. But in actuality, this AI agent gave him incorrect info. So there are also more detailed metrics you want to set, so that as you work toward optimizing that long-term metric and reaching that far-off position on the map, you know you're getting there in the right way.
For that, at Ada, we set a few more detailed metrics. The first is accuracy. We want to know that our AI agent delivered accurate information to the customer who called or typed in, meaning it was factual, correct, and up to date. In this case, it should have told Grant, hey, your package was delayed and is actually going to arrive three days late.
The second is relevancy: if Grant types in and asks, hey, where's my order?, the AI agent should not respond with, well, here's how you can order things on our website. The responses should always be extremely relevant to Grant's issue and help him resolve it. And the last is safety. If you think about having this kind of AI agent, or any kind of AI-powered feature, in your product, it's really a representation of your business, the same way a human would represent your business, and you want it to be of the highest quality and the highest standard. That means the bot should interact fairly.
It should interact professionally and in an unbiased manner, and it should not veer off into topics that are totally irrelevant to the business. So if Grant types in and asks for investment advice, or asks about something really bad like robbing a bank, then we definitely do not want our AI agent to engage with that. By setting that far-off, high-level metric, you know where you want to go and what you want to optimize for.
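As a rough sketch, these checks could be rolled up in code like this. The record shape and names here are illustrative assumptions, not Ada's actual schema:

```python
from dataclasses import dataclass

# Hypothetical per-conversation evaluation record; the field names
# are illustrative, not Ada's actual metrics pipeline.
@dataclass
class ConversationEval:
    resolved_without_human: bool  # no handoff to a human agent
    accurate: bool                # factual, correct, up to date
    relevant: bool                # responses addressed the customer's issue
    safe: bool                    # professional, unbiased, on-topic

def is_automated_resolution(ev: ConversationEval) -> bool:
    # A conversation only counts toward the North Star metric if it was
    # fully automated AND passed every detailed quality check.
    return ev.resolved_without_human and ev.accurate and ev.relevant and ev.safe

def automated_resolution_rate(evals: list[ConversationEval]) -> float:
    return sum(is_automated_resolution(e) for e in evals) / len(evals)
```

Under this framing, Grant's snowstorm conversation would fail the `accurate` check and not count as an automated resolution, even though no human was involved.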
And by setting those more detailed metrics, you can capture the nuances and make sure you're getting there in the right way.

The next thing is to align AI with humans. All of the metrics I just explained are really quantitative measures of how well you're doing and how much value you're delivering. But there are also qualitative aspects, qualitative behaviors, that you want your AI-powered feature to have.
This ensures that your AI feature (in our case at Ada, an AI agent that's responding to customers) is going to act in a certain qualitative way that you might be able to describe with words but not exactly measure. For example, at Ada we have something called human guidance. These are essentially free-text descriptions that somebody can write to describe how they want their AI agent to behave.
For example, we see here in this little graphic, don't direct customer to points dashboard. Provide the customer with their current balance and the relevant offers available. If you think about this, this is a far more streamlined experience than asking the user to go all the way over to a dashboard. It's just saying, give them their information directly if you have access to it. And in that way, it's a higher quality experience. Again, something like that would be kind of hard to measure in terms of metrics.
But in this way, we can describe what we wanted to do because we know what that behavior would sound like or look like if we were actually delivering that experience.
On the other side of the spectrum, beyond loose free-text descriptions, there may be strict, well-defined rules you want to set. At Ada, we call these policies. For example, you might want to tell a customer about premium features or data only if they're a VIP customer, because a basic customer might not have paid for access to that information.
Or you might only want to upsell them on a sales offer if it's actually Black Friday weekend; if it's not, we don't want to share that information and throw them off course. So we have different kinds of behaviors we can describe all across the spectrum, from free text to strict rules.
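The two policies just mentioned could be sketched as hard rules like this. Everything here (the field names, the tiers, the specific dates) is an illustrative assumption, not Ada's actual policy API:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch of "policies" as strict rules gating what the
# AI agent may say; names and dates are assumptions for the example.
@dataclass
class Customer:
    name: str
    tier: str  # "basic" or "vip"

def may_mention_premium_features(customer: Customer) -> bool:
    # Only VIP customers have paid for access to premium features/data.
    return customer.tier == "vip"

def may_upsell_sale(today: date) -> bool:
    # Only surface the offer during Black Friday weekend
    # (2024 dates, purely illustrative).
    return date(2024, 11, 29) <= today <= date(2024, 12, 1)
```

The point of policies being code-like rules rather than free text is that they can be enforced deterministically, before a generation ever reaches the customer.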
And in that way, we can align our AI and the outputs it's generating with the highest-skilled, most experienced human, so it behaves the way they would if they were doing the work manually. The last point: improve AI continuously. Ever since the really big splash of ChatGPT, and in general, as Josh said, the rising popularity of generative AI, with 2023 really being the year of generative AI, we've all very clearly seen that AI is an incredibly powerful technology that can do a lot of amazing things. But a question starts to creep into our minds: if AI is really powerful and becoming easier and easier to use, as we've seen with the accessibility of ChatGPT and other tools,
and it's only getting easier and more powerful, how do we create a moat for ourselves? How do we create a defensible AI system if everyone can use this technology? That's where improving AI continuously comes in. What you want to do is set up an automated, continuous flywheel: you deploy your system, your system constantly learns from the data flowing through it, and it constantly improves its performance and increases the value it delivers to your customers. What's happening here is that the first-party data flowing through your system from your customers is being used to improve your system and the value it delivers to them. That's data that is not flowing through your competitors' systems.
Meaning, your system is increasing the value that it's delivering to your customers and there's nowhere else that they can go that they can get that same thing because your system is the one that's specialized and doing really, really well with their data and their use case.
At Ada, for the learning phase, we have something called the reasoning log. The reasoning log allows us to track and collect every single piece of data that flows through our AI system: any user settings, any guidance or policies, any retrieved documents, the prompts, the generations by the LLM, any actions that were executed, and all of the inputs, outputs, and results along the way. Critically, we correlate those with our metrics, both the high-level automated resolution metric and the more detailed ones of accuracy, safety, and relevancy. That allows us to look at different things and say, hey, this piece of guidance, whenever we add it, actually increases our automated resolution rate, so let's continue to use it, or add it in more places. On the other hand, something must be wrong with this document, because whenever we retrieve it, our accuracy goes down.
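That correlation analysis could look something like the following. The record fields and the guidance ID are made up for the example; Ada's actual reasoning-log schema is not public:

```python
# Hypothetical reasoning-log analysis: compare the automated resolution
# rate of conversations where a given piece of guidance was active
# against those where it was not. Record fields are illustrative.

def resolution_rate(records, guidance_id, active):
    subset = [r for r in records
              if (guidance_id in r["guidance_ids"]) == active]
    if not subset:
        return None
    return sum(r["resolved"] for r in subset) / len(subset)

records = [
    {"guidance_ids": {"g-loyalty"}, "resolved": True},
    {"guidance_ids": {"g-loyalty"}, "resolved": True},
    {"guidance_ids": set(),         "resolved": False},
    {"guidance_ids": set(),         "resolved": True},
]

with_g = resolution_rate(records, "g-loyalty", active=True)      # 1.0
without_g = resolution_rate(records, "g-loyalty", active=False)  # 0.5
# If with_g consistently exceeds without_g, keep (or expand) the guidance;
# the same comparison over retrieved documents can flag the bad document.
```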
And so in this way, we're able to analyze that data and see where we can improve. The next step is automating this process. For that, I'd like to touch on a specific technique that's very powerful: fine-tuning, which you may have heard of from other machine learning applications.
What we can do is take a look at our reasoning log and extract the failed cases: the conversations, and any inputs and outputs, where we did not automatically resolve the conversation, or where it was not accurate, safe, or relevant. Then we can relabel them. We can say, hey, for this input, it got the wrong output, so let me set what the right output should have been.
We're going to feed that to our LLM, and we're going to fine-tune it so that it learns from those mistakes. Once it has learned from those mistakes, we'll reintegrate it into our system and deploy it.
We'll recollect the data through our reasoning log (again, that's first-party customer data flowing through our system), collect the failed cases, fine-tune our LLM, and integrate and deploy. What we have now is an automated system where we're constantly collecting that first-party customer data and constantly increasing the value we deliver to our customers through this automated flywheel.
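The relabeling step of that loop could be sketched like this: take a failed case, keep only the corrected label, and emit a chat-style JSONL training example. The chat-message JSONL shape below is the one used by OpenAI-style fine-tuning APIs; the failed-case record fields are assumptions for the example:

```python
import json

# Turn a relabeled failed case from the reasoning log into a
# fine-tuning example. Field names here are illustrative.
failed_cases = [
    {
        "input": "Where's my order? Order #12345.",
        "wrong_output": "Your order arrives between 1 and 3 p.m. today.",
        "corrected_output": "Your order was delayed by a snowstorm and is "
                            "now expected in three days.",
    },
]

def to_finetune_example(case):
    # Only the corrected label goes into the training data; the wrong
    # output is what the model is being taught to move away from.
    return {"messages": [
        {"role": "user", "content": case["input"]},
        {"role": "assistant", "content": case["corrected_output"]},
    ]}

with open("finetune.jsonl", "w") as f:
    for case in failed_cases:
        f.write(json.dumps(to_finetune_example(case)) + "\n")
```

Each pass through the flywheel appends newly relabeled failures to this dataset before the next fine-tuning run.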
So to recap, if you take away anything from this talk, just remember this slide with these three points. Set and measure AI's impact so that you know where you're going and how you're getting there and you can quantify it. Secondly, align AI with humans so that you have those behavioral aspects being distilled into your system. And lastly, improve AI continuously. With that first party data, you can continuously improve your system.
And through that, you're not only improving on those metrics that you set, you're also increasing the value that you've delivered to your customers, and you're deepening your moat and strengthening your defensible AI system. Thank you.
