Building Agents with Confidence: Safely Deploying Agents at Scale with Jai Mansukhani

Introduction

Hi, my name is Jai. This is me.

That's Geoff Hinton eating a sandwich. I hope that is amusing to you.

But essentially, I am a 23-year-old student. I just graduated from U of T, in computer science at the University of Toronto.

This is actually my second startup. I chose to leave my first startup earlier this year to work on this, because I discovered this problem through my old company.

I am the ex-president of UofT AI. I just graduated from the Next 36 cohort in Toronto and was the Toronto chapter lead for the Founder Institute for six months.

Engaging the Audience

Okay, so I want to start off by asking a question. First of all, who is technical and who's non-technical?

If you're technical, raise your hand. Okay, that's a lot of you.

And if you're non-technical, or you're on the brink? Okay. Okay, I understand you. This presentation will cater to you too, so don't worry at all.

Challenges with Building AI Agents

Okay, so for those who know what an AI agent is, obviously Robert gave a very good explanation of what it was. Why do you think building agents is hard? Just shout out an answer. Like, why do you think it's hard to build a robust agent? Anyone, you can just shout it out.

"It's hard to get all the different scenarios correct." Okay, yeah. Anyone else? "The data." Okay. Shout out answers, it's fine. "Memory." Okay, correct. What else? Anyone else? Okay, performance, memory, data, that's what I'm hearing right now.

So essentially, this is what we think, right? You have a really big learning curve for the products you're using, the existing frameworks, so there's LangGraph, CrewAI, et cetera. You have to go through the documentation, you have to learn.

It's really frustrating. A minor mistake could happen, and there's limited control. You don't really have control of all the variables.

That leads to poor choices from bad data. Those compounding errors lead to lower accuracy. And eventually, we see this snowball effect, which kind of makes robust AI agents impossible, which sucks, honestly.
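To make the snowball effect concrete, here is a minimal sketch of how per-step errors compound across a multi-step agent. It assumes every step succeeds independently with the same probability, which is a simplification, and the function name `chain_success_rate` and the 95% figure are illustrative choices, not numbers from the talk.

```python
def chain_success_rate(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps succeed independently with the same accuracy,
    which real agents will not satisfy exactly.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


if __name__ == "__main__":
    # Illustrative only: 95% per-step accuracy over longer and longer chains.
    for steps in (1, 5, 10, 20):
        rate = chain_success_rate(0.95, steps)
        print(f"{steps:>2} steps at 95% each: {rate:.1%} end to end")
```

Under these assumptions, a chain of ten steps that are each right 95% of the time is right end to end only about 60% of the time.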

Real-World Examples and Solutions

So I want to go back to what this is. This is a real Google search from Google AI.

So Gemini, that's when they added the Google search. "Fruit names that end with um." "Here are a few fruit names: apple, banana, strawberry, tomato, coconut." Interesting, right?

So that is actually a hallucination. And that's what our company started doing initially. So that's the first wedge of what we started doing.

We just helped businesses which are building these LLMs and agents detect where their models are hallucinating. That was kind of the standout factor.
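The search answer above is a case where the hallucination can be caught mechanically, because the claim ("ends with um") is checkable against each item. The sketch below is only an illustration of that idea; `check_suffix_claim` is a hypothetical helper, not part of the company's product.

```python
from typing import List, Tuple


def check_suffix_claim(items: List[str], suffix: str) -> Tuple[float, List[str]]:
    """Return (accuracy, failures) for a list claimed to end with `suffix`.

    accuracy is the fraction of items that really end with the suffix;
    failures lists the items that do not.
    """
    failures = [w for w in items if not w.lower().endswith(suffix.lower())]
    accuracy = 1 - len(failures) / len(items) if items else 0.0
    return accuracy, failures


# The answer shown in the search result.
answer = ["Apple", "banana", "strawberry", "tomato", "coconut"]
accuracy, failures = check_suffix_claim(answer, "um")
print(accuracy, failures)
```

Most real hallucinations are not this easy to verify, which is why a simple string check like this is a toy and not a general detector.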

Essentially, this is how we see the frameworks, right? Shout out to the guy who put this tweet out.

"LangChain AI must die." Sure. And essentially, what we're dealing with is complex workflows.

So where Robert showed a very simple demo of Chatforce, this is essentially what it could look like in the background. It's really complex. There are a lot of frameworks, and you have to build and understand a lot of things.

This gets worse as you build complex agents, and it gets really expensive. The cost per query is way higher than just using something like the OpenAI API. And at the end of the day, it can lead to costly legal consequences. If you look at Air Canada's screw-up with the chatbot they were using, which essentially could be an agent, it gave a customer wrong information and they were sued for it.

So yeah, it's not good to have inefficient frameworks.

Agent Frameworks and Future Directions

Essentially, I want to start off by touching upon o1. Who's used o1 right now? Raise your hand. ChatGPT o1. Okay, who hasn't, and who doesn't know what o1 is? You can raise your hand. It's okay.

But apart from that, we see agents as two extremes. Level zero is a self-built agent. Essentially, the user sends a query to the AI, it selects a function from a certain library, the function returns its result, the AI generates an answer, and gives it back to the user. So it's a very simple overview of what an agent is.
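The level-zero loop described here can be sketched in a few lines. Everything in this block is hypothetical: the `TOOLS` library, the tool names, and `select_tool`, which uses a keyword match as a stand-in for the model's choice so the example runs without calling any LLM.

```python
from typing import Callable, Dict

# Hypothetical tool library; a real agent would register its own functions.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "reverse": lambda text: text[::-1],
}


def select_tool(query: str) -> str:
    """Stand-in for the model's tool choice: a keyword match, not an LLM call."""
    if "count" in query.lower():
        return "word_count"
    return "reverse"


def run_agent(query: str, payload: str) -> str:
    tool_name = select_tool(query)       # 1. the AI selects a function
    result = TOOLS[tool_name](payload)   # 2. the function runs and returns
    return f"{tool_name} -> {result}"    # 3. an answer goes back to the user


if __name__ == "__main__":
    print(run_agent("count the words", "agents are hard to build"))
```

The tool-selection step is the one decision point in this loop, so it is also the place where a wrong choice propagates into the final answer.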

But in the future, this is what we want to strive for: anticipatory agents, prediction agents, where you can generate a function out of thin air and apply it to any use case that you have.

Sorry, this is a blurry picture. It just came from Twitter. So I apologize for that.

But essentially, it's picking a random function, checking if it exists, generating a function with a function tool, and then predicting the next step. And that's pretty cool, especially for us.
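A rough sketch of the "check if it exists, otherwise generate it" part of that flow is below. It is an assumption-heavy illustration: `registry`, `generate_function`, and `get_or_generate` are hypothetical names, and the generator just looks up a fixed template where a real system would presumably ask a model to write the function. Next-step prediction is left out.

```python
from typing import Callable, Dict

# Functions the agent already has available (hypothetical).
registry: Dict[str, Callable[[float], float]] = {
    "double": lambda x: 2 * x,
}


def generate_function(name: str) -> Callable[[float], float]:
    """Placeholder for the 'function tool'.

    A real system would generate code with a model; this stand-in
    only knows a couple of fixed templates.
    """
    templates: Dict[str, Callable[[float], float]] = {
        "square": lambda x: x * x,
        "negate": lambda x: -x,
    }
    if name not in templates:
        raise LookupError(f"cannot generate a function named {name!r}")
    return templates[name]


def get_or_generate(name: str) -> Callable[[float], float]:
    """Return the named function, generating and caching it if it is missing."""
    if name not in registry:
        registry[name] = generate_function(name)
    return registry[name]
```

Caching the generated function in the registry means the existence check succeeds the next time the same function is requested.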

So what we're doing is essentially building better infrastructure for agents. So people or companies that are using agents, they're building agents on LangChain and LangGraph, Chatforce, for example.

For us, our main focus is to kind of test and stress the environment where they're making decisions. So what is the best and most reliable decision that they can take to choose a tool or make a step?

And one thing is that we separate data that's grounded versus ungrounded. Grounded data, for example, is something that I would have in my training data. So for example, I know through a Wikipedia search that the capital of France is Paris. That's a grounded truth. But if I have a query about something that I've never heard of in my life, that's an ungrounded query, so I don't know what that is. We can actually differentiate both of them and then build a better structure for a user to test their agents on.
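As a toy illustration of the grounded versus ungrounded split, the sketch below treats a query as grounded only if it matches a fact in a small reference store. The store, the exact-match lookup, and the name `classify_query` are all assumptions made for the example; the talk does not describe how the actual separation is done.

```python
from typing import Dict, Optional, Tuple

# Tiny stand-in for a reference source such as an encyclopedia lookup.
KNOWN_FACTS: Dict[str, str] = {
    "capital of france": "Paris",
}


def classify_query(
    query: str, known_facts: Optional[Dict[str, str]] = None
) -> Tuple[str, Optional[str]]:
    """Label a query 'grounded' if a known fact covers it, else 'ungrounded'."""
    facts = KNOWN_FACTS if known_facts is None else known_facts
    key = query.strip().lower().rstrip("?")
    if key in facts:
        return "grounded", facts[key]
    return "ungrounded", None


if __name__ == "__main__":
    print(classify_query("Capital of France?"))
    print(classify_query("Capital of Zubrowka?"))
```

An ungrounded label here means only "no supporting fact was found", which is the signal for treating the answer with less confidence.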

And this is how we do it. Essentially, it's a small Python package, which is pip install OpenSesame SDK.

We extract the valuable information. Shout out strawberry having two R's.

And the accuracy rate is zero for that prompt. We also give the user the reasoning and a domain-specific web search with citations.
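The strawberry case is easy to reproduce by hand. The sketch below scores a claimed letter count against the real one. It is not the SDK's API, which the talk does not show; `score_letter_count_claim` and the shape of its result are hypothetical.

```python
from typing import Dict, Union


def score_letter_count_claim(
    word: str, letter: str, claimed: int
) -> Dict[str, Union[int, float]]:
    """Compare a claimed count of `letter` in `word` with the actual count."""
    actual = word.lower().count(letter.lower())
    return {
        "actual": actual,
        "claimed": claimed,
        "accuracy": 1.0 if actual == claimed else 0.0,
    }


if __name__ == "__main__":
    # The model's claim from the slide: "strawberry" has two R's.
    print(score_letter_count_claim("strawberry", "r", 2))
```

The word has three R's, so a claim of two scores zero, matching the accuracy rate mentioned for that prompt.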

So for example, if you're a legal company or a health care company, we'll focus on the domains that actually make a difference to you. And we find the optimal agent configuration.

Demo and Events

Now to the fun part. So this is a demo of something that we've been building as of late. Hopefully it works. If it doesn't, I'm sorry. I will blame my engineers. No, I'm joking.

We're also people who like to host monthly events. We're hosting an event on October 22nd. We'd love to have you guys attend as well.

Who has heard of Ada before? Okay, just one person, surprisingly.

So Ada is one of the largest conversational agent companies in Canada. We're hosting a small get-together where you can actually meet the CEO and CTO of Ada and learn how to build agents more reliably with them. So that's super interesting.

Conclusion

And yeah, that's it. Thank you guys so much.
