Building with AI Agents and Conversational AI

Introduction

Hi everyone, my name is Umair and I'll start with a bit about myself.

I'm an aerospace engineering student going into my final year.

I just finished my placement at Boeing and this past year I've been really involving myself in software engineering and AI especially.

So as Joe said, last week I attended a hackathon, ElevenLabs was there, and I just came up to Joe and was like, oh, I like your idea.

The idea was Ground News, but more for countering YouTube politics.

It's a bit of a weird thing, but that's how the name Polititude came about.

Political YouTube, you know.

The Groundbreaking AI Agent Project

How the Project Came About

So yeah, in my talk, I'm going to be talking about AI agents and especially conversational AI, specifically with ElevenLabs.

So as you can see here, ElevenLabs has this great agent tool base where you can configure your agent in whatever way you want.

So what Joe did was write a system prompt, so essentially giving a base for the AI agent to work with.

So he said, okay, you're a debating coach for Gen Z activists.

Your job is to provide them with an overview of common talking points from both sides of the political spectrum.

But yeah, at the time the Iran airstrikes were going on and there were tons and tons of political YouTube videos going around, all sorts of information, good and bad.

Project Description

And yeah, to facilitate the YouTube part, I made a little YouTube webhook where Gemini takes a YouTube video, scrapes it, pulls out the key points, and the webhook provides the ElevenLabs agent with the right context for that video.
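A minimal sketch of what such a webhook handler could look like, not the project's actual code. The request shape (`{"video_url": ...}`), the response shape, and the names `extract_video_id`, `fetch_transcript`, `summarize_key_points`, and `handle_tool_call` are all assumptions made for illustration. The transcript fetch is left as a placeholder, and the summarizer simply takes the first few sentences where the talk's version would call Gemini.

```python
import json
import re
from typing import Callable, List, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Pull the video ID out of a youtube.com/watch or youtu.be link."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def fetch_transcript(video_id: str) -> str:
    """Placeholder (hypothetical): a real version would fetch captions here."""
    raise NotImplementedError("plug in a transcript source")


def summarize_key_points(transcript: str, max_points: int = 5) -> List[str]:
    """Stand-in for the LLM call: returns the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript)
    return [s.strip() for s in sentences if s.strip()][:max_points]


def handle_tool_call(
    body: bytes,
    transcript_fetcher: Callable[[str], str] = fetch_transcript,
) -> Tuple[int, str]:
    """Map a JSON request body to an (HTTP status, JSON response) pair."""
    try:
        payload = json.loads(body)
    except ValueError:
        return 400, json.dumps({"error": "invalid JSON"})
    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "expected a JSON object"})
    video_id = extract_video_id(str(payload.get("video_url", "")))
    if video_id is None:
        return 400, json.dumps({"error": "unrecognised YouTube URL"})
    key_points = summarize_key_points(transcript_fetcher(video_id))
    return 200, json.dumps({"video_id": video_id, "key_points": key_points})
```

Keeping the handler as a plain function means it can be mounted behind whatever web framework serves the webhook, and the transcript source can be swapped out when testing.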

So, yeah.

Yeah, it's a fairly simple project.

You can see this project structure.

It's not complex at all.

And it was built within a couple of hours using, obviously, Cursor.

Obviously, nothing happens without hiccups, but you've just got to work through those errors and then you'll get yourself a working agent.

So hopefully, this works first time, might not, but

Yeah.

Demonstration of the AI Agent

Hello there, Gen Z activist.

I'm your debating coach.

What topic is on your mind today, or what can I help you with in the realm of political discourse?

Hi.

I've just watched a YouTube video, and I want you to analyze it about its key points and maybe give some counterpoints based on your knowledge base.

That sounds like a great exercise.

Yeah, so don't want to disturb it too much, but obviously it's going to analyze the video, get its key talking points, and then we can have a discussion about that video with an AI agent that has a knowledge base.

All right, I've analyzed the YouTube video.

Here are the key talking points from the video.

The president's decisive military action destroyed the Iranian nuclear program without significant American casualties or prolonged conflict.

The ceasefire between Israel and Iran was achieved through a combination of bold action and diplomatic negotiation, preventing further escalation.

The administration's foreign policy prioritizes decisive action and a swift exit from conflicts, a departure from previous approaches.

The president's forceful

Okay, could you please give me some counterpoints and maybe quote someone in your knowledge base?

Hello.

You there?

Anyway, it looks like it doesn't want to respond, but you get the idea.

Development is not easy.

It gets stuck sometimes.

I think it's just mainly because it's just a local deployment, and hopefully once I deploy this, the server infrastructure is a lot tighter, so we'll hopefully get a conversation working, but I promise it worked before.

That's the project structure.

Any questions?

Leveraging ElevenLabs' Tools

Let's run through what you get from ElevenLabs.

What did they provide you, how did you leverage it, and what else can people do, given that you have a conversational AI that handles text-to-speech and speech-to-text? What else did you leverage, and can we walk through it?

Yeah, so obviously this is the dashboard that ElevenLabs provides. You can choose your language, you can change the voice, and you have the system prompt, which is what Joe wrote up to give the context of what the conversation is about.

You can change the LLM. Gemini 2.5 Flash is a good lightweight model.

You can change the temperature, which makes it more deterministic or more creative depending on the context.
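To illustrate the temperature setting: in general, a lower temperature concentrates probability on the most likely next token (more deterministic), and a higher one spreads it out (more creative). The sketch below shows the usual softmax-with-temperature formulation on made-up logits. It is a generic illustration, not a description of how any particular dashboard or model applies the setting, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.2)  # low temperature: near-deterministic
warm = softmax_with_temperature(logits, 1.5)  # high temperature: more spread out

print("low temperature :", [round(p, 3) for p in cool])
print("high temperature:", [round(p, 3) for p in warm])
```

With the low temperature almost all of the probability lands on the top-scoring token, while the high temperature leaves the other candidates a realistic chance of being sampled.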

And this is an important thing that I briefly went over.

It's the knowledge base.

So here for our demo showcase, we wanted to have a conversation about the Iranian airstrikes.

And I provided different reports from different views.

And it uses RAG (retrieval-augmented generation) to make the most of its knowledge base.
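The idea behind RAG can be sketched in a few lines: score each knowledge-base document against the question, keep the best matches, and put them in the prompt as context. The version below uses a simple bag-of-words cosine similarity where production systems typically use learned embeddings. The document names and texts are placeholders, and `retrieve` and `build_prompt` are hypothetical helpers, not part of any vendor's API.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: Dict[str, str], top_k: int = 2) -> List[str]:
    """Return the names of the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name) for name, text in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]


def build_prompt(question: str, documents: Dict[str, str], top_k: int = 2) -> str:
    """Prepend the retrieved documents to the question as context."""
    names = retrieve(question, documents, top_k)
    context = "\n".join(f"[{name}] {documents[name]}" for name in names)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Placeholder knowledge base, standing in for the uploaded reports.
knowledge_base = {
    "legal_analysis": "Analysis of the legal authority for the airstrikes and congressional approval",
    "military_briefing": "Briefing on the military outcome of the airstrikes and damage assessment",
    "economy_note": "Note on oil prices and shipping costs",
}

print(build_prompt("Was there legal authority for the strikes?", knowledge_base))
```

Because only the retrieved passages go into the prompt, the knowledge base can grow without every document having to fit in the model's context at once.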

I made a custom tool to analyze the video.

So that's what it provides and anyone can easily set this up within minutes and speak to it.

Video Analysis

How did you get the video?

Did you get the transcription or captions?

Yeah, so that was a custom thing: it gets the transcription from a YouTube link, extracts the key points, the key points are delivered to the agent, and then you can have a conversation.

So what you are saying is I can have a three-hour-long video, for one of my classes or something I would like to learn, and I can just import it and have a conversation about that video?

So it's like I think similar?

Yeah, so obviously the longer the video is, the less stable it will be.

But yeah, the concept works the same: both the agent and you, or maybe just the agent, will have the context of the video, get the key points, and even provide counter viewpoints based on the knowledge base.

This can scale.

You can have all sorts of different views.

It's just based on what's in the knowledge base.
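On the point about longer videos being less stable: one common way to handle long transcripts, assuming the limit is how much text can be summarized in one go, is to split the transcript into overlapping chunks, summarize each chunk, and then combine the summaries. This is a general technique, not necessarily what this project does. `chunk_transcript` and its default sizes are hypothetical.

```python
from typing import List


def chunk_transcript(text: str, chunk_words: int = 400, overlap: int = 50) -> List[str]:
    """Split a transcript into word-based chunks that overlap by `overlap` words."""
    if chunk_words <= 0 or not 0 <= overlap < chunk_words:
        raise ValueError("need chunk_words > 0 and 0 <= overlap < chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last chunk already reaches the end of the transcript
    return chunks


# A stand-in transcript of 1000 numbered "words".
transcript = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_transcript(transcript, chunk_words=400, overlap=50)
print(len(chunks), "chunks")
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, so a key point is less likely to be cut in half.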

Open Source and Costs

Is it open source?

Can you use it?

Yeah, I can.

It's on my ElevenLabs.

No, the Discord.

Your project, I guess.

Oh, my project.

Yeah, it's on GitHub.

I will... Yeah.

Yeah, one sec.

Hmm.

Yeah, it's private now.

I can make it public and I can give out the link to anyone.

How much do they charge for the agent?

Because I use ElevenLabs, but just for TTS.

Yeah, so... It's a new feature, right?

I mean, I didn't know you have an agent now.

Yeah, yeah.

No, so obviously, I've heard ElevenLabs is expensive, but at the hackathon they gave us free credits, so... Oh, okay, okay.

Yeah, so... Cool, cool.

Interacting with the AI Agent

What was your assessment, having had this conversation with the agent, compared to, say, just typing it in or any other kind of interaction?

So obviously, when we were discussing Iran specifically with it, it was able to quote specific people within its knowledge base. For example, I had a video from LegalEagle which had a good counter-perspective, and it was able to quote a specific person and give various viewpoints. It was actually pretty good, and obviously typing things out is less engaging than speaking to someone.

And as I improve this project, I hope to, A, make it a lot more responsive.

And B, expand its knowledge base.

Maybe add a data pipeline where you can just click on a topic and then its knowledge base will be based on that topic, not a specific instance such as the airstrikes.

But yeah.

You could interact with it, you could say, that's enough, stop, or whatever.

Would it do the same to you?

No, it's just going to keep talking.

It's just going to keep talking until, obviously, it finishes its thought.

The reason I'm asking is that I guess it depends on what you're expecting it to do for you in terms of a coach.

Customizing the Agent's Behavior

Are you expecting it to point out, say, logical problems or assumptions that you've made or to help you construct an argument?

That's what I'm asking, if it would interrupt.

So if you were saying something, would it say, can I stop you there and just point out the blah, blah, blah, blah, blah?

Yeah, you can program that behavior in by customizing the agent, but right now we intended it as a Ground News type idea, where it gives you both sides of the aisle.

So you can get the YouTube video and essentially just figure out, okay, it's up to you to come to a conclusion of, okay, is this what I believe or not?

But yeah.

That might be the last question, as I need to crack on.
