All right, so my name is Ollie Christensen.
I am, yes, it's my birthday,
and I am presenting "Why Is an AI Talking to Me?"
So a bit about me is I'm currently developing
voice AI agents with my co-founder,
and I want to start to talk about
how voice AI has developed
and how you can actually start taking advantage of this
and implementing it into your everyday workflows.
So we all know who this is.
That's Siri.
So Siri, Alexa, and Gemini, they're all what we call kind of reactive agents.
So you say something, they react, and that's it.
But now we're starting to develop more autonomous and more agentic agents.
And with that come three crucial elements.
So we've got context, latency, and emotion.
So these three rules are how we determine what is a good voice AI agent.
So we'll start with context.
So what is the business context?
Do we have a voice AI agent
that acts as a customer service agent or maybe an outbound sales agent?
We also need to think about
what the contextual awareness of the agent is.
Can it remember my name?
Can it remember maybe
the complaint I had last week?
Then we need to start understanding the importance of latency.
So in a human conversation, the gap between turns is around 300 milliseconds.
So that is me saying
something and then you responding back.
So for a good voice AI project,
we need to aim for 300 milliseconds.
So a demo will sit at around 800 milliseconds, a good project will hit
500, and a really good project will hit 300.
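The latency targets above can be sketched as a small classifier. This is only an illustration: the thresholds are the rough figures quoted in the talk, and the function name and tier labels are made up for this example.

```python
# Rough response-latency targets quoted in the talk (milliseconds).
# These are rules of thumb from the speaker, not an industry standard.
HUMAN_TURN_GAP_MS = 300   # typical gap between turns in human conversation
GOOD_PROJECT_MS = 500     # what a good project hits
DEMO_MS = 800             # where a demo tends to sit


def latency_tier(response_ms: float) -> str:
    """Map a measured response latency to the tiers described in the talk."""
    if response_ms <= HUMAN_TURN_GAP_MS:
        return "really good"
    if response_ms <= GOOD_PROJECT_MS:
        return "good"
    if response_ms <= DEMO_MS:
        return "demo"
    return "slower than a typical demo"


if __name__ == "__main__":
    for ms in (280, 480, 790, 820):
        print(f"{ms} ms -> {latency_tier(ms)}")
```

A check like this could run against measured round-trip times while trying different models or hosting options.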
And then we have emotion.
So what emotion
does the agent have?
Does it raise its tone of voice at the end of sentences
when there's a question?
But we also need to understand, can it detect emotion?
So
if I'm screaming down the phone at a voice AI agent, annoyed about my customer
service, can it detect that emotion and can it look to mitigate my stress, reduce
it and get rid of it?
So what I'm going to do now is run through a practical
demonstration on how you can build your very first voice AI agent.
I've kept it
as simple as possible.
So there is no coding.
It is just prompting and one API key.
So the platform I will be using is Retell AI.
So Retell AI is a lovely platform.
It allows you to basically just prompt out
what the agent is going to do, and then
start to implement some of the rules we've talked about.
So in this scenario, I've developed a voice AI agent
called Jordan, and Jordan's going to manage my calendar
because I'm a busy man and I can never manage my calendar.
And Jordan is going to help me do that all.
So we can see here we have it prompted out.
Does everyone kind of understand how prompting works?
Yes, a few nods.
Okay, so we've put in Markdown what I want Jordan to do.
So we've got the role of Jordan, the toolkit, the preferences I have,
so time zones, the guidelines to stop it hallucinating,
and the operational workflows.
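As a rough illustration of the prompt layout just described, the sketch below assembles a Markdown prompt from the five sections named in the talk. The section titles come from the talk; the text inside each section and the helper name `build_prompt` are assumptions, not the actual prompt shown on screen.

```python
# Section titles follow the talk; the section bodies are invented placeholders.
SECTIONS = {
    "Role": "You are Jordan, an assistant that manages Ollie's calendar.",
    "Toolkit": "You can check availability and create bookings via cal.com.",
    "Preferences": "Default time zone: Europe/London.",
    "Guidelines": "Never invent availability; only state what a tool returned.",
    "Operational workflows": "1. Collect date, time and attendees. "
                             "2. Check availability. 3. Confirm, then book.",
}


def build_prompt(sections: dict[str, str]) -> str:
    """Render each section as a Markdown heading followed by its body."""
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections.items())


if __name__ == "__main__":
    print(build_prompt(SECTIONS))
```

Keeping each concern under its own heading makes it easier to change one part, such as the guidelines, without touching the rest.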
So if we remember those three crucial rules,
we've got the context now,
we can start to look at the latency here.
So you can see it's running around about 820 milliseconds.
If we want to reduce that, we can then select a different LLM.
So usually the more expensive the model is per minute,
the better the latency.
Now I'm doing this off of a browser,
so there will be a little bit more latency,
but if we were to host it ourselves,
we could have a reduction in latency.
And finally, we have emotion.
So we come here to the agent handbook.
We can start to play around with some of the emotion
and make it sound more natural.
So we've got a personality,
so we can start to add in maybe a professional personality.
We've got our natural filler words,
so the ums and the ahs we have in natural conversation.
And we have what I mentioned before, high empathy,
so detecting when there is stress or potential anger.
So what I've done here is I've given Jordan the functions via the API key for cal.com, which I'm using as my scheduling platform.
I've simply just connected it there and that allows Jordan to now start using some of the tools.
So if we go to speech settings, I can then start to adjust Jordan's responsiveness, that is, how fast Jordan starts after I've finished.
And I can also look at the interruption sensitivity.
So how fast does Jordan stop talking when I interrupt it?
So without further ado,
we will test it, see what Jordan can do, and let's go.
Hey, Jordan, how are you doing?
Hey, Jordan, how are you?
Hello, this is Jordan.
I'm here to help with your scheduling needs.
How can I assist you today?
Wonderful, Jordan.
So in one sentence, please tell me what do you do?
I manage Ollie's calendar through cal.com to make scheduling meetings smooth and efficient
for everyone.
How can I help you today?
Okay, Jordan, I want you to book.
We'll do a meeting
with me and my friend Matt for next Tuesday.
Let's do next Tuesday.
So the 28th, please.
Got it.
Could you please tell me the preferred time for the meeting on Tuesday, the 28th?
Also, may I know your time zone or Matt's time zone?
Sure thing.
Let's do 10 to 10.30, London time zone, please.
Thanks for that.
Just to confirm, the meeting is on Tuesday the 28th, from 10 to 10.30 a.m.
Wait, wait, wait, Jordan, Jordan, Jordan, Jordan.
Sorry, sorry, sorry.
Jordan, can we please actually include agenda in this?
I need an agenda with Matt.
Let's do, okay, number one, we'll do catch up.
Number two, we'll talk about Mindstone.
Number three, we'll talk about pizza.
And number four, we'll then just go close out.
Can you also add in a Google Meet link as well, please?
Thanks for the details.
I'll include the agenda with those four points and add a Google Meet link.
Can you please provide your name for the booking?
Just put it under Ollie and Matt.
And for Matt's contact details, just use a placeholder email, please.
Thanks, Ollie.
I'll use a placeholder email for Matt.
Let me check availability for Tuesday the 28th
from 10 to 10.30am London time now.
The slot on Tuesday April 28th from 10 to 10.30am London time
is available.
Should I go ahead and book the meeting with Ollie and Matt including the agenda
and Google Meet link?
Perfect.
Do that Jordan for me please.
Yes.
Your meeting with Matt on April
28th at 10am London time is booked with the agenda and Google Meet link.
I ran into an issue
because Matt needs a valid email or phone number for the booking.
Could you please provide a real email?
All right, Jordan, just use a placeholder.
We'll go Matt at placeholder.
Thanks for that.
I'll use matt@placeholder.com as Matt's email and book
the meeting now.
It seems the booking system still needs a valid email for Ollie as well.
All right, Jordan, don't worry about it.
We won't do a booking.
That's fine.
No problem at all.
If you need help with scheduling,
But we can see that in the real world that may happen.
But what I've got here is an example that I had before.
So what we did is we had the placeholder email,
which for some reason Jordan didn't like this time,
but we had the Google Meet link,
and we also had the agenda as well,
along with the ability to reschedule and whatnot.
So we can see how with this being a very, very simple agent
and just prompting, we run into errors.
So we need to start making it a bit more complex and increasing the technical
ability.
So we've talked, and I've shown you a little bit, about the use cases.
So we could have Jordan start to do more content creation.
So we could take that voice, use
text-to-speech and start narrating maybe some social media posts.
We can also do
the same for dubbing as well.
We could also start to give it more of an employee feel.
So use Jordan as a customer assistant
and extend that, maybe using inbound and outbound calling.
Another example of this is Jack and Jill.
So Jack and Jill is a startup where
you speak to Jack, the voice AI, and Jack assesses your CV
and your expertise, and will then pass you on to Jill
if Jack likes you, and Jill will then connect you with potentially recruiters or interviewers.
Then we start looking into the future, so what is multimodal?
This is the ears
and the eyes, so can we have an agent that sits there on my computer, can see what I'm
doing, and then can actually go, "Hey Ollie, maybe you've got this line of code wrong,
let me fix it for you," and I can communicate back?
Then looking more into the future, we've
got invisible UI.
So can I say to Jordan, please create me a voice AI agent
that's just like you, but change it
for the context of my sales team and make
it have a different accent?
So that's what we can start doing.
Now, we've understood the context, the latency, and the emotion.
So I showed you a very simplistic AI agent
with just prompting and an API key.
We ran into some issues, and that happens.
But we need to start using MCP servers and actually increasing the technical aspects, but also the complexity.
So for those that don't know, MCP stands for Model Context Protocol.
So imagine a human body.
We've got our brain.
We've got our hands.
So in order for us to use that, we've got a nervous system that connects the two.
So the LLM is our brain, and the tools, our hands, are potentially cal.com, Salesforce, HubSpot, and so on and so forth.
So we use the MCP server, the nervous system, to connect the LLM to the tools.
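The brain, nervous system and hands analogy can be sketched in a few lines. This is a toy stand-in and not the real Model Context Protocol or any MCP SDK: `ToolServer`, `lookup_contact`, the tool name and the contact data are all hypothetical. It only shows the idea of a layer that routes a named request from the model to a tool.

```python
from typing import Any, Callable, Dict, List


class ToolServer:
    """Toy 'nervous system': routes named tool requests to Python functions.

    Hypothetical illustration only; a real MCP server speaks a defined protocol.
    """

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, tool_name: str, fn: Callable[..., Any]) -> None:
        self._tools[tool_name] = fn

    def list_tools(self) -> List[str]:
        return sorted(self._tools)

    def call(self, tool_name: str, arguments: Dict[str, Any]) -> Any:
        if tool_name not in self._tools:
            raise KeyError(f"unknown tool: {tool_name}")
        return self._tools[tool_name](**arguments)


def lookup_contact(name: str) -> Dict[str, str]:
    """The 'hands': a pretend CRM lookup backed by hard-coded sample data."""
    contacts = {"Matt": {"email": "matt@example.com"}}
    return contacts.get(name, {})


server = ToolServer()
server.register("crm.lookup_contact", lookup_contact)

# The 'brain' (the LLM) would produce a structured request like this one.
request = {"tool": "crm.lookup_contact", "arguments": {"name": "Matt"}}
result = server.call(request["tool"], request["arguments"])
print(result)
```

With a lookup like this in place, the agent would not need to ask the caller for a placeholder email.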
So in the case of Jordan, Jordan got that wrong and couldn't do it.
But if we actually connected Jordan via an MCP to potentially Salesforce, our CRM, we could actually take in the correct contact details.
We could actually maybe raise a ticket as well and say, OK, well, I need to remember to do
that.
We could even connect that to Gmail, say, and send an email with documents we may need for
me and Matt.
So the last two kind of summarize it together perfectly.
So we've got opportunities
here, but it's important that we have that human connection.
And that human connection
is really, really critical.
And then we'll go and talk a little bit more about this
as the night goes on, but we need to automate admin, not the human connection.
So can we
add in escalation routes to ensure when mistakes happen like that, a human takes over and can
deal with it?
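One way to picture an escalation route is a simple rule that hands the call to a person after repeated failures or when the caller sounds upset. The sketch below is an assumption-laden illustration: the `CallState` fields, the threshold of two failed attempts and the step names are invented here and do not come from any platform.

```python
from dataclasses import dataclass

# Assumed threshold: hand over after two failed tool calls in one conversation.
MAX_FAILED_TOOL_CALLS = 2


@dataclass
class CallState:
    """Hypothetical per-call state tracked by the agent runtime."""
    failed_tool_calls: int = 0
    caller_sounds_upset: bool = False


def next_step(state: CallState) -> str:
    """Decide whether the agent carries on or a human takes over."""
    if state.caller_sounds_upset:
        return "handoff_to_human"
    if state.failed_tool_calls >= MAX_FAILED_TOOL_CALLS:
        return "handoff_to_human"
    return "continue_with_agent"


if __name__ == "__main__":
    # The booking in the demo failed twice, so this rule would escalate.
    print(next_step(CallState(failed_tool_calls=2)))
```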
So what can you do?
You're probably thinking, well, how do I actually use this?
I've put in a free guide here in the first QR code so that you can start to actually
understand how to use it, implement this from simple prompting, and then take this all
the way to actually integrating MCP servers and increasing
that technical aspect and making it more reliable.
And by adding in MCP servers, we actually
increase the context awareness.
So now Jordan can actually remember what we're doing.
She can take maybe the contact details from Salesforce.
We increase the emotional awareness,
and we reduce the latency.
So right there is my LinkedIn.
I'm happy to connect with everyone and talk about what everyone's doing and how everyone's
growing in this community.
Thank you very much.
I've been Ollie Christensen.