Hi everybody, my name is Alex. I am an AI whisperer and I do some consulting stuff too. I've done a lot of work using online platforms.
That's been pretty good to me. So I've been fortunate enough to meet a lot of different clients, work with a lot of different use cases.
I started focusing on AI right when it was coming of age, as it were, about six or seven years ago, and in the past two or three years I've been focusing a lot on LLMs.
I've recently started live streaming my coding adventures.
I'm going to dive a little bit into what I use. I use Claude Code, and I'm not the only person with this experience; I've recently heard the same from at least two other developers.
They tried a variety of different AI coding agents, like Copilot or what have you, and then they discovered Claude Code, and there was their life as an engineer before Claude Code and their life after, because overnight you stop writing code.
It's a very interesting phenomenon. It doesn't mean you stop producing code, but you're not writing it yourself.
And so, one of the reasons I love Claude Code is Claude Opus, okay? Because Claude Opus is the LLM. And I know there's a lot of debate about that.
And obviously, you know, with every LLM, especially when it's controlled by a big company, sometimes the performance goes down for no apparent reason. You don't know why.
So for two weeks Claude Opus is not smart anymore, and then it comes back and you're like, oh, I'm back, I have it again. For me, Claude Opus has been consistently the best model that I've used, for coding and for other things as well. It understands and maintains the details of your conversation over long periods of time.
I'm not an Anthropic employee or a fanboy. I know there are limitations. I've been angry at Claude Opus many times, as recently as yesterday, but overall I've felt that for me and for what I do, it's the best-performing model.
And then I wanted to mention this Read AI tool, which I've been using for a while. I would imagine by now there are better ones, but this is the thing that joins all my meetings.
It's actually really neat, because it records the meeting, of course, and extracts the transcript. It extracts highlights, and it extracts action items, which is funny, because sometimes you have a meeting where you just do updates, and then you've got action items, but there weren't any action items in the meeting. That's okay.
What's also funny is that I have some teams that I don't speak English with, and the transcript is translated anyway. It's accurate, you know, but it automatically translates from other languages.
Okay, so I wanted to very briefly mention, I'm going to do a practical thing. Yeah, go ahead.
Are there, like, legal disclosures that you have to make when you use those tools, especially in California, like for a third-party listening recorder, or whatever liabilities?
Oh, I mean, everybody knows, because you can see them joining, you know what I mean? Like, it joins the meeting, so they can see it. And in many meetings, people have their own thing that joins.
But sometimes I'll have a client who'll be like, what's that? And I'll say, oh, that's my note taker. And then they'll say, I don't want to do that. And then I'll just kick it out, you know.
Good question, though.
Okay, now I want to talk a little bit about AI, because that's what we're talking about today. So I have strong feelings about AI.
I don't want to get into a whole thing here, you know, unlimited time or whatever. But I think this is the biggest thing.
Well, I mean, there are some other big things here, but hallucinations, okay? This is big. And the reason this is big is that you won't know when it happens. That's why this is big.
And one thing that really frustrates me, the number one thing that frustrates me with all LLMs, including Opus, is the fact that it will present a guess as if it's confident that that's the solution.
And what I discovered when I tried to diagnose this is that it itself doesn't know that it's uncertain. It sounds kind of weird, you know.
I actually wrote about this. It turns out that people are like that too. When they don't know something but they really want to know it, they produce their response, and then when you ask them, why are you so sure about that, they don't really know why. You know why the models behave this way? Because they learn from us.
I'm not saying they learn from us in a kind of cognitive sense, but they learn from the words that we produce, so they produce the same patterns of behavior.
Privacy, okay. Obviously, you know, don't take your API keys and give them to Claude Code. I mean, it's going to read them anyway, but I try to be careful.
And then the other thing is this over-reliance. This is a tricky one, because I'm noticing it in myself lately. I'm using AI for coding, and I can see how I could very easily slip into bad habits.
You know, I'd not check the code anymore. Just say, ah, it's fine. Claude Opus just wrote it. It's going to be good.
So I want to dive now into a task. I have a task. Funny.
I have a project called Tasker, which is a task management system that I built with Claude Opus, and it looks a little bit like this.
Can you see this? Can everybody see it? No?
Okay, let me zoom in. This is a bit better? Okay, cool. You bet.
So here's how I found this out. I was using Claude online to manage my tasks, and I don't know why, but for some reason it really motivated me to keep track of my tasks better. I would be like, hey, add this task, and then it started to give them these IDs. I'm talking about my experience with Claude now. And I found out that these are very useful, because then I could say, delete T01, instead of having to say, delete the task to add priority scoring, blah, blah. But then I realized, yeah, Claude is good, but it's not really suited for that.
So like when I go between days, I have to say, okay, now create a task list for today, import all the tasks I didn't do yesterday, all that stuff. So I thought, you know, I can make my own and I can make something like this.
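The two ideas above, short task IDs and carrying unfinished tasks over to the next day, can be sketched roughly as follows. This is a minimal illustration, not Tasker's actual code: the `Task` fields and the helper names `next_task_id` and `carry_over` are hypothetical, and the two-digit `T01` format is assumed from the ID mentioned in the talk.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; the field names are assumptions.
    task_id: str
    title: str
    day: date
    done: bool = False


def next_task_id(existing_ids: Iterable[str]) -> str:
    """Return the next short ID in the T01, T02, ... style."""
    numbers = [
        int(task_id[1:])
        for task_id in existing_ids
        if task_id.startswith("T") and task_id[1:].isdigit()
    ]
    next_number = max(numbers, default=0) + 1
    return f"T{next_number:02d}"


def carry_over(tasks: Iterable[Task], today: date) -> List[Task]:
    """Copy yesterday's unfinished tasks onto today's list."""
    yesterday = today - timedelta(days=1)
    return [
        replace(task, day=today)
        for task in tasks
        if task.day == yesterday and not task.done
    ]
```

With something like this, "delete T01" only needs an ID lookup, and starting a new day is a single `carry_over` call instead of a manual re-import.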
It turns out this is actually pretty complicated, because you've got a lot of things like recurring tasks, right? Like if you want to do a daily task, how do you deal with that?
So let's say I add a daily task to go for a run. Okay.
So now it's going to think. This is Claude; I'm using Claude in the background. This is my task, right? You see it says D, and it's marked as daily. And if I go to tomorrow, here it shows that it's going to happen tomorrow again, right?
But it's marked differently; you see how it's a little bit gray? The idea is that this is planned. It hasn't happened yet. It hasn't created the task in the database yet.
But what's cool is that I made it work with a bunch of different things. Like I could say, add Hacker Dojo to the... what Thursday is it today? It's the second Thursday. To the second Thursday of every month at 6 p.m. And now it's going to make it. I mean, I hope. You know, it's AI, so be gentle with me.
So it created this task, but it starts today, so we're not going to actually see it today, because we're already past 6 p.m. But let's see if it shows up here. This should be here. Look at that. Okay, it doesn't show up on the other days, but it shows up here, right?
If I go here, I've just got the daily. And here, this is planned. And if I hover over here, it'll say, Thursday, repeats monthly, second Thursday.
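A rule like "second Thursday of every month" can be checked with plain date arithmetic. The sketch below is one way to do it and is not taken from Tasker; the function names `nth_weekday_of_month` and `occurs_on` are hypothetical, and only the standard library `calendar` and `datetime` modules are used.

```python
import calendar
from datetime import date


def nth_weekday_of_month(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th given weekday (Monday=0 .. Sunday=6) of a month."""
    # monthrange gives the weekday of the 1st and the number of days in the month.
    first_weekday, days_in_month = calendar.monthrange(year, month)
    # Days from the 1st to the first occurrence of the target weekday.
    offset = (weekday - first_weekday) % 7
    day = 1 + offset + 7 * (n - 1)
    if day > days_in_month:
        raise ValueError("this month has no such occurrence")
    return date(year, month, day)


def occurs_on(day: date, weekday: int, n: int) -> bool:
    """True if `day` is the n-th `weekday` of its month."""
    try:
        return day == nth_weekday_of_month(day.year, day.month, weekday, n)
    except ValueError:
        return False
```

A calendar view could call `occurs_on(day, calendar.THURSDAY, 2)` for each day it renders and show a gray "planned" entry when it returns true, without writing anything to the database until the day arrives.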
Now, I built all this with Claude, with Claude Code. I could have sat down and written all the code myself, no doubt. It would have taken me way longer than doing it this way, especially because I'm not a huge fan of front-end development. It's not that I don't respect it hugely; it's just not the first thing that I go to.
How much time do we have? 10 minutes? 10 minutes. Okay, great. So, I don't know if we'll get to go through the whole thing, but what I want to talk about is a workflow, okay? This is something that I've gotten into recently with Claude Code.
I should probably say I'm not the kind of early adopter of new Claude Code features who says, oh, now you use MCPs and skills and sub-agents and whatever. I generally don't go into that, because I like to have a lot of control over what it does, and what I've found is that the best way to interact with it is to go step by step.
I'm very specific about what I ask, and then it produces that, and then I check it, and then I go to the next thing. But there is one feature that I really like, and that's creating plans.
So the way that I develop features now is I think of the feature and then I go into planning mode. In planning mode you talk about what the feature is like, and it stays in planning mode until you've answered all its questions and it has answered all your questions. Then, after you've planned in detail to your satisfaction, you can say, okay, accept, go; or you can say, no, I'm not done; or you can say, let's go step by step. So anyway, I finished this right before the presentation.
The idea here is to add automatic priority scoring to these tasks.
So you see these, I don't know if you noticed this, but see, it says, I've added a daily task to go for a run, blah, blah, blah. I've estimated 30 minutes for this activity.
That's right. You know what I'm talking about. So this is open source and Nick is also one of the contributors.
We added a feature just last week to estimate the duration of a task. So that's what that is.
Now, what we're going to do is add a feature that estimates the priority of a task. You can still change these yourself, the duration and the priority.
But the idea is to make the AI smarter and have it think about these things when you add the task. So that's what the plan is for. We're going to have five priority levels, and we're going to see how quickly Claude can implement that. Okay, go. And while it's doing that, I don't know if you have any questions; we can get a head start on that. Yes, sir? Yeah, that's a good question.
So initially, Claude Code was an extension for VS Code, but it only worked on Linux or Mac. But they recently added a version, which is like this web view, and this also works on Windows, just this web view, not the terminal. Is it what?
Web box, the part of Claude Code that you never told me about. So that's basically a small web-based...
Yeah, it's a web-based view, yeah.
That'd be the web box thing, right?
You can't change this. This is Claude Code, it's not Copilot, so you can't. This is the actual Claude Code thing, yeah. You can change the model, though, right? Touching something. Does that answer your question? Okay, cool. Yes, sir?
Are you on the super-duper expensive $200-a-month plan, or are you finding that the mid-tier plans are sufficient for your needs?
That's a fantastic question. I am not on the most expensive plan right now, but I have also found that for Opus, Pro is probably not enough, especially if you're using it all day. So, it is probably about $100 a month.
Now, that's expensive, no doubt, but for what it helps me produce, for the time it saves me, that works for me. Yeah, anyway.
Yes? I have a comment on this question and then a question for you.
Pro works really well if you drop down to Sonnet as much as you can and switch back to Opus. But it's still kind of... So that's the short version. Question for you.
On Opus, how do you deal with hallucinations? What's your experience as you've gone through different models, and what's your approach right now?
So one of the things that I like about Claude models in general, and I would say Opus in particular, is that if you point out that it said something wrong, and you want to diagnose what happened, it's a bit sycophantic, but not to an extreme. Like, you can actually talk through, hey, why did you suggest this?
And I find that it engages with you. With ChatGPT, by contrast, there's no way you can get it to admit that it came up with something, which makes it unusable for me, right? So, yeah, generally speaking, I try... oh, that's actually a question. So I have here, this is my CLAUDE.md file. I probably need to zoom in a bit, but there's a lot of stuff. One of the main things here is to never, ever use to-dos, right? I'm sort of warming up to that.
Yeah. And the reason I do that is that when it goes into a to-do mode, what I find is that it starts focusing on checking off the items in the list, which means that it's not paying attention to each item.
The only way I'll do a to-do list is if I check after each step. And even then, I find that it just came up with a plan and is just going through it, bam, bam, bam. Okay.
So that doesn't work. Planning is different. Planning I found is different.
You plan all the details in advance with this plan mode. And by the way, the plan mode is down here; you just click on this, that's plan mode. In plan mode it doesn't write things, it just plans, and then once you're done with it, it executes the plan.
So the other thing here, and this is my first instruction, is that I ask it to default to uncertainty. I don't know if it works, but the idea is to bypass this manufactured confidence that it defaults to. And I try to emphasize: if you don't have source materials, mark it as not verified.
And that actually works. Asking it to mark things that are not verified has a good hit rate. After a while, you start noticing it say, hey, I think this is the right solution, but I haven't verified it. So I found that to be pretty useful.
So I hope that answers your question. Yep. Okay. I think it's done.
It's time to restart. I don't know if this is going to work. I have no idea.
I've planned this here. You've seen it generate. Yes.
We're watching it work on a prompt you gave it to add a prioritization feature. Yeah. How much content did the prompt have? Did you tell it how to implement?
Did you tell it an algorithm, or did you give it a lot of freedom? How robust is it? Is it just a look-up, or is it going to be able to prioritize unknown tasks, that type of thing? How general or specific was your prompt?
Yeah, so I started with this plan. I made this plan in preparation for today's presentation. So the plan was ready when we started, but this is the plan.
So these are the priority levels, and it had already planned which files to modify. That's where I feel it's different from a to-do list, right?
This is a plan. It's not, you know, like throwing a list of things to do. This is like, I've already thought through what I need to update. So that's what it's doing.
So these are the files, right? This is the front end, back end, front end, everything that it needs to touch, and then the order in which it's going to implement them.
So you can see that one of the changes is to the prompt. It changed the prompt, which means that here, if I look at the changes, it has this in the system prompt now.
And this is saying, assign a priority level, and if the user specifies the priority, use that value; otherwise, estimate. It's not perfect.
It's just, you know, a starting point.
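The rule in that system prompt (use the user's priority if given, otherwise estimate) could also be enforced in code after the model responds. The sketch below assumes that approach; it is not from the project. The talk only says there are five levels, so the level names, the `resolve_priority` helper, and the medium fallback are all assumptions.

```python
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    # Five levels as in the plan; the names here are assumed.
    LOWEST = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    HIGHEST = 5


def resolve_priority(user_value: Optional[int], estimated: Optional[int]) -> Priority:
    """Prefer the user's explicit priority; otherwise use the model's estimate."""
    for candidate in (user_value, estimated):
        # Skip missing or out-of-range values rather than failing.
        if candidate is not None and 1 <= candidate <= 5:
            return Priority(candidate)
    # Assumed fallback when neither value is usable.
    return Priority.MEDIUM
```

Validating the range here means a hallucinated value such as 9 from the model never reaches the database; it just falls through to the next candidate.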
So the way you use this is you just select plan mode and you prompt it just like you would if it were in agent mode. You say, create an app, create a feature, create a prioritization task, and then it automatically goes into planning mode and says, is this OK? Do you want more? Can I continue?
Yeah, that's essentially it.
So once it thinks it's done with the plan, you'll get like a pop-up there that says, you know, accept these. You can click accept and it just starts implementing, or you can say, accept, but check at every stage, or whatever. Or you can say, no, I'm not done, I want to keep planning, and you can go into more detail.
If Claude Code has its own concerns, yeah, that's a very good question. So this goes back to the CLAUDE.md file. Incidentally, people use this CLAUDE.md file for the project itself. I actually use it for general directions for how to interact.
And so I have here, I think, the "state your assumptions" directive. Well, I guess what I was trying to say is that you can put a directive here for it to challenge you when necessary, because otherwise it will hesitate to challenge you.
But there's another thing that I do when I'm working. Let's say I don't go into plan mode, right? I just want to make a change, and I'll say, okay, this priority mode doesn't work very well, I want to change blah, blah, blah. At the end of the prompt I will add, let me know what questions you have. It seems so small, but it's super effective, because it makes it highlight what assumptions it's making. Sometimes it even asks questions that I haven't thought about. Yeah?
It asks, we can implement it two ways, which implementation will you choose? Before, you needed to answer by typing. Right now you have, like, buttons to press.
Yeah, it's the same here, I understand. Sometimes it will just ask questions. If you go into plan mode, usually it will ask me some questions as well, so I'll get those popped up. But even outside of plan mode, usually when it's a change I want to make and I notice, oh, this is pretty complex, maybe I haven't thought of everything, I tell it, let me know what questions you have, and then it will surface whatever details it has.
Sometimes it says, I have no questions, and it just works.
Okay, so let me wrap up. Real quick, I just want to add a task to see if it works the way we planned. I thought it was supposed to be color-coded, but either way. Okay, so: add a task to network at Hacker Dojo. And let's see what it says. 30 minutes. Ah, and so this is the new priority feature. You can see these ones don't have a priority, but this one here does. This looks like medium priority, I would guess. So that was its estimate.
Okay, thank you so much. Really appreciate it.