Advanced Coding with AI

Introduction

I work at Orbital Health. It's a company that provides a platform for value-based care, a new way of doing health insurance that focuses on patient outcomes.

Why AI-assisted coding matters in a data-heavy healthcare context

So the kind of challenges that I hit are dealing with large data and extracting the relevant context to present to the user. I don't want to spend too much time talking about me.

I have transformed my life as a developer since I discovered Claude Code. Now, I know everybody's got different experiences with a variety of tools and so on, but for me Claude Code was the thing that really took me from just using tab complete to no longer writing code.

I was working at another startup before, and the owner, Connor, once sent me a direct message on Slack that linked to an interview with Andrej Karpathy, who some of you may have heard of, where he was saying that he hasn't written code since December of last year.

And I replied to my boss, I haven't written code since June. I don't know if he was aware of that, but I think he took it very well, because he encouraged me to share my knowledge with the rest of the team. So that was part of what I did there: encourage everybody to use AI in some form or another.

Now, some of you may be using Cursor or Copilot, but in general the idea of coding with AI is that you describe what you want to the AI and it writes the code. And there are a lot of opinions about what model is best and so on.

Choosing models and tools (Claude Code, Cursor, and Opus)

A couple of days ago, I was at a conference in San Francisco. It's Andrew Ng's conference, DeepLearning.AI.

So I walked past two people that were talking, and I caught this part of their conversation. It was just the end of a question, which was, I wonder if I should use Opus for that. And I was very compelled to stop just for the purpose of saying, no matter what the question is, the answer is use Opus.

And then, you know, we got into a discussion about budgets and all that. But that being said, I think Claude Opus is your go-to when you're coding with AI. I've been using it for a while now, and it's done wonders for my productivity.

So, Claude Code with Claude Opus, or Cursor with Claude Opus. There's also GPT-5.5, which does okay. I didn't like 5.4, but again, opinions vary. I also have questions about Opus 4.7. I like 4.6 better, but we're not going to stop them from releasing new versions, so I think I have to just accept that.

What today’s talk covers: Claude Code skills

So I want to talk today about Claude skills.

Raise your hands if you're familiar in general with Claude Code and writing code with AI agents. Okay, cool, great. How many of you are familiar with skills at all? Okay, great, perfect.

This is going to be a very easy presentation, so no problem.

CLAUDE.md basics: global vs project configuration

Now, the interesting thing about Claude Code, and everything I say here applies to Cursor as well, and probably to Copilot, although I'm not that familiar with that tool, is that you have this CLAUDE.md file. The one thing that I recently realized is that you can have this file in multiple locations.

So, for example, you can put your communication style in a CLAUDE.md under your user folder. So it's your user folder, your username, slash .claude, and then you put the CLAUDE.md in there.

That describes how you want Claude to communicate. And if you put it there in your user folder, then it will apply to all your projects.
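As a rough illustration of the two locations just described, here is a small Python sketch. The user-level path follows the talk (a `.claude` folder under the user folder). Putting the project-level file at the project root, and the helper names `candidate_memory_files` and `existing_memory_files`, are assumptions for illustration. The code only lists paths; it does not reproduce how Claude Code actually loads or merges these files.

```python
from pathlib import Path
from typing import List, Optional


def candidate_memory_files(project_dir: Path, home: Optional[Path] = None) -> List[Path]:
    """Return the CLAUDE.md paths discussed in the talk, global one first."""
    home = home if home is not None else Path.home()
    return [
        home / ".claude" / "CLAUDE.md",  # communication style, global preferences
        project_dir / "CLAUDE.md",       # project-specific notes (assumed at project root)
    ]


def existing_memory_files(project_dir: Path, home: Optional[Path] = None) -> List[Path]:
    """Keep only the candidate files that are actually present on disk."""
    return [p for p in candidate_memory_files(project_dir, home) if p.is_file()]
```

Calling `existing_memory_files(Path.cwd())` from inside a project shows which of the two files are present on your machine.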

And then in your projects, you don't have to worry about communication style or global preferences. You just worry about what's in that project.

Now, there are lots of other steps you can take from here, so this is not necessarily the next step.

Skills: reusable prompts you can invoke on demand

But one step you can take from there is to add a custom skill. A skill is sort of like a prompt that you only have to type once. It's like you took that prompt and put it in a skill file, and now you can activate it whenever you want.

I want to show you this skill. I just wrote it because I wanted to demonstrate how easy it is to make a skill, but I haven't tested it, so it may be a disaster. Thanks for your support.
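To make the "prompt in a file" idea concrete, here is a minimal Python sketch that writes a skill file. The overall layout (a name, a description, then Markdown sections) and the `.claude/skills` location are described later in this talk. The `SKILL.md` filename, the per-skill subfolder, the `---` delimiters, and the wording of the steps are assumptions for illustration, not the speaker's actual file.

```python
from pathlib import Path

# Hypothetical skill text: a name and description, then plain Markdown steps.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

## Step 1: Write the tests first
Write tests that encode the requested behavior. They should fail.

## Step 2: Stop
After generating tests, stop and tell the user what you generated.
Ask whether they want to adjust anything.

## Step 3: Implement
Only after the user approves, implement the feature until the tests pass.
"""


def write_skill(project_dir: Path, name: str, description: str) -> Path:
    """Create <project>/.claude/skills/<name>/SKILL.md and return its path."""
    skill_file = project_dir / ".claude" / "skills" / name / "SKILL.md"
    skill_file.parent.mkdir(parents=True, exist_ok=True)
    skill_file.write_text(
        SKILL_TEMPLATE.format(name=name, description=description, title=name.upper()),
        encoding="utf-8",
    )
    return skill_file
```

Since the file is just text, you can edit it by hand afterwards or ask the agent to revise it.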

Quick look at the setup in VS Code

Yeah, please.

I can't see well. What tool is that? It looks like some kind of an IDE.

Yeah, this is VS Code. It's just VS Code.

And I made it a bit larger. Hopefully it's a bit easier to read.

So on this side is my Claude Code, of course, and on the right-hand side I have the skill.

The TDD skill: writing failing tests first

I'm not going to go through all of it, but the general idea is this. Raise your hand if you're familiar with test-driven development, TDD. Okay, great.

So the idea with test-driven development is that you write tests first, and they fail. Then you implement the features that make those tests pass. The point of doing it this way is that it really constrains what you can code, and it makes you start from the effect that you want to create.

Most of the time, tests are something that you do after the fact. But with test-driven development, you have to write the tests first, and they encode all the behavior you want from your components. It's kind of fallen out of style recently, and I'm trying to bring it back, especially because when you use Claude Code you can easily make it write all your tests for you.
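The fail-first cycle can be shown in a few lines of Python. This is a minimal sketch in which a hypothetical `slugify` function stands in for a real feature: the tests are written first, run against a stub to confirm they fail, and then run against the implementation.

```python
import unittest


def slugify_stub(title: str) -> str:
    """Placeholder that exists only so the tests can run and fail."""
    raise NotImplementedError


def slugify(title: str) -> str:
    """Implementation written after the tests: lowercase, hyphen-separated."""
    return "-".join(title.lower().split())


def make_tests(func):
    """Build the same test suite against whichever function is passed in."""

    class SlugifyTests(unittest.TestCase):
        def test_lowercases(self):
            self.assertEqual(func("Graph"), "graph")

        def test_joins_words_with_hyphens(self):
            self.assertEqual(func("Knowledge Graph"), "knowledge-graph")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)


def run(func) -> unittest.TestResult:
    """Run the suite quietly and return the result object."""
    result = unittest.TestResult()
    make_tests(func).run(result)
    return result


red = run(slugify_stub)   # step 1: tests exist, the feature does not
green = run(slugify)      # step 2: feature implemented, same tests
print(red.wasSuccessful(), green.wasSuccessful())  # prints: False True
```

The tests never change between the two runs. Only the code under test does, which is what keeps the implementation constrained to the behavior you asked for.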

So I wrote this TDD skill, and the idea here is that it's going to write the tests first, and then it's going to stop. Now, I actually don't even know if I have tests in this repo. So maybe this is a good place to add some, but I want to show you what this does.

If I can... no, this is not it. Okay.

Demo project: a Next.js knowledge-graph app built with Claude Code

So this is a thing that I built, which is a graph of AI-related knowledge. I was doing a review of AI topics, and this is just for my own purposes.

It's just a front-end project. There's nothing more sophisticated than that.

And here you have a bunch of things that this particular node is related to, and then you can click on something else and it'll take you there. And if you see here, I mean, it's kind of hard to see, but these are sort of animated stars. If you move your mouse around here, then, well, I guess the stars don't necessarily move. That's okay. But the point is, this is just a simple Next.js app that I built in like one afternoon with Claude Code.

Adding features with TDD (and coping with slow runs)

So let's say that I wanted to add another feature. Maybe I want to make sure that all of the nodes are connected to all their relations properly, so that everything they mention has a connection on the graph.

So I'm going to go here to Claude and say: hey there, are we sure that all the nodes are connected to all their respective related topics?

And meanwhile, while it's doing that, I'm going to bring this up. I don't think I have any tests, so this is going to be good.

Now, another thing that may prevent me from demonstrating this is that, for some reason, the internet on my laptop is glacial. So this may turn out to be unsuccessful. In that case, I'll just talk theoretically about what would happen here.

The idea is that when I say I want to implement this, the skill will trigger and say that it's going to write tests first. The tests will fail, and then it will stop and ask, are you happy with the tests? And then I can say, yeah, I'm happy with the tests, go ahead and implement it.

But because it's so slow, I'm going to just let it take its time, and instead I'm going to open it up and encourage you guys to jump in if you have questions or comments or anything about what's going on.

So yeah, these are the three steps here. Step one is write the tests first.

What about your prompt invoked this skill? Is there something in the words that you used that Claude has been taught to invoke this skill on? You said, hey, are all the nodes connected.

Not yet, no. What I wanted to do is make sure that this is something that I can iterate on. My hope is that it's going to say, no, I don't have any proof of that, and then I will say, TDD, implement this feature.

But the challenge that I'm having right now is that it's kind of taking its time, which I was not expecting. Sorry about that. As a side note, what's currently happening is that Claude is generating some Python scripts to check if all the nodes are correctly mapped.
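A check of that kind might look roughly like the sketch below. The data shape is hypothetical, since the talk does not show the app's real format: each node lists the topics it mentions, and edges are stored as pairs of node names.

```python
from typing import Dict, List, Set, Tuple


def find_missing_links(
    nodes: Dict[str, List[str]], edges: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return (node, relation) pairs that are mentioned but not drawn.

    A relation counts as connected if an edge exists in either direction.
    Relations that point at unknown nodes are reported as missing too.
    """
    missing = []
    for node, relations in nodes.items():
        for related in relations:
            connected = (node, related) in edges or (related, node) in edges
            if related not in nodes or not connected:
                missing.append((node, related))
    return missing


# Invented sample data for illustration.
nodes = {
    "transformers": ["attention", "embeddings"],
    "attention": ["transformers"],
    "embeddings": ["tokenization"],  # "tokenization" is not a node
}
edges = {("transformers", "attention"), ("transformers", "embeddings")}
print(find_missing_links(nodes, edges))  # prints: [('embeddings', 'tokenization')]
```

An empty result would be the proof the prompt was asking for. A non-empty one gives a concrete list of gaps that a test could be written against.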

But you know what, I'm going to stop. I'm going to do something else.

Use TDD to implement a... wait, do I have a search box? I suppose I do. Let me see: architecture.

Oh, perfect. Okay, good. Click. Okay, that's good.

This is surprisingly complete in terms of features. I was hoping to just come up with something. But, oh, I know, I know.

Use TDD to implement a feature that shows the most important topics in a list on the left side of the screen. So now that I said use TDD, we're hoping that this will trigger the skill. But if it doesn't automatically trigger it, you can always use slash TDD, which is the name of the skill.

And so what I expect to happen now is that it will say, hey, I've generated these. Or actually, it will first say I don't have any tests. Implement the tests.

Then it's going to show me what it implemented. And they should all fail because the feature hasn't been implemented yet. Yep. Yep.

Discussion: speed, cost, and day-to-day tool choices

So, yeah, another downside of Opus is that it's also quite slow compared to, say, Sonnet.

Haiku and when “good enough” is the right tradeoff

Does anybody ever use Haiku? That's a topic that I have been thinking about.

Yeah? What has been your experience with it?

It's cheap. It's good for simple tasks, not for coding. Like, you know, add this to my to-do list. I don't need to pay for Opus for that.

Yeah, simple. And it's pretty quick too, right? So do you use that mostly in the Claude app, or do you ever use it in Claude Code? Oh, okay, cool.

How many people here have installed OpenClaw? Okay, that's a good number. Cool. All right, nice.

Anybody use a different variant? Some other kind of claw? Nanoclaw? Okay.

What do you use? Hermes. Hermes. Okay.

How do you like it? It's good. It's good until my expectations go too high.

Have you had any horror stories? Like something that really backfired? No, I just installed it on a really old laptop, so it only has that personal file, so you can delete everything. Oh, good.

Okay, cool. But it still, it hasn't, right? Yeah. It's very useful.

It's just that sometimes I expected it to just work. Like today, I expected it to work like Perplexity in a browser, where you speak to it and it just operates. But I think that's quite an expectation.

Well, I'd say that's something to work towards, for sure.

OpenClaw and safer enterprise alternatives (Rebel)

I don't know, Yannick, is Rebel available? Can folks request access? They can? Okay. So this is just kind of off the cuff right now. The Mindstone team has developed what I think of as basically an enterprise-safe version of OpenClaw, and they've created a ton of custom MCPs that you can configure how you want.

I find it is light years better. And again, I don't work for Mindstone, and I use Codex, I use Claude Code, I use all of the systems. Rebel is head and shoulders above.

Now, I do pay significantly for it for various tasks, but for anything that's worth doing well, I find Rebel is exceptionally capable. And they allow you to connect your GPT Pro account to it, so you can run it through that, like OpenClaw used to allow you to with Anthropic before they kind of came to the garden hose, so to speak. But anyway, how do folks request access if they want to try it? Come talk to me after.

If you've been curious about OpenClaw but didn't want to let it loose on a machine, or you're using an old dusty machine, I would highly recommend that you request access from Yannick. Again, I don't stand to gain anything from making that recommendation, but it's an exceptional, safe OpenClaw alternative, if that kind of thing is something you're curious about. The Mindstone team has done a really remarkable job.

I think the Mindstone team is generally living in the future, so it's a really cool product to check out. It's in limited-release beta right now.

So I would highly recommend you check it out as well. Rebel, cool.

Do other folks have questions for Alex about how he's approaching this with his VS Code? He's getting Claude to invoke a skill that he has taught it, to do test-driven development.

What kinds of questions are coming to your mind about the applications of this kind of functionality in your environment? Yes.

How to write and iterate on skills

When you write this test, Ruben, the spec, it looks like it is in plain English, no structured syntax, nothing like any programming language. How do you refine this and make it comprehensive, more complete? And I'm sure it is iterative. You start with something, definitely it's going to be very incomplete, and you have to make it complete.

And what is the process for doing that? Yeah, that's a good question.

So most of the time when I write a skill, I actually use Claude itself to write it. This one I created in the Claude app, downloaded it, and then just told Claude Code to put it in the right place and make it a skill. But the text itself was already generated.

And, of course, you can go in here. The structure essentially starts with the name of the skill, then the description, and then it's just Markdown with various sections describing what it's supposed to do.

This is essentially just a prompt. That's it, it's just a big prompt. It tells it, you know, here are the steps that you should take. What I think is interesting about this particular skill is that for step two it says: after generating tests, stop, and then tell the user what you generated. So it's sort of like a multi-step skill.

If I wanted to change anything, I could either go into this file and change it directly, or I could tell Claude, hey, add this extra feature to the skill. The location for skills is under .claude/skills in the current project folder, and it's the same for Cursor, it's just .cursor/skills.

Team workflow: reviews, PRs, and test coverage

Yeah, please.

So after you do TDD and generate code, are you going to do a manual review, or ask your co-workers to review your stuff before you check in?

If you're talking about how that would work in a team, I think most of the time it just depends on what the preference is and how quickly the team is moving. Nowadays I find that a lot of teams are under pressure to go very quickly, so having an extra review step just for the tests feels like a lot.

But what I think could work is you write the test, they fail, you implement the feature, then they pass, then you make the PR, then you ask the code reviewers to specifically review the test to make sure you've got enough coverage. Okay.

But some of the stuff like new features, say if it's GUI related, it's very hard to TDD test it, right? Yeah, it can be for sure.

Yeah, here I think the test it's going to write is whether the component is there, whether it's getting populated, that kind of thing. The tests will just be unit tests. There won't really be any integration tests or anything like that.

Okay. So after the AI generates code for you, are you going to review it carefully, or do you just say, okay, I trust it, and check it in as production code?

You know, that's a very good question. So the question is, after AI generates code, do you review the code or do you check it in? And that's a very loaded question.

So I like reviewing the code pretty closely, but I find that in practice that takes up a lot of time. Especially because AI tends to over-engineer code. It tends to write code that's super detailed, that adds fallbacks, and so on.

So in practice, if it's a small PR, then I can look at all the code. But if it's big, then I will usually look at it at a high level. Like, are the functions that I need in there? Do I understand what this function does? Is it working the way I want it to? Are all the tests passing?

And then I can run some integration tests myself. I can start the app and make sure all the features are there. That's about the level to which I get. So you had a question, yeah.

Quality, compliance, and regulated environments

Yes. If you're writing for healthcare, and it doesn't seem like you're actually writing for a therapy app, but let's say a mission-critical application, how in practice are you adhering to a quality management system, where technically you would have to be able to document that you've reviewed the code and it's not doing anything that it isn't supposed to? Like if this was a critical device, it's not moving off the heart monitor.

Right. Well, the way to ensure that is by having a very strong evaluation suite, and you have to run that every time. That's really the key: beyond unit tests, especially with an AI agent, you have to have an evaluation suite that actually tests individual queries for a variety of use cases.

Again, I don't know exactly what you guys do, but have you run into issues with your quality folks about whether that satisfies the FDA?

I should probably say that in my day job I look at all the code. I answered this in kind of a general sense because I also work on a wide variety of projects, and when I am working on my own projects I normally don't look at every single detail. So I want to qualify that.

Now, for our code where I work, there are multiple layers of quality control. There are the individual developers. We have unit tests. We have test cases that we go through to test what we're doing. Then we have the code review. Then it goes to the testing environment, where people get to hammer against it. Not just us, but actual subject matter expert users who ensure that everything produced is of the highest quality. And then we also have an automated evaluation suite that gets run to ensure no regressions happen.

So that happens for any change. And like you said, in this particular case, our major concern is that no PHI data is surfaced, and it's that way by design. So there's not really a risk of that happening, yeah.
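An evaluation suite of the kind described here can be sketched as a list of queries with checks that run on every change. Everything below is illustrative: `answer_query` is a stand-in for the real system, the cases are invented, and the identifier check is a deliberately naive pattern match, not a compliance control.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Naive illustration only: flags strings shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class EvalCase:
    query: str
    must_contain: str


def answer_query(query: str) -> str:
    """Hypothetical system under test; a real suite would call the agent."""
    canned = {
        "How many open care gaps?": "There are 3 open care gaps.",
        "Summarize the cohort": "The cohort has 120 members.",
    }
    return canned.get(query, "")


def run_evals(cases: List[EvalCase], system: Callable[[str], str]) -> List[str]:
    """Return a list of failure messages; an empty list means no regressions."""
    failures = []
    for case in cases:
        output = system(case.query)
        if case.must_contain not in output:
            failures.append(f"{case.query!r}: expected {case.must_contain!r}")
        if SSN_PATTERN.search(output):
            failures.append(f"{case.query!r}: output looks like it contains an identifier")
    return failures
```

Wiring `run_evals` into the same pipeline that runs unit tests is one way to make it happen for any change, as described above.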

So there's no regulatory issue by design, yeah. Does that answer your question?

Yeah. I mean, generally, the FDA is really trying to figure out what it's going to do about AI, and then this issue is going to come up of how it's going to view using AI to code a product. You and I might agree it's not going to be any different: there's no way I'm going to code 100% perfectly, there are going to be bugs, right? But when you use the same systems to test them, I'm not sure the FDA will agree with that. And I'd be interested if anybody else had actually come across this, with them coming back and saying, no, we don't think this measures up to human code.

So I should probably also say that most major vendors that I'm aware of have special contract provisions for when they deal with PHI, including zero data retention and specific contract terms for dealing with, you know, FDA and healthcare HIPAA-compliant clients. So they have to go through that, otherwise we don't get to use their tools, because of the reasons you said. Yeah.

Yeah.

When you're writing tests with AI, how do you make sure the AI doesn't do, like, risky stuff? For instance, do you have a guardrail in the prompt so that it doesn't touch production services?

Maybe one thing, if you don't mind, pause one second. As you all know, if you're coding agentically, you've got to make sure your agents are always working in the background. Do you need to do anything to make sure your agent keeps working? Because right now the human bottleneck is your ability to answer questions. Is there anything that you want to say, like just keep going or something? I just want to make sure that you get into the demo here, and then definitely come back to the question. I just wanted to flag that.

Yeah, absolutely. I actually don't know off the top of my head how to run these tests, so I need to ask it: how do I check that the tests fail? Okay. So, sorry, tell me your question.

How do I ensure that while I'm testing, it doesn't accidentally hit production?

Guardrails: least privilege, whitelisting, and zero trust

Right. I mean, I think what we normally do is, most of the time my local environment doesn't have credentials to hit production or anything beyond the test environment. So even if it tried to hit production, it just wouldn't have the access to do it.

Well, see, when you say block access, that makes me think of blacklisting, meaning I have specific areas where I tell it not to go. But I prefer whitelisting, which means I tell it the areas that it can go to, and everything else is inaccessible by default.
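The difference between the two approaches is the default. Here is a minimal Python sketch with made-up host names: under a whitelist (often called an allowlist), anything not explicitly listed is refused, so production is protected simply by never being named.

```python
from typing import FrozenSet
from urllib.parse import urlparse

# Hypothetical hosts for illustration; production is simply never listed.
ALLOWED_HOSTS: FrozenSet[str] = frozenset({"localhost", "api.test.example.com"})


def is_allowed(url: str, allowed: FrozenSet[str] = ALLOWED_HOSTS) -> bool:
    """Default deny: only hosts named in the allowlist pass."""
    host = urlparse(url).hostname
    return host is not None and host in allowed
```

With a blacklist you would have to anticipate every place the agent must not reach. Here a new, unknown host is rejected without anyone having to think of it in advance.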

How do you tell it? Is it like through a prompt or is it like a...

Oh no, you just don't give it the information. Do you see what I'm saying?

For example, in your setup, you maybe go to your production databases through your AWS portal, let's say, or Databricks, or whatever you're using. But locally, when you're developing, there's not really any reason to have your production credentials in your development environment. So I just don't put those in any place where it can find them. That's the safest way.

Oh, I see.

Yeah, I think that adopting a whitelisting mentality is the best way to go forward. And just as a side note, nowadays security is very focused on this concept of zero trust, right?

Where you build the entire security infrastructure around the idea that you only give an agent or a user enough access, for a long enough period of time, to do what they need to do, right?
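One way to picture "enough access for enough time" is a grant that carries its own expiry. This is a toy model with hypothetical names, not how any particular identity system implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str     # the user or agent the grant belongs to
    scope: str         # the one thing they may do
    expires_at: float  # seconds, on whatever clock the caller uses


def issue(principal: str, scope: str, now: float, ttl_seconds: float) -> Grant:
    """Create a grant that is valid for ttl_seconds starting at `now`."""
    return Grant(principal, scope, now + ttl_seconds)


def permits(grant: Grant, principal: str, scope: str, now: float) -> bool:
    """Allow only the named principal and scope, and only before expiry."""
    return (
        grant.principal == principal
        and grant.scope == scope
        and now < grant.expires_at
    )
```

Passing `now` in explicitly keeps the sketch deterministic. A real system would read a trusted clock instead.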

So it's like temporary permissions, even for users, that they have to manually activate. So yeah, basically, don't trust anything. Yep.

Wrapping up: running tests and validating the TDD flow

All right, so now we're going to try to run these tests. My goal is to make sure that these are going to fail. I have taken up quite a bit of time. Am I still doing okay, or are you going to kick me off?

Let's try to wrap in the next five.

Yeah, sounds good.

By the way, just so you know, Alex's agent is going to keep running in the background during the next demo. Of course. And we can check in with Alex. So it's not that the agent's work will stop, but we do want to make sure to give time to others who have prepared. Absolutely. Yeah, of course.

Thanks so much for everybody's support. Okay. So, yeah, I'm just using pnpm to install the testing framework, and then I will probably stop as soon as we can see that the tests are failing, because you know what's going to happen next, right? The tests are failing, the agent is going to implement enough functionality so that they pass, and I'm going to run the test suite and they're going to pass. While we're waiting on this, any other questions?

I still have a few minutes. Okay, cool. Sounds good.

A practical warning: don’t trust an agent’s confidence

I would say don't trust agents. I love working with AI but I don't trust it. It's not that I don't trust it because I think it has a secret agenda, it's more like I try to understand its abilities and its limitations.

And I can tell you, one thing that has bitten me over and over again is the fact that AI talks with such conviction that I always believe it, even when it's just very wrong. And then I find out it's wrong and I tell it, and it says, oh, yeah, I wasn't looking at the latest APIs. Why don't you try this? And then I say, listen, why don't I try nothing, and you actually do the research and be sure that this is the right way to go.

Okay, well, this thing is still taking its time. So that's the general idea. But I want to just show one more thing here.

End state: agent stops after tests for human approval

Here's what it told me. It did stop after writing the tests, and then it said: tests written, this is what I'm testing, and so on. And then it stopped: before I implement, do you want to adjust anything? So I'm happy that it stopped after writing the tests. Okay, thanks everybody. Appreciate the time.