Great.
Thanks very much for having me out again.
I've usually talked about technical topics like
reinforcement learning, philosophy of AI, stuff like that.
I've spent the last several months
playing with these coding agents to see what they're really capable of.
And so I just wanted
to share my experiences doing that today.
I've been in machine learning, whatever you want to
call it, for 35 years.
When I was doing my undergrad and grad at U of T, Geoffrey Hinton
was a professor there.
I'm an aerospace engineer by education, but Geoffrey Hinton would hold
seminars on neural networks and machine learning in the 80s.
So I attended all of those because
I was fascinated by the subject.
And then after I graduated from U of T in aerospace, I didn't
want to build weapons of death and destruction,
and I didn't want to move to the United States either.
And so I found this accidental career of bringing more math and science to business.
I was shocked at how little math and science the tall towers downtown on Bay Street used back then, and how little math and science they actually use today to run their businesses.
So it's been this accidental career of using technology to automate business process and improve decision making.
So today I'm going to talk about my practical experience with coding agents.
I've done like a whole pile of projects and I use it for my work.
I do consulting.
I had my own company for 20 years, where we were automating retail planning, fraud detection, and medical diagnosis; like, no-human-in-the-loop reinforcement learning is what we did.
The company was sold a couple of years ago.
Since then I've been doing consulting: helping companies implement AI, building roadmaps, and doing some fractional executive work.
I'm going to give some description of the projects I've worked on, what you can expect from
these coding agents from my perspective, the process I've used with these agents,
some examples of good context, and then just some tips and tricks,
and my general conclusions overall.
So here's just some
examples of things I've built in the last couple months.
So the first one was kind of a web app that scrapes emails. So this is like: your Starbucks, your oven broke, you send a service request email to your service provider, and that email has to get entered into an ERP system. The ERP system generates a work order, the work order goes to the technician, the technician runs out the door and has to show up within four hours to fix the oven. So that's the cycle. Today there are people standing there receiving the email, reading the email, typing it into an ERP system manually, hitting enter, making lots of mistakes, all kinds of stuff like that. So I've used AI, and I hate the term AI, I used technology, to scrape the email and extract all the information required by the ERP system, connect to the ERP system using APIs, and enter all the data into the system. There's a human being who has a chance to review the scraping to make sure that it happened correctly, and then they just hit submit and the data gets put into the ERP system. The cycle time is reduced from, you know, ten minutes down to like one minute by using the technology to help automate. That's one thing I did. I built a voice interface on that as well, so if somebody from Starbucks phoned in a service request, an intelligent IVR talks back and forth to get the same information that you would put in the email, and then it populates the ERP system. All of that was done in Django, Python, MySQL, open-source tools, and I tell people I could have done this 30 years ago. Like, the technologies have existed forever; this is not new. I find it very funny that the work I'm selling now is stuff that I sold 30 years ago; now there's just hype around it, so people need to do it. Like, the first thing you need to do is organize your data and automate processes, and all of this stuff is stuff we sold when I ran IBM's data warehousing and machine learning practice 30 years ago. So I find it kind of an interesting full circle; I feel like my 30-years-ago self today with all the work that I'm doing.

I tried something more complex: CFD grid generation. That's Computational Fluid Dynamics.
I tried to take my master's thesis and see if I could build part of that
with AI, doing it in Python: super math-oriented type code.
I'll show you examples.
One of the things we did at Daisy was automate data warehouse management. So I tried to build a metadata-driven ETL framework: all the ETL required to build a data warehouse, where all the code is stored in the database and parameterized, and then it's assembled on the fly as you receive data. That way, if you're managing 20 data warehouses, you don't have to have 20 copies of the code base, and you don't have to go in and manually edit code files. That was something built in Python, shell scripting, MySQL, and Hive.
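To make the idea concrete, here's a minimal sketch of what "code stored in the database and assembled on the fly" means. Everything here (the step names, templates, and parameters) is invented for illustration, not the real framework's schema:

```python
# Sketch of a metadata-driven ETL assembler (all names invented for illustration).
# Step templates are stored as data, parameterized per warehouse, and the
# runnable SQL is assembled when a feed arrives -- so 20 warehouses share one
# template set instead of 20 hand-edited copies of the code base.

STEP_TEMPLATES = {  # in the real framework, rows like these live in MySQL
    "load_stage": "LOAD DATA INFILE '{infile}' INTO TABLE {stage_table};",
    "merge_fact": ("INSERT INTO {fact_table} SELECT {columns} "
                   "FROM {stage_table} WHERE load_dt = '{load_dt}';"),
}

def assemble_job(step_names, params):
    """Build the SQL for one feed by filling the stored templates
    with that warehouse's parameters."""
    return [STEP_TEMPLATES[name].format(**params) for name in step_names]

sql = assemble_job(
    ["load_stage", "merge_fact"],
    {"infile": "/data/sales.csv", "stage_table": "stg_sales",
     "fact_table": "f_sales", "columns": "store_id, sku, qty",
     "load_dt": "2024-01-15"},
)
```

Adding a 21st warehouse is then a matter of inserting new parameter rows, not copying and editing code files.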
Then I tried doing infrastructure as code: using Azure, using Terraform if you're familiar with that, to build infrastructure automatically. All of the above applications I built are ephemeral, so I tried to build those using Terraform with Azure CLI scripts, with a separate VM to submit jobs from the ETL metadata framework, using Docker to containerize it so I can install it anywhere. And I did all that with Terraform and Cursor.
So those are some examples.
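To make that first example a bit more concrete: the scraping step boils down to field extraction plus one API call into the ERP. Here's a minimal sketch of the extraction half; the field names and patterns are invented for illustration, not the real system's (the actual ERP call is just a POST of the reviewed fields to its REST API):

```python
import re

# Hypothetical patterns for the fields the ERP work order needs.
PATTERNS = {
    "store":    re.compile(r"Store\s*#?\s*(\d+)", re.IGNORECASE),
    "asset":    re.compile(r"Asset:\s*(.+)", re.IGNORECASE),
    "priority": re.compile(r"Priority:\s*(\w+)", re.IGNORECASE),
}

def extract_work_order(email_body: str) -> dict:
    """Scrape the work-order fields out of a service-request email;
    missing fields come back as None for the human reviewer to fill in."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(email_body)
        fields[name] = match.group(1).strip() if match else None
    return fields

email = """Subject: Service request
Store #4512
Asset: Convection oven, unit 2
Priority: Urgent
"""
order = extract_work_order(email)
# A human reviews `order` in the web UI, hits submit, and the app POSTs
# the fields to the ERP's REST endpoint to create the work order.
```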
One other interesting thing I did was I got lazy doing an RFP response and I wanted to see if these coding agents could respond to the RFP for me.
So I submitted the RFP, gave it a whole pile of examples, and it generated some text that was interesting.
So I'll share some of my learnings executing these kind of projects, which I did over the last couple months.
So the first thing, to know what to expect: if there's a lot of information available on the internet
and lots of documentation, like on Stack Overflow and vendor websites and all of that stuff,
then the agents will probably do pretty well.
So if it's really vanilla stuff, like, I think one wheelhouse is if you have a web app that talks to a database through a REST API; that seems to be a total no-brainer for these AI agents.
Probably because there are 800 trillion websites on the internet, everybody talking on Stack Overflow:
I had this error, what should I do?
I had that error, what should I do?
And all these answers and all of that stuff.
So when there's a ton of documentation available, they tend to do pretty well, right?
The more specialized the subject matter is, the more difficult a time the agents will have.
So in those cases, you just need to use the agents as, maybe, code support, where you do most of the coding and most of the thinking.
In CFD, there's some online material, but not enough, because that was a horrible failure; I'll show you what the AI code achieved.
And one final note: if you can't write the code yourself, it's never going to work.
Like, you have to know, I could do this myself but I'm too lazy, I'm going to get this agentic code thing to do it a little bit faster than me.
Because if you don't know how to do it, you don't know how to adequately prompt.
It's all about prompting and context, and so if you don't give it good prompts and good context, you will get garbage out.
It's the classic garbage in, garbage out; nothing new, same old shit that we've been talking about forever.
So if you can't code it yourself, good luck.
Unless you're doing something super-duper simple, like build me a recipe or make me a two-page website, something like that, right?
If you're doing anything significant, you need to architecturally understand exactly what you're doing.
And you can expect the agents to get stuck. They, like, totally fail. I think they're like senile people. You know, my parents went through that dementia and senility, and I find that working with agents is a lot like that. They're lucid at moments, and then they lose context totally. You're not having a conversation with them. These LLMs think one token at a time. That's their thought space: one token forward, that's it. They lose track of time. They fixate on something you said 10 prompts ago. Like, it's crazy. You really have to know what you're doing and pay attention. But having said all that, they can be useful for certain tasks.

So, the process that I use. Context is king, right? Excellent context is the most important thing. You have to write a super detailed spec. So I'll show you; I wrote like a 25-page spec before I gave it to the coding agent, and I use Gemini or ChatGPT to help me write the spec, but I edit it to make sure it's exactly what I want.
Give it tons of previous examples.
So I give it, here's the
last five examples I did.
Here's the spec.
Here's the PowerPoint deck I showed the client.
Here's
the slides the client sent me.
Here's all my thoughts.
Here's the best practices presentation
I did at last month's Mindstone.
That's all related to the topic.
So giving all of that
context, giving screenshots for the UI, all of that is super important.
And then you can vibe
code the first version if you like vibe coding.
So my son makes fun of me.
He calls me the vibe
coder.
He works for Google.
So you vibe code the V1.
That's when you have a blank slate.
You have
no code.
You give it a good spec.
You go write me the code and then you spend time debugging it.
So
you work in that vibe mode.
You debug it to get whatever the AI decided to build, because you gave it a big spec and it's not going to build it all. It'll build something close to what you wanted, or part of what you wanted; it decides what it feels like doing. And then, whatever it built, edit it, work with it to get that working. Like, test it, do your UAT, and get whatever it built working to some degree, and then, you know, fix it one bug at a time. At that point, once you've got a code base, you've got to narrowly work on one thing at a time and have really narrow conversations.
And then once you've got that first V1 working, what I always ask the thing to do is write a bunch of documents to document the code base. So, you know, draw code flow diagrams, draw architecture diagrams, draw data models, document every single line of code, and then read those documents so that you understand exactly what the AI agent has written. Like, read through the documents; I'll show you examples of the documents. And then after that point, you put it up in GitHub or wherever you source control it, and then you work on one narrow feature at a time. And the documents are great context: now that you've had it describe what it built, that's the context you can give it when you, you know, reset your chat. And then, once you get there, you have to remove code bloat. Like, the code this stuff writes is really bloated and fat, and it finds these little narrow edge cases and builds all this diagnostic stuff, none of which you need. So every once in a while you need to do a remove-code-bloat pass through the code. So that's the process I typically use.
So, some examples of context. I guess for that RFP I talked about, I gave it the RFP request converted to PDF. I gave it examples of previous RFPs I responded to. I gave it a best-practices deck. I didn't use Gemini; I actually just used Cursor to do that, and it did, you know, pretty well. With the good context, it saved me time: what would have taken me eight hours, I got done in like two hours, because I gave it all really good context.
For infrastructure as code, I gave it an architecture diagram of the infrastructure I wanted to build. So: I want you to build these VMs, this HDInsight server; I want these private networks, these virtual networks, these endpoints, these firewall rules, these NSGs. So again, you need to know what you're asking for, because if you don't know anything I just said, you'll never be able to build a Terraform infrastructure-as-code project.
For the metadata-driven framework, this is where I gave it a 25-page deck with my data warehouse principles and best practices.
For CFD, I gave it the desired output of what the shape of the grid should be, and then detailed tech specs: you can see there's like 25 pages, a markdown file that I created that's about 1,000 lines long.
So that's the spec that I gave it before it started coding.
This one was done in Antigravity. For UI/UX context, I gave it a screenshot of a login page: that's what I want my login page to look like. So I said, please make that login page. And that's what I wanted my screen to look like, the one the human being will review, where all those fields are what it's grabbing out of the email, and it's showing the email in the right pane. And then, you know, I said, go build that, and map all these fields in the email to these fields in the document, and a bunch of gory details. So that's the UI/UX context, and it's able to build a style sheet and do that and mimic it. If you don't give it that, you know, who knows what it'll build, right?
For the CFD code, my target was the grid. So when you're doing computational fluids, you need to create a grid where, at every single intersection of the lines, it calculates all the fluid properties. So it's a U-bend; my thesis was to calculate the flow around a U-bend. From the picture on the right there, you can see the flow separates and you get a little vortex downstream of the bend. So that's where you want lots of grid points, near that vortex, and you want lots of grid points near the walls; not so much in the middle, not so much in the straight part. So, you know, I built that for my thesis, and I said, that's what I want it to look like: I want 81 by 201 grid points, and I want you to focus the points at the walls and where, you know, that little vortex and separation bubble is.

That was a painful exercise. That was eight hours of my life I'll never get back. So I started with: build me a simple thing, build me a nice straight duct, and around point 20, that's where the separation happens, so cluster the points near there and cluster them near the wall. It was able to do that, build me a straight duct. I was really happy with that.
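For what it's worth, the wall clustering I was asking for is standard grid stretching. Here's a sketch of the idea with a tanh stretching function; this is my own illustration (the beta value is arbitrary), not the code the agent produced:

```python
import math

def cluster_to_walls(n, beta=2.0):
    """Map n uniformly spaced points on [0, 1] into a distribution
    packed toward both walls, using a tanh stretching function.
    Larger beta packs the points more tightly at the walls."""
    points = []
    for i in range(n):
        s = 2.0 * i / (n - 1) - 1.0                 # uniform on [-1, 1]
        t = math.tanh(beta * s) / math.tanh(beta)   # clustered on [-1, 1]
        points.append(0.5 * (t + 1.0))              # back to [0, 1]
    return points

y = cluster_to_walls(81)   # 81 points across the duct
wall_dy = y[1] - y[0]      # spacing at the wall
mid_dy = y[41] - y[40]     # spacing at mid-channel
```

The spacing at the wall comes out an order of magnitude smaller than the mid-channel spacing, which is what you want near a boundary layer; the same trick, applied along the duct, clusters points around the separation region.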
Then I said, okay, now we're going to build a radius. Now the outer wall is going to be longer than the inner wall. So I said, take that grid, the exact same grid, but lay it out so that the top wall is now longer; it's the length of the radius. Okay, you see my thinking here. Now I'm going to say, bend it like you're bending it around a bar. And it lost the plot at that moment. That's the best I could get Cursor to do. And I spent eight hours, gory detailed descriptions, giving it examples, showing it its mistakes, and it kept on making the same mistakes over and over and over again. No matter how hard I tried. I changed LLMs; I tried every single LLM that Cursor had, and it just couldn't do this task.
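For the record, the step it kept failing on is, at heart, just a polar mapping: keep the grid's indices, send the cross-duct coordinate to a radius and the along-duct coordinate to an angle. Here's a sketch of what "bend it like you're bending it around a bar" means; the radii and turn angle are invented for illustration, and this is my description of the target, not anything the agent wrote:

```python
import math

def bend_grid(xs, ys, r_inner=1.0, r_outer=2.0, turn=math.pi):
    """Wrap a unit-square grid around a 180-degree bend: each y picks a
    radius between the inner and outer wall, each x picks an angle along
    the bend. The outer wall automatically comes out longer than the
    inner wall, because a bigger radius sweeps a longer arc."""
    grid = []
    for y in ys:
        r = r_inner + y * (r_outer - r_inner)
        grid.append([(r * math.cos(x * turn), r * math.sin(x * turn))
                     for x in xs])
    return grid

xs = [i / 200 for i in range(201)]  # 201 points along the duct
ys = [j / 80 for j in range(81)]    # 81 points across it
grid = bend_grid(xs, ys)
```

Feeding wall-clustered coordinates through the same mapping, instead of uniform ones, would give the grid I was after.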
And so these things are not intelligent by any stretch of the imagination. It's a sausage grinder. You need to know what you're doing, you need to know how to prompt it, and you need to know that there's a good base of information out there to be able to do this. Otherwise, that's the shit you'll get. But a lot of it is invisible, because you can't tell: if you're looking at a code file, the code file could be that shit, but it just looks like code, and you don't know any better, right? So you need to be super careful with using this in those situations. That's why you need to understand what you're doing.

So, tips. You know, you can vibe code that first version; the first code is a blank slate. You can debug that first version, then ask the agents to document. When I ask the agents to document, I go: I want you to document every single file you created, tell me exactly what the purpose of that file is, and then, line by line, on every single line, tell me what that line of code is.
What is the function call?
What does it do?
What is that variable name?
Why did you create that variable?
Why did you choose that data type?
I said, document it like I'm a moron and I know absolutely nothing, right? And I get it to do it literally: if I have 10,000 lines of code, I'll have a 40,000-line document, right? And I say, draw me code flow diagrams; draw me how one function calls another, how one file calls another, how the user interacts through the code; show me that. Draw me the data model, an entity-relationship data model. And then I actually read all that stuff. You know, you've got to read it all in detail so that you know the code base, because you can't just blindly go through this, right? And then at that point I say, write me a requirements document. So I gave it one, but I said: now write me a requirements document that's consistent with what you built, at the same level of detail that I gave you, and write a technical spec of what you've implemented at the same level of detail. And it's great at documenting, because it's got a perfect example that it can mirror; it's just mirroring the code it created. And this document is awesome. So now, what's that document? It's the context for your next ask. So now you reset your chat, delete it. It's never seen anything; you start all over again. You pick a new LLM and you go: read my context, my 50,000-line document, my tech spec, my requirements spec, my flow diagrams, all of that stuff. And that's a great start for all of your chats.
So this is a sample network diagram for my Terraform infrastructure-as-code project. Amazing that it could draw this. I just said, draw me a picture of my infrastructure, and it barfed that out. It took a little bit of editing on the colors, but it was pretty close; that was great. Draw me a database ER diagram: it drew me a nice ER diagram with relationships and many-to-one things and all the things I know. So that was pretty good; it saved me time to do that. And again, great context. Draw me a code flow example: so, you know, there's the flow through the different modules and the code, and did you click the button or not click the button. I drew like 20 of those. So, you know, it's really good at documenting code, way better than me. I'm like a lazy coder; all my employees used to laugh at me because I would name variables like 'shit' and 'crap' and then never clean those out, and then sometimes it ended up in client code. Like, one time I was doing something for Bank of Montreal; we were building trade areas for branches, and if we couldn't find the branch name, I just put, like, 'BF nowhere', and that showed up on one of the labels. And the client called me and said, these are awesome, Gary, but where's this one branch? I'm going, whoops, you shouldn't do that. The AI agents don't do that, by the way, so that's maybe one improvement to my coding standards.
So read all your documents, right? Make sure you understand them. Upload your project to GitHub at this point, or whatever source control you use. And then from that point forward, you micro-develop: one little feature at a time. As your code base grows, you have to really narrow the focus to keep it on one tiny little thing at a time, right? And, you know, branch your code so that you can do a pull request later. Again, get the agents to read all the documents; get it to focus on the certain parts of the code that you know you're adding the features to. Again, provide a super detailed description, as detailed as you can. And then ask the agent for a plan.
Before it does anything, ask it for a plan: give me a detailed plan. Cursor has Ask mode, so it'll give you a plan and show you the code it intends on implementing. Because when you ask an LLM to do something, it won't just do what you want it to do. It'll try to improve everything, even though you didn't want it to. It's like: the rest of my code is fine, don't frickin' touch that; just do what I ask. They can't do that. And I ask it, I go, why do you change everything when I only asked you to do this? Well, it's my job; I try to improve everything on every pass through the code.
Okay, so this is why you can't just let it go free. You need to ask for the plan, ask what it's doing, and then if you want to let it vibe code at that point, you know, buyer beware. If not, you can implement the code yourself and check that the plan is valid, right? So then, do the pull request before you merge into the main branch; especially if you have a team, you know, you need to do pull requests and review the code. Because if you have a team of developers, it gets scary, because everybody prompts with a different level of quality, and your code could turn into all kinds of garbage. So doing the pull requests and code reviews before you merge back into your main branch is super critical.
And have meetings with all the developers so you can learn from each other, to consistently prompt the same way and consistently write with the same coding standards. You can give coding standards as context for the agents as well, but I think doing code reviews and these pull request reviews is critical to get your whole development team on the same page, right? And after every micro feature, ask the agent to update all that documentation, so that the documentation is constantly consistent with the code you've done, right? And here's a sample plan. I asked it to write me a giant plan; this was like a 30-page document when I designed a data warehouse layer. So before it implemented, it gave me a big gigantic plan as step one. It writes beautiful, nicely formatted documents, better than I would write myself, so it's useful for that.
Okay, just time check?
Sure, almost done.
Keep the coding sessions short, so reset the chat frequently, right? Remember, every time you ask for something, the agents could do whatever the heck they want, so reset that chat frequently. If the agent gets stuck on a bug, which it frequently does, like that CFD example I showed where you could not get out of that loop, change LLMs if you can't figure it out yourself, or figure it out yourself, because even though it should be able to, it can't; it gets insane, and then you'll get it in this infinite loop. And when you have coding teams, it's way more complicated, and you have to be super methodical. Sounds like software development practices, basically, is what you need.

So, conclusions. Good software development practices: no shortcut for that. You need to be as methodical as, maybe more methodical than, you were before. If you can't code it yourself, forget it; go do something else. Coding agents are like that senile dementia phase, right? They're sometimes lucid, they sometimes remember, they latch on to a crazy concept that you said 10 chats ago, they have no sense of time, and they do whatever the heck they want, even though you didn't ask. And it's only good when there's a large corpus of examples to build from; for very new and innovative stuff, forget it.
If you're an experienced programmer, you know how to create context for the agents, and if you direct it hard and push it hard, you can get stuff out of it. I mean, I save maybe 50% of the time. If you're writing some bigger project and it gets in a loop, it can actually take more time than doing it yourself when you're doing complex stuff, and there's a grey area: it's hard to know when you're transitioning from saving yourself time to taking more time. There's a big grey area in the middle there. But I like it. It's like my dev team.
I used to have a team of 20 developers. I was the chief scientist of my company; I did all the inventing, I invented all the patents and did all the hard stuff. And I like doing the hard stuff, and I'm lazy about doing the boring stuff. So giving the boring stuff to the coding agents makes me happy; it feels like my dev team of old. I would explain something, they would mess it up, not because they were bad, but because I didn't explain it adequately, which is exactly what happens with the AI. So I find it's just like a faster life cycle with my dev team, and it still takes a dozen iterations to get it done. And it all comes down to how well you communicate and ask for the stuff.
So hopefully you found that helpful.
That's my experience with these agents.
I'm going to continue to use them.
I find that it satisfies my need to be lazy and not write boring code
and lets me work on the hard stuff.
And I can multitask and do like two or three or ten projects at the same time.
So happy to answer questions.
I'm going to hang around until 8 o'clock, so if you want to chat,
I'll hang out in the hallway for a little bit.
But I'll stop here.
Thank you.
Any questions?
Anybody?