What I Learned Building With LLMs

Introduction

Hello, my name is Otavio. I'm from Brazil, but I've been in Lisbon for eight years now.

Thanks, Michael. Thanks, Stefano, Mindstone, for having me here. It's a pleasure.

As Michael said, one year ago I was here, where you are, and I had the pleasure of working with Michael. I'm here today to talk a little bit about my experience building applications with AI, with LLM systems.

Just for curiosity, how many of you here are technical? Okay.

How many of you have tried Lovable or Bolt.new? Okay. Excellent.

Background: Building Apps Faster with LLMs

I've been working for the past eight years basically as a software engineer with a full-time job, but when ChatGPT came to life in 2022, I realized that I could create applications very fast, even before Lovable.

We could get the code from ChatGPT and put it in the repo or in VS Code. And it became a pleasure for me to create apps on the side.

Whenever I had spare time at home, I would get my hands dirty, as we used to say, with technology that we sometimes don't get the chance to work with in our day jobs. I was doing that at night at home.

Side Projects and Early Prototypes

So I've created a lot of them in the past. Here's one, something like Headspace: it's a React Native app, available for Android and iPhone.

Here's another one, like a SonarQube that can analyze the code being generated.

From Problem to Product: Pivoting to an AI Initiative Scoping Tool

So whenever I had an idea, I'd say: oh, that's a problem worth solving. That's the conventional process that generally leads us to create a product.

Starting as a WhatsApp Memory Assistant

But this one in particular caught my attention. It started as an AI assistant connected to WhatsApp, meant to serve as a memory extension: helping us store everything we needed to remember, connected to integrations like Google Calendar, Notion, and Todoist.

But later on, after an experience of implementing AI in a company from scratch, I understood that this is a big problem, and I decided to pivot and go for this solution.

Pivot: Estimating ROI and Complexity for AI Initiatives

It's basically a system where, through a chat or a form, you can enter the ideas or AI initiatives you're willing to implement in a company, say as a head of AI or a product manager. From that scope, it returns the ROI of each initiative, its complexity, and so on.

Research and Kickoff with AI App Builders

The first part, the starting point, is the research. I think it's very important to do this before you jump into Lovable, which I think everybody knows, but it looks pretty much like this.

Defining Requirements Clearly Before Generating Code

You have an input field where you can create something from scratch: "I want to create a dating app," or "I want to create a forecast app," whatever you need. It's then going to create most of the application for you, let's say the front end.

Of course, there is a free tier here that will block you before it builds the whole thing. But before you get in here, I think it's super important to know exactly what you need, because you might have an application that needs, let's say, Stripe to charge your clients, or you want to build a component in a certain way so that it's detachable or reusable elsewhere. So the more information you provide in this first part, the better your starting point will be.

And, of course, I'm not going to get much into the detail of the product-market-fit question; we assume you know exactly what you want to sell. So that's the initial part I generally do: research using AI, eventually cross-checking information between ChatGPT and Claude or other LLMs, to understand the product-market fit, the target audience, the competitors, and the differentiators I'd like to have.

Tooling Landscape: Lovable, Bolt.new, and Base44

For the kickoff, as I said, you have Lovable, you also have Bolt.new, and there's this other very cool solution, Base44.

That's a very cool case, by the way. The founder built it alone in six months and sold it for $80 million.

Documentation as Code: Giving Agents the Context They Need

But that one is paid, and I don't like it that much. I prefer to do it in Lovable, export the zip file, get it into GitHub, and jump into VS Code.

Now, documentation as code. Before the job I'm in now, I worked at a pharmaceutical company called Roche in Switzerland, which most of you must know, and I had to do something there that I had never done before.

Architecture Decision Records (ADRs)

That thing is ADR, which stands for Architecture Decision Records. For every decision you make in a piece of software, you document it in an .md file inside the codebase.

And it pretty much looks like this. You have instructions here. For instance, I have an architecture.md. Let me make it bigger.

Of course, this is Markdown, just for the formatting, but there are a lot of details here about the architecture. I also have information on the back end, the environment, even the exit strategy: I'm explaining, basically in an .md file, how I would like to sell this company some years from now.

Why is this important? Because when you're coding, or vibe coding, with an agent, it's better for the agent to have as much context as possible, about the code and about the business. That's why we want the documentation to live here.
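An ADR entry of the kind described might look like this. This is a minimal sketch: the file name, folder, and field names are illustrative, not the speaker's exact template.

```markdown
<!-- docs/adr/2024-11-02-knowledge-graph-sqlite.md (illustrative path) -->
# ADR: Keep the knowledge graph in a separate SQLite database

## Status
Accepted

## Context
The main application data (users, memories) lives in Postgres.
The knowledge graph has a different, read-heavy access pattern.

## Decision
Store the knowledge graph in its own SQLite database, optimized
for its queries, instead of merging it into the main Postgres.

## Consequences
- An agent reading this file knows *why* two relational databases exist.
- Merging later remains possible if operational overhead grows.
```

The point is that the record lives next to the code, so the agent can read the reasoning behind past decisions instead of guessing.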

Business and Agent Instructions Inside the Repository

Regarding documentation, the most important thing I've discovered, business-wise, is that instead of keeping things in Notion, as I did before, I'm putting everything here in the repository. So in the instructions I have everything I've already shown you: monetization, Stripe setup, testing, user deletion, whatever is needed, it's here.

The other one, let's say the main one, is the Copilot instructions file. Let me close that. This file says how your agent should behave. It says, for instance, that all Architecture Decision Records should be saved in that folder, that the format should include a timestamp like that, and so on.

Here is the ADR folder I mentioned. I have a lot of decision records here that were made in the past.

So, let's say, WhatsApp auto-enable: whenever I made that commit, pushed, and deployed it, I got a record here. And this is super important because, as I said, the agent will know why you took that decision in the past, and it's going to help avoid hallucinations, avoid duplicating code, and so on.

Keeping Architecture Diagrams in Sync with the Code (PUML)

And another thing that is super cool is this open-source thing called PUML (PlantUML): a text file from which, when I run a script, it generates the architecture diagram for me.

So I no longer have to go into Excalidraw or some drawing tool and redesign my architecture by hand, or go back and update the diagram every time I make a change.

That doesn't make any sense. So I can just keep a document that the agent knows should reflect the code.

And whenever there's a change, or I want to make a presentation, I can just present that diagram, which stays up to date with the code as it is today.
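A PlantUML (`.puml`) file of the kind described is just text, so an agent can keep it in sync with the code and a script can render it. A minimal sketch, with component names that are illustrative, not taken from the actual project:

```plantuml
@startuml
' Illustrative component diagram kept next to the code,
' regenerated by a script so it never drifts from reality.
package "Next.js app" {
  [Frontend]
  [API routes]
}
database "Postgres\n(users, memories)" as pg
database "SQLite\n(knowledge graph)" as kg

[Frontend] --> [API routes]
[API routes] --> pg
[API routes] --> kg
@enduml
```

Because this is plain text in the repo, "update the architecture diagram" becomes just another instruction the agent can follow on every change.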

I'm not monitoring the time. Anyone? Yeah, seven minutes. Okay.

Some time for questions and answers at the end. Okay.

Experimentation: Model Routing and Agent Workflows

And then let's jump into the second slide: experimentation. Model routing, and different model abilities.

Different Models, Different Strengths

I've come to understand that each model does certain things better than others, because they were trained for different purposes.

It's super interesting, and there is a cheat sheet for that.

But sometimes I think it's more about experimentation and feel, because this is changing a lot, and very fast.

In my case, I've had more complex problems that I couldn't solve with Gemini 3, for instance, but Claude Opus 4.5 solved things that Gemini didn't.
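This "right model for the right task" idea can be sketched as a tiny routing table. The model names and task categories below are illustrative assumptions, not a recommendation or the speaker's actual setup:

```python
# Minimal sketch of model routing: pick a model by task type.
# Model names and task categories are illustrative placeholders.
ROUTES = {
    "quick-edit": "fast-cheap-model",    # free/cheap tier for small fixes
    "refactor": "mid-tier-model",
    "complex-debug": "top-tier-model",   # paid, more tokens, hard problems
}

def route(task_type: str) -> str:
    """Return the model to use, falling back to the cheap default."""
    return ROUTES.get(task_type, "fast-cheap-model")

print(route("complex-debug"))  # -> top-tier-model
print(route("unknown"))        # -> fast-cheap-model
```

In practice the "router" is often just you picking a model from the dropdown, but the same lookup can run behind the scenes in an application.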

Working with Agents in VS Code (Copilot) and Parallel Tasks

In my case, I'm using VS Code with a Copilot subscription, because I can use this window here, as you can see. This window is where the agent lives, and where I can access the history.

I can open a different session; here is all the history I have. I can run multiple tasks at the same time, solving different problems or implementing different features in parallel with different agents.

And here you can choose the agent or the model you want to use. Here are the free ones, and here are the paid ones, which consume more tokens; of course, I only use those to solve more complex questions. Discover and understand.

Using Documentation to Explain and Validate Past Decisions

The other day, and this is actually a problem, I was presenting this system to someone, and I showed him this and said: I have a Postgres database here, where I'm storing the memories and the users, but I have another relational database here, where my knowledge graph resides, and so on. And the guy asked: why are you using two relational databases? And I said: I don't know. And I really didn't know.

Today the system is all in Next.js, back end and front end. But in the past, the back end and the memory system were developed in Python, and I was consuming that microservice via Hugging Face. That's when it was set up that way.

But later on, I asked the agent the same question. When I finished the call, I came here and asked: why are we using two relational databases? Can we merge them into one? And the agent answered (let me make it bigger): yes, technically you could unify everything in Postgres, since they are both relational databases, but we separated them because each is optimized for its purpose, so there is a reason. And more than that, it knows why we did it that way, because it has the Architecture Decision Record, the ADR, for the knowledge-graph SQLite implementation. So I was actually super happy to see that this whole documentation effort is working so far.

Other Approaches: Cursor and Coding Agents via CLI

Let me go back to the presentation. On tools, there is Cursor AI, and you can also use Claude Code inside the CLI. If I go to the terminal and type `gemini`, it's installed on my machine; that's another way to code with an agent today. Probably most of you already know this, but for those who have never seen it: I can ask anything I want here, and it's going to read all my code and my project. I can do the same thing with Claude Code, if I open `claude`, and so on.

A New Development Speed—and What Stays the Same

And by that new approach, or speed, what I mean is that today, the way we build applications and write code is completely different.

From Line-by-Line Coding to Prompt-Driven Execution

Before, let's say I wanted to implement something in a project: I'd have a task, with a document saying what the feature is, what the acceptance criteria are, and so on. But I would have to sit down, understand the logic exactly, and write it line by line.

Today, you can simply write a prompt here, go for a coffee, come back, and check everything that was done: OK, that was good, or no.

No? Then you have to make it better. So the speed, and the way we do things, has completely changed.

And that's important for us to understand, because eventually you don't need to be in front of VS Code.

While one agent is performing a task, you can be reviewing code from another agent that is doing something else.

So I think it's super important for us to test these tools and figure out what works best. There is no silver bullet, I think.

I've adapted myself to using VS Code. There are people out there who prefer Cursor AI with Claude, and so on. I think it's a matter of choice.

Still Foundational: SOLID/DRY and Token-Aware Code Structure

SOLID and DRY: those are very traditional, foundational things from software architecture and engineering, and they're still here. Those are the basics.

Some things have changed a little for SOLID, but some still hold: for instance, you shouldn't have files with a thousand lines of code.

You should have files of roughly 50 to 200 lines of code because of token consumption. So a few things are still the same.

Efficiency and Security: Prefer CLIs for Integrations

And last but not least, efficiency and security: the CLI. Do as much as you can via the CLI. With Stripe, I haven't implemented anything manually: I just generate the API key, come here, and say, hey, I want to connect to the webhook in Stripe, create a webhook, and so on.

Of course, some providers, services, or libraries don't have one. Clerk, for instance, for authentication, doesn't have a CLI, so you have to do everything manually. But for most of the application, you can work via the CLI and the agent.

Autonomy vs. Control: Safety Lessons When Letting Agents Act

Here's a bad thing that happened to me once. I was working here, giving a lot of autonomy to the agent, saying: hey, you can do whatever you want.

It said: but I need your .env. And I said: no problem, go ahead, take whatever you need from the .env, and so on.

And then suddenly, the agent pushed, I think, my .env to the remote. And I said: no, you shouldn't do that. What are you doing? Undo that, and please remove it from the remote as well.

And I was off having coffee, taking care of the kids, going: no, no, don't do that. And when I came back, the agent had simply deleted the whole thing on the remote. So I basically lost my project.

I had to go into Vercel and basically copy the code back piece by piece.

So it's super challenging to find the middle ground between autonomy and being more, let's say, conservative.
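One cheap guardrail against that specific failure mode: make sure secrets can never reach the remote in the first place, so even a fully autonomous agent cannot push them. A typical `.gitignore` fragment (illustrative, not the speaker's actual configuration):

```gitignore
# Keep secrets out of version control so an autonomous agent
# cannot commit or push them, no matter what it decides to do.
.env
.env.*
!.env.example
```

The agent can still read the local `.env` to wire up integrations; it just can't ship it anywhere.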

LLM as Judge: Cross-Checking One Model with Another

LLM as judge. I think I'm still on time? Yeah. So, LLM as a judge.

One last thing, which is something super cool: an AI judging another AI.

You can do it manually. For instance, I can come here and say: hey, I want to build something, please give me, let's say, an alternative for that. And I get an answer.

Then I can take that answer to Claude and ask: hey, what do you think about this? That's an AI judging another AI.

And of course, as you can imagine, you can do the same thing inside the code, behind the scenes.
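Behind the scenes, the pattern is simple: generate with one model, then ask a second model to evaluate the result. A minimal sketch with the models stubbed out as plain callables; any real API client would replace the stubs, and nothing here reflects a specific provider's SDK:

```python
from typing import Callable

# LLM-as-judge sketch: one model generates, another evaluates.
# The two "models" are plain callables so the pattern is clear;
# in practice each would wrap a real API call.

def llm_as_judge(
    generator: Callable[[str], str],
    judge: Callable[[str], str],
    task: str,
) -> dict:
    """Generate an answer, then have a second model critique it."""
    answer = generator(task)
    verdict = judge(
        f"Task: {task}\nAnswer: {answer}\n"
        "Is this answer correct and complete? Reply briefly."
    )
    return {"answer": answer, "verdict": verdict}

# Stub models standing in for real API calls:
generate = lambda prompt: "Use a single Postgres instance."
critique = lambda prompt: "Reasonable, but consider workload separation."

result = llm_as_judge(generate, critique, "How should I store my data?")
print(result["verdict"])
```

Swapping the stubs for two different providers gives you exactly the manual ChatGPT-then-Claude cross-check, but automated.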

Conclusion and Q&A

And that's it. I have here a QR code with my LinkedIn profile, for those who want to connect.

And we'll open up five minutes for questions. Thank you, Alex.
