Building Your Own AI-powered Devtools

Introduction

Hey, everybody. My name's Scott.

The Problem with Mundane Tasks

I'm the CEO and co-founder of Sublayer, where we're building an AI agent framework that makes it possible to quickly build AI-powered automations, or your own custom AI-powered dev tools, for the tedious, time-consuming, mind-numbing tasks that you might have to do. We want to make that easy, and make it cost-effective to actually write automations for things that were previously too expensive even to outsource.

And so one of the benefits of writing these automations and your own dev tools is that they help you focus on actual innovation rather than mind-numbing, repetitive tasks. And building them quickly enables rapid experimentation.

The Evolution of Automation

For a very long time, up until the last three years or so, a lot of the things that we're now able to do needed something like six developers. It took a couple of years, maybe millions of dollars.

A lot of these things are like a 10 cent API call away now.

Cost-Effective Automation

And a lot of us, even me, and I've been working with AI for the last year and a half, have blinders on for some of the things that we can actually write software for now, and that it's cost-effective to write software for. Some of the conversation that's been going around about AI writing software for us is that the costs are falling.

And that opens up a whole new type of software. A lot of the time, we have to make sure that software lives a very long time to actually get our return on investment, because it's so expensive.

But if the cost falls enough, we might only need to use it once. And that changes a lot of the considerations when building software: you don't necessarily even need to worry about long-term maintenance.

I write it once. It takes me a week to write it. I run it. It fixes my problem in a week. And then I can throw it away. And I don't have to support it anymore.

Real-World Use Cases

And so some of the real-world use cases that I'm talking about here are things like large-scale code transformations. Maybe you're doing a mass refactoring or a modernization, going from one Java version to another, like Amazon talked about. Or you're updating deprecated testing patterns, moving from one testing library to another, and you have thousands or tens of thousands of tests, and nobody wants to go through the mundane work of changing the testing library and the testing format.

Or maybe you have customer onboardings where your sales and solutions engineers know the steps it takes to onboard a new customer, but each customer is different enough that you haven't been able to write a solution that solves it for everybody. Once you can build these AI automations very quickly, it becomes cost-effective to actually do this. And especially for something like customer integrations, your customers are happy. They're using your product quicker. You're realizing revenue quicker.

And then I think there have been a lot of demos out there for data analysis and reporting. You, as developers, have the ability to write these visualization helpers and these log analysis tools without putting a lot of work into it. And when you're done with them, throw them away.

Code Transformations and Customer Onboardings

I am going to demo a little bit of our AI agent framework. How many of you out there have either Googled or asked ChatGPT to generate a command line command for you? I know I have, a lot.

Even when it gets it right, you have to open a browser, get out of your flow, and copy and paste it back into what you were doing. But with our framework, one of the use cases that I've found really useful is actually building a command line tool to just generate it right in my terminal.

Data Analysis and Reporting

So I'm going to walk through how to do that really quickly. So our framework has a new project creator.

Demo of the AI Agent Framework

So we'll do sublayer new CLI helper. You can choose from different templates. We'll choose the CLI.

You can select from any models, any model providers, or any of the models that they have. It creates the project for you. It sets up a Git repository and then tells you the next step, which is change the directory.

And then you can actually run the command, which we've set up an example command for you that you can use to check and make sure that it actually works. So we'll do bin CLI helper example. And it's a command that generates a story based on command line arguments.

So we'll do a story about AI agent frameworks. And GPT-4o should hopefully, if the connection works, come up with a story about AI agent frameworks becoming the backbone of every industry.

Sorry.

Creating a CLI Helper

And so another thing about this framework, and how we've made it possible to quickly build AI-powered dev tools, is that every component is designed to be generated by AI. And so to actually make the CLI helper work, we're going to use our component generators to generate a generator. What a generator does in our framework is take information from one or many sources, send it to an LLM, and give you structured data back in whatever format you ask for.

And so what we're going to do is create a generator. It gives you a warning that we're actually going to use AI to generate this. And then the description for the generator is that it will take a description of a task and generate a command line command to accomplish that task. Select the models again. And it's going to generate a generator, which we'll look at here.

And so what this has done is created a subclass for the sublayer generators. It's generated a prompt for us to accomplish the task. And then also given us at the top here what we want to get back. We want to get the command line command back.
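To make that shape concrete, here is a minimal sketch in plain Ruby of a generator subclass that declares the output it wants and supplies a prompt. It is not the Sublayer API: `SketchGeneratorBase`, its `output` declaration, and `CannedLLM` are hypothetical names invented for illustration, and the LLM call is stubbed with a canned reply so the example runs offline.

```ruby
# Hypothetical stand-in for an LLM client so the sketch runs offline.
class CannedLLM
  def initialize(reply)
    @reply = reply
  end

  def complete(_prompt)
    @reply
  end
end

# Hypothetical base class (not the Sublayer one): subclasses declare the
# output they want at the top and supply a prompt.
class SketchGeneratorBase
  class << self
    attr_reader :output_name

    def output(name)
      @output_name = name
    end
  end

  def initialize(llm:)
    @llm = llm
  end

  def generate
    # Return the model's reply keyed by the declared output name.
    { self.class.output_name => @llm.complete(prompt).strip }
  end
end

class CommandLineGenerator < SketchGeneratorBase
  output :command

  def initialize(task_description:, llm:)
    super(llm: llm)
    @task_description = task_description
  end

  def prompt
    <<~PROMPT
      You are an expert in command line tools.
      Generate a single command that accomplishes this task:
      #{@task_description}
    PROMPT
  end
end

llm = CannedLLM.new("ls -la\n")
generator = CommandLineGenerator.new(task_description: "list all files", llm: llm)
result = generator.generate
puts result[:command]
```

Swapping `CannedLLM` for a client that calls a real model is the only change this sketch would need to produce live output; the subclass itself stays a prompt plus an output declaration.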

And so let's copy this to generate command. We will give it a description of take a task description and generate a command line command. Usually, Copilot jumps in here, but I guess I must be on a slow connection. This is command line generator, and it takes a task description.

And then back here, we have the new command. And so this takes a task description and generates a command line command. So if we run it, what if we want to take a folder of JPEGs and create a GIF?

It uses convert, with delay 20, loop 0, and the folder of JPEGs. What if we want to use FFmpeg? And it's got a whole bunch of different flags here.

And so I've actually built this in the past and have found it very useful. I've used it for even things like formatting header images for blog posts or social media. But for something like FFmpeg, there's a lot of these flags.

FFmpeg probably isn't going to do something bad to my system, but something you generate might. So if we wanted to add a description, what we could do is create a new generator that takes a description of a task and generates a command line command to accomplish that task, along with a detailed description of what that command will do.

Select the models again. Generate a new generator. And it created a new command that's actually going to give us a different structure. Rather than just giving us a single string back, it's going to give us the command and the description.
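As a rough illustration of getting two fields back instead of a single string, the sketch below parses a reply into a small struct. The JSON reply format, the `CommandWithDescription` struct, and `parse_command_reply` are assumptions made for this example, not what the framework generated in the demo.

```ruby
require "json"

# Hypothetical shape for the two-field result described in the talk.
CommandWithDescription = Struct.new(:command, :description, keyword_init: true)

# Parse a JSON reply (the format is an assumption) into the struct,
# failing loudly with KeyError if either field is missing.
def parse_command_reply(json_text)
  data = JSON.parse(json_text)
  CommandWithDescription.new(
    command: data.fetch("command"),
    description: data.fetch("description")
  )
end

# Canned reply standing in for a real model response.
reply = '{"command": "ls -la", "description": "Lists all files, including hidden ones, in long format."}'
parsed = parse_command_reply(reply)

puts "Command:     #{parsed.command}"
puts "Description: #{parsed.description}"
```

Showing the description next to the command gives you a chance to read what it will do before running it, which is the point of asking for the second field.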

And we can go and wire that up and everything. This is a very simplistic example, and something else that we've put together is...

Advanced Use Cases

to actually talk about what Naveen was talking about. I come from a group of weirdos that practice test-driven development. So there are dozens of us out there that like doing integration tests.

But one of the things that I've put together, something much more complicated than that CLI generator, uses another component that we call an agent. And how you define an agent in our framework is by defining what triggers the agent, how the agent checks its status, what its goal is, and how it steps toward that goal.

Test-Driven Development Agent

And once you've defined those things, you can basically let it loose to do what it needs to do. And so for this TDD agent, we've set it up so that it gets triggered anytime an implementation file or a test file changes. The way it checks its status is by running the tests and examining the output.

The goal is that the tests are passing. And the way that it steps toward that goal is by taking the implementation file contents, the test file contents, and the test output, sending them to an LLM, which generates a new implementation, and then saving that file, running the tests again, and continuing until they pass.
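The loop described above can be sketched in plain Ruby. This is not the Sublayer agent API: `SketchAgent` and its keyword arguments are hypothetical, the "test run" just inspects a string, and the "LLM" is a canned replacement implementation, so the example runs without a file watcher, a test runner, or a network call.

```ruby
# Hypothetical agent skeleton (not the Sublayer API): the callables mirror
# the status check, goal, and step described above. The trigger is whatever
# calls #run, for example a file watcher.
class SketchAgent
  def initialize(check_status:, goal_reached:, step:, max_steps: 5)
    @check_status = check_status
    @goal_reached = goal_reached
    @step = step
    @max_steps = max_steps
  end

  # Steps toward the goal until it is reached or max_steps is hit.
  # Returns the number of steps taken.
  def run
    steps = 0
    status = @check_status.call
    until @goal_reached.call(status) || steps >= @max_steps
      @step.call(status)
      steps += 1
      status = @check_status.call
    end
    steps
  end
end

# Stand-ins for the demo: the implementation lives in a string instead of a file.
implementation = +"class Pile; end"
run_tests = -> { implementation.include?("def add") ? "0 failures" : "1 failure: undefined method add" }
tests_pass = ->(output) { output.start_with?("0 failures") }
ask_llm = ->(_test_output) { implementation.replace("class Pile; def add(item); end; end") }

agent = SketchAgent.new(check_status: run_tests, goal_reached: tests_pass, step: ask_llm)
steps_taken = agent.run
puts "Tests passing after #{steps_taken} step(s)"
```

The `max_steps` cap is a design choice added for the sketch so that a model that never produces a passing implementation cannot loop forever.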

And so I've got an example here where we can run the testing agent on pile.rb and pilespec.rb. The agent's going to run. The tests pass, so it doesn't do anything.

We start modifying this. Describe pile do. And the tests are going to run.

It's going to fail because pile isn't defined. And then it creates the implementation to get those tests to pass. If we take it a step further and do describe add, it adds the item to the contents.

Pile.new, pile.add apple. And then, come on, Copilot. To include apple.

And let's see. And so it sent the failing test, the test output, and generated the new implementation to get these tests to pass. And you can take this further, describe, remove.

It removes the item from the contents. And pile equals Pile.new. And we can actually initialize it with something.

So let's say apple and banana. We'll do pile.remove banana. And then expect.

It's not going to pass. There we go. And so it did it faster than I could actually finish writing this test.

But so what happened there is that I added a new test for this remove function, and actually used a different initializer. It created an initializer that made both of those tests pass. It added the new remove method and made all those things pass.

But since it runs on either side, it's actually self-healing. So if I go ahead and delete something here, it will run the tests again.

And it actually made it pass in a different way.
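For reference, one possible implementation that would satisfy the tests described in the demo is sketched below. The generated code was not shown in this transcript, so the method bodies and the initializer signature (an optional array of starting items) are assumptions.

```ruby
# One possible Pile that satisfies the add, remove, and initializer tests
# described in the demo. The exact generated code is not in the transcript.
class Pile
  attr_reader :contents

  # Accepts optional initial items, so both Pile.new and
  # Pile.new(["apple", "banana"]) work.
  def initialize(contents = [])
    @contents = contents.dup
  end

  def add(item)
    @contents << item
  end

  def remove(item)
    @contents.delete(item)
  end
end

pile = Pile.new
pile.add("apple")
puts pile.contents.inspect

stocked = Pile.new(["apple", "banana"])
stocked.remove("banana")
puts stocked.contents.inspect
```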

Conclusion

Hopefully I've given a couple of examples on either end of the spectrum: something very simple that you can do, and then something much further along, where you're integrating with system calls and your file system. But what we've found is that this is useful for everything on either end and everything in between for automating those tedious and annoying parts of the development process.
