Generative AI and the future of developer work - Rodrigo Mendoza-Smith

Introduction

A quick talk. This is essentially a talk I delivered last year at Oxford's GNI Summit and this year also at IBM Zurich.

About the Speaker

I'm Rodrigo. I'm founder and CEO of Quine. We basically use machine learning to quantify the experience of developers from the code they have written in the past and use it to match them to GitHub issues they can solve for money.

Impact of GenAI on Developer Work

I'm going to talk to you about how GenAI is affecting the present of developer work and the future of developer work. Everyone will be relying on software to produce value for themselves and to actually be an actor in the market. We actually have data to show that this is already happening.

So when I gave this talk at Oxford in November, the CEO of Graphcore was present. And he said that they believe that in the future everyone will be a developer. So everyone will be coding in one way or another.

This is how many developers are now on GitHub, or more precisely, how many developers are on GitHub with at least one open source contribution, right? There are of course many developers on GitHub that do not have open source contributions, so the total number is a lot bigger. But this is how the audience of open source contributors on GitHub has grown since 14 years ago, essentially.

Introduction to AI Developers and Benchmarks

A few weeks ago as well, there's this other company called Cognition AI. They introduced a new AI software engineer called Devin that, by some metrics, essentially by SWE-bench, was achieving 13% on this particular dataset.

So for those of you that are not familiar with SWE-bench, SWE-bench is a dataset that is being used right now to measure the effectiveness of AI software developers. It's a dataset that takes a sample of GitHub issues and pull requests and essentially challenges AI agents to create a pull request for any particular issue.

This new AI was able to solve 13% of those issues, so essentially create a pull request just from the context of the repository at a specific point in time and the information that was present in the GitHub issue. This means that if we were to measure the universe of developer tasks as this dataset, AI right now has solved 13% of software development tasks, essentially, right? Which is a lot, because before Devin, the maximum on the benchmark was 3%, right? So it essentially increased massively.

Now, I don't think, and this is just a personal opinion, I don't think we will see AIs reach 100% anytime soon. I think it will continue growing, but it will essentially plateau at some point.

But what this is telling us is that if we developers are actors in the current market, we're drowning, right? AI is essentially the water level that is starting to make us irrelevant at the moment, right?

Leveraging AI for Software Development

So the question here is as software developers, what can we do or how can we leverage AI to not only avoid losing value in this market, but actually become more valuable in this market. And this is what this talk is about, essentially.

Software Development as a Market

And I'm going to basically start with a very silly simplification, but I think this will illustrate the point I want to make. We can think of basically software development as a market where there is a supply and there is a demand. I'm going to represent them in standard economic terms like this.

The thing is that for software developers, and this is like a universal constant and has always been like this, the alpha for us is in the scarcity we can prove to the market. Right now, some engineers that can do very good AI or train LLMs are making up to $1 million at some companies like OpenAI. But this is not new, right? A few years ago, people that could do Solidity or Blockchain or Ethereum were making like 300k, 500k.

So it's always like in tech cycles, the value that developers can extract from the market is directly proportional to the scarcity they can prove. And only if that scarcity aligns with the needs and the demand of the current market.

Macro Forces Impacting the Supply Curve in Software Development

So to really understand how we can become more scarce and how we can reap maximum value from this current market cycle, we need to understand the market forces that are affecting the supply and demand curves at any given point in time, right? And right now, I'm going to focus only on the supply curve, because the supply curve is the one that we can control, right? It's the one we actually belong to. And there are three macro forces that are making this curve very, very blurry.

The first one is education, right? And by education, I mean how developers are training to be developers.

The second one is certification. So once I have trained to be a developer, how do I prove to the market that I know coding or that I know something or that I can offer something?

And the third one, and one of the most controversial, is augmentation. With AI, essentially, if AI can do part of my job or some of my job or most of my job, how do I stay relevant when these coding tools are essentially there?

The Present and Future of Developer Work

So the present of developer work looks like a very blurry supply-side curve, right? And right now, we don't yet understand how these three forces are acting in this supply curve, and we do not understand where the equilibrium essentially is achieved.

But what we know, or what we believe at Quine, is that the future of developer work looks like a very pristine and very transparent and very shiny supply-side curve. And essentially, it's a curve that has been augmented with AI and with data. It's a curve where essentially, through AI, we can optimally allocate developers to the best tasks that are available for them in the market.

It's a curve where, again, by virtue of using data, we can essentially price the skills of developers based on the previous track record they have in open source or in previous jobs. And it's, in general, a market with perfect information, where, as developers, we can just enter the market, leave the market, and monetize our skill sets, because there's like an oracle that can recommend work to us and can direct our attention to the issues or to the jobs that are most relevant for our particular skill set, right?

Key Forces Shaping the Developer Market

And yeah, so I want to zoom in on these three forces very, very quickly, before I tell you, I mean, this will be a shameless commercial, how Quine is helping developers place themselves in this type of market, right?

So, education, right? How is education affecting the supply side of the software development market right now? Like, well, 20, 30 years ago, if you wanted to write code, you had to go to university. You needed an advanced degree. You had to be like a Stanford type of graduate, right?

Now, knowledge is abundant. Knowledge is everywhere. There's content online. There's Wikipedia. There's, of course, ChatGPT. So the challenge for developers right now is not knowledge acquisition anymore. We're all learning from the same sources. So essentially, we can all be developers from one day to another if we actually want to.

The problem that developers are facing right now is standing out. So if we are all learning from the same sources, if we're consuming the same material, if we're all doing the same Netflix clone as our final project for our masters, then we don't have anything to actually prove that we are different and that we can do something that no one else can, right? So this takes us to the problem of certification.

How do we help developers certify their skill sets? And how do we help developers stand out from the crowd in this very, very crowded market? The question is, how do I prove my value?

Well, I mean, as professionals, we all have LinkedIn. We all have a professional profile. This is a very obvious way to prove our value. The problem here is that essentially everyone has LinkedIn. LinkedIn is not verifiable. It's kind of prone to bluffing, to augmenting what you actually did, or to lying a little bit about your skill sets.

The second one, of course, is diplomas. Maybe diplomas are a bit less common than a LinkedIn profile, but still, they're not perfect. They're costly to acquire: you need to invest time and energy in going through a course or doing a degree. And at the end of the day, it reduces to a situation in which if you have the money and time to pay for it, you can literally buy it, right?

What is becoming very, very common in some professions is creating a portfolio. Designers and photographers have a portfolio, and now developers as well are starting to have portfolios. So this is perhaps a better way to show your interests, how unique you are, and what you can offer that no one else can offer.

We think that for software developers, the best way to actually show the market that you're unique and that you can do things that no one else can do is through open source contributions, right? And this is because open source contributions, especially if they're done in a repository that is actually used and has been validated by the market, they provide validation upfront. It's very, very different to arrive at your interview with a small Netflix clone that you created in your course than with a contribution to Ethereum or Bitcoin. It just says so many different things about you.

So with open source contributions currently, the problem, well, the problem and the opportunity, is that only a minority of software developers actually have contributions to projects that are relevant, right? It still requires a lot of effort to find a community that you like, to find an issue that is open, to understand the issue, to get onboarded to that issue, and then finally to invest the time and energy to go all the way to creating a pull request and actually getting it merged.

But open source contributions are an amazingly rich source of data, and I'm gonna tell you why, right? I'm pretty sure, is everyone here a developer? Is anyone here that is not a developer? Okay, cool.

So, I'm gonna show you how the process of contributing to open source works very quickly, and then we'll become more aware of what type of data breadcrumbs there are in this process. We all know that the world runs on computers and that the software on these computers is built with packages, right? These are the building blocks of software. And these packages are hosted in repositories, which are mainly open source repositories. Now, these repositories are created and maintained by a network of contributors. So these are open source contributors.

And the way these repositories collect work from contributors is through this resource called GitHub Issues, which they use to define an engineering ticket and broadcast to the contributor network that there is a need to make a change in the repository. So repos have issues that basically tell everyone, hey, we need to change this specific thing in the repository. Can someone please come help? This doesn't work. There's a bug here, et cetera.

And what the network will do is basically someone will spot this issue and will offer to fix it, and they will go through what is called the PR workflow, in which they create a feature branch, they submit a pull request, they get comments from maintainers, and finally they merge the code. And at the end of the day, you just have code that is very neatly written, very well reviewed by the maintainers, and everyone is happy. So now let's rewind a little bit and let's start to become aware of all of the data that was present in this process, starting with the code.

In the code, we have data on what functions, what classes, and what objects the developer used or was exposed to when writing this contribution. In the PR workflow, we have comments from maintainers, we have feedback on the person's code, and we have an understanding of how important the repository this individual contributed to is. In the issue metadata, we have a bunch of text that describes the engineering problem that the developer is solving. And we also have very significant network information that tells us who's working with whom, who's reviewing which code, which repository is more important than others, etc. There's even some market data, which can tell us, for example, if any of these repositories belongs to a specific company, if it's used by a million or more computers, et cetera.
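To make the "code" breadcrumb concrete, here is a minimal sketch of how the packages, functions, and classes touched by a Python contribution could be read out of its source using the standard library's ast module. This is only an illustration under stated assumptions, not a description of how Quine does it: the helper names used_packages and defined_symbols and the SAMPLE snippet are hypothetical.

```python
import ast
from collections import Counter

def used_packages(source: str) -> Counter:
    """Count the top-level packages imported by a piece of Python source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import helpers"
            counts[node.module.split(".")[0]] += 1
    return counts

def defined_symbols(source: str) -> dict:
    """List the functions and classes defined in a piece of Python source."""
    tree = ast.parse(source)
    functions = sorted(
        n.name for n in ast.walk(tree)
        if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    classes = sorted(n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    return {"functions": functions, "classes": classes}

# Hypothetical contribution, used only to exercise the helpers above.
SAMPLE = '''
import numpy as np
import torch.nn as nn
from torch.utils.data import DataLoader
from . import helpers

class Net(nn.Module):
    def forward(self, x):
        return x

def train(loader):
    pass
'''

if __name__ == "__main__":
    print(used_packages(SAMPLE))
    print(defined_symbols(SAMPLE))
```

The source is only parsed, never executed, so the packages do not need to be installed. A real pipeline would also need to handle other languages and the diff of a pull request rather than whole files.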

So there's just a lot of market data, a lot of little breadcrumbs of information that, when put together, can give us a very good representation of how good someone is at writing code or developing software.

So how can this data be useful? This is where I will tell you a bit more about Quine. At Quine, as I said at the beginning, we analyze open source data to quantify the experience of software developers, their reputation within open source, and the probability that they will be able to solve any specific issue in open source, right?

Quine's Role in Shaping Developer Careers

So we use generative AI to create embeddings of software developers from the data they generate in open source, and embeddings of issues on GitHub, and use that to essentially find the best contributors for any GitHub issue in the ecosystem. One of the ways in which you can already use this is by signing up to Quine, where you can see your ranking. We call this ranking DevRank. It's a network centrality algorithm, similar to PageRank, that essentially measures your importance in open source, given your contributions and the social proof that the repositories you have contributed to have gotten, right?

We also help you certify your developer experience using real data. For example, we created these GitHub widgets, which are embeddable assets for your GitHub profile, in which we automatically detect which packages or libraries you have used in your code. We can give context that you have actually used Torch or NumPy or, you know, like, YouTube, et cetera, in a beautiful visualization that can help you brag that you actually use Torch in your code or whatever, right? And most importantly, we help you build verified specialisms.
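For readers who have not seen a PageRank-style centrality before, the following is a minimal power-iteration sketch. It is not DevRank itself, whose details are not given in this talk. The graph, the node names, and the choice of drawing an edge from a contributor to a repository are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Plain PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Nodes with no outgoing edges spread their rank evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical toy graph: contributors point at repositories they contributed
# to, and repo_b points at repo_a because it depends on it.
EDGES = [
    ("alice", "repo_a"),
    ("bob", "repo_a"),
    ("carol", "repo_b"),
    ("repo_b", "repo_a"),
]

if __name__ == "__main__":
    for node, score in sorted(pagerank(EDGES).items(), key=lambda kv: -kv[1]):
        print(f"{node}: {score:.3f}")
```

In this toy graph repo_a ends up with the highest score because every other node points to it directly or indirectly. A ranking of developers rather than repositories would need edges that carry importance back to contributors, which is a design choice this sketch does not attempt.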

I said at the beginning of the presentation that for software developers, the alpha and the value is in the scarcity. So what we do at Quine as well is offer you, for free, an index of open source repositories with open issues that you can browse and explore given your interests and your knowledge, and that you can use to actually build this open source profile and this very strong pull request portfolio. Right? So, I mean, yeah, this is what it looks like. It used to be a recommendation system. Now it's just an indexer that people can use to explore which issues there are, what the likelihood of their pull request being merged is, what type of issues are open, et cetera.

And with that, we can basically help any developer build a unique specialism and really achieve differentiation in open source. This, for example, is a map we created of the open source ecosystem based on a proprietary taxonomy of open source. Essentially, we categorize open source into 600 topics. And using generative AI and machine learning, we can place any particular repository on GitHub into one or more of these categories.

So yeah, this is essentially to illustrate how there is really room for everyone. If you want to become scarce, if you want to become special, if you want your skills to speak for you, there's really a lot you can specialize in. And here in this new market, what matters is the specialization. It's being able to go deep into one specific topic, go deep into one specific specialism. Time series, MongoDB, Azure, Spark, PubSub, system design, MLOps, code standards, you name it. There are open source repositories for everyone. There are communities for all tastes and shapes and forms.

So it just depends on you actually wanting to take on this journey of creating a public profile that speaks for you. And with that, what can happen is that everyone can become different. Each individual can offer something unique to the market and can monetize their skill sets at the optimal market rate.

Collaboration Between Developers and Companies

So now, how do we work with companies, for example? Right now, we're helping companies outsource issues to the developer community using machine learning.

So again, for any specific GitHub issue, we can essentially rank our community based on their ability to solve that particular issue, and then allocate or place any member of the community to solve this issue, right? And how it works is essentially a company here will now create an issue, but now it will also tag it with a reward. We will essentially understand this reward and then we will find the best person available to be able to solve this issue.
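As a rough illustration of ranking a community against an issue, here is a minimal sketch that scores developers by cosine similarity between their embedding and the issue's embedding. The three-dimensional vectors, the developer names, and the use of cosine similarity are assumptions for the example. The talk does not specify how Quine's embeddings are produced or compared, and real embeddings would come from a model and have many more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_developers(issue_vec, developer_vecs):
    """Return (developer, score) pairs sorted from best to worst match."""
    scored = [(name, cosine(issue_vec, vec)) for name, vec in developer_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical hand-written vectors; the axes loosely mean (ML, web, databases).
ISSUE = [0.9, 0.1, 0.0]
DEVELOPERS = {
    "dev_web": [0.1, 1.0, 0.0],
    "dev_db": [0.0, 0.1, 1.0],
    "dev_ml": [1.0, 0.0, 0.1],
}

if __name__ == "__main__":
    for name, score in rank_developers(ISSUE, DEVELOPERS):
        print(f"{name}: {score:.3f}")
```

With these toy vectors the machine-learning-heavy issue is matched to dev_ml first. Availability and the size of the reward would have to be layered on top of a similarity score like this one.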

So we create these managed touchpoints and interactions between communities and developers, and help developers build specialisms while also making money, while also helping companies get some specific piece of work done, build their community, get visibility, et cetera.

We're working right now primarily with AI and data companies. We've realized that these are the companies that right now are of course suffering the pressure of the market. It's already very difficult to find full-time AI or data employees, and it's even more difficult to find contributors. So a lot of the clients we're working with right now are in the AI and data domain.

And we recently as well launched a product in which we use the same technology for matching developers to issues to essentially match projects and repositories to potential users, right? So here, for example, a company or a community will come to us and they'll be like, we want to reward people that use our developer tool in very, very creative ways. They will set the reward. And then some contributors will be up for the challenge and will build apps or projects using this particular developer tool. And then they will win a reward depending on what the community thinks is coolest, right?

And yeah, this is now live. We're also working with some developer tools companies, organizing these decentralized 24-7 type of hackathons that last a few weeks. Here, people just submit projects that use that particular developer tool. And again, it's a great way for developers to actually build something, get feedback from the community, and make money in the process, and for companies to acquire new users, get feedback as well, grow the community, et cetera.

Conclusion

So yeah, so welcome to the future of developer work. These are our Twitter X type of accounts in case you want to connect. And yeah, thank you for...