Human-Centered AI: Dynamic, Adaptive Systems for Optimal Model Performance

Hello.

Introduction

So thanks so much, Ben, for a really great talk. I'm going to try to keep my talk relatively short, because I know we have pizza and drinks and time to network. But essentially, I'm going to tell you a little bit about Anode.

And it's a really exciting talk, and you're getting a sneak preview of something that we've been working on really hard. So I hope you'll enjoy it.

Anode: Our Company and Mission

So a little bit about our company. We started about two years ago. My co-founder Tommy and I grew up in the same hometown and have known each other for over 15 years.

Essentially, we got together and decided to build a data annotation interface, guided by a general theory of this idea called human-centered AI.

A bit about him: he's a software engineer at Google. A bit about me: I used to work at Deloitte as a data scientist in their applied AI division.

Product Launch: Panacea

And yeah, I think today, as the culmination of a lot of hard work, a lot of people contributing a lot of different things, conversations with customers, going to events like this, and hearing your feedback, we're excited to launch a really exciting new product that we're calling Panacea.

Panacea is a novel artificial intelligence platform that we believe embodies this new theoretical idea we call human-centered AI. This is a theoretical talk, so today I'll try to explain a little bit about the theory.

And over the course of the next week or so, if you're interested in learning more, we're hosting a series of webinars, tutorials, presentations, and demos. We'll share that info at the end of the talk.

Understanding AI Development

So to really understand the core essence of what we're building, it's important to take a look back in history and understand how AI, at least for starters, was developed.

Model-Centric AI

So when most people think about AI today, there's this idea of model-centric AI. What is model-centric AI?

Let's say you have a fixed data set, like something you'd see in a spreadsheet. Maybe it's a data set where you have different fruits and sentences, and you're trying to predict fruit or vegetable. Or maybe you're trying to find a specific entity in the data set. And it's static, right? It's like a spreadsheet. It's a table.

And a lot of times, you might want to try different models. You might try GPT. You might try Claude. You might try Llama. If you're doing classification, maybe you're looking at Naive Bayes or SetFit. But you're trying a bunch of these different models, and you want to see how each one compares on your outputs. The training data is static, and you're comparing different models.
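To make that concrete, here's a minimal sketch of the model-centric loop: the labeled data stays fixed while different classifiers are swapped in and compared. The tiny fruit-versus-vegetable data set and the two candidate models are just stand-ins for illustration, not anything from our platform.

```python
# Model-centric iteration: same data, different models, compare the scores.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Fixed, spreadsheet-like data set: sentences labeled fruit vs. vegetable.
texts = [
    "I sliced a ripe banana into my cereal",
    "The carrots were roasted with olive oil",
    "She bought a basket of fresh strawberries",
    "Spinach wilts quickly in a hot pan",
]
labels = ["fruit", "vegetable", "fruit", "vegetable"]

candidates = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(TfidfVectorizer(), LogisticRegression()),
}

# Swap in each candidate model and compare cross-validated accuracy.
for name, model in candidates.items():
    score = cross_val_score(model, texts, labels, cv=2).mean()
    print(f"{name}: {score:.2f}")
```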

Data-Centric AI

About 15 or so years ago, there was this advent of an approach called data-centric AI. The idea here was that in some use cases, when the models just didn't perform well enough and you only had a few days to actually solve your project, rather than changing the model, you'd take your model and keep it fixed. Let's iterate on the data instead. Let's make the data really good and high quality.

We can add new data, label new data, and pre-process the data to make sure it's really, really great. And as we iterate on the data and add more of it, we can see the impact that has on a specific model. So let's say, in the fruits example, we add a new row of fruits. We added some new entities. We added data that maybe the model was messing up on. And we want to see how performance changes.
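And here's the complementary data-centric sketch under the same toy setup: the model choice stays fixed, we add rows targeting things it got wrong, and we re-score. Again, the data and model below are illustrative stand-ins.

```python
# Data-centric iteration: the model stays fixed, the data keeps improving.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

test_texts = ["Blueberries stain everything they touch", "Kale is too bitter raw"]
test_labels = ["fruit", "vegetable"]

train_texts = ["I sliced a ripe banana", "The carrots were roasted"]
train_labels = ["fruit", "vegetable"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())  # fixed model choice

def evaluate() -> float:
    # Re-train on whatever data we have now and score on the held-out set.
    model.fit(train_texts, train_labels)
    return model.score(test_texts, test_labels)

print("before new rows:", evaluate())

# Add rows targeting examples the model was messing up on, then re-evaluate.
train_texts += ["Fresh blueberries in the pie", "Raw kale salad with lemon"]
train_labels += ["fruit", "vegetable"]
print("after new rows:", evaluate())
```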

And this is really the approach that people take today: either model-centric, where you're trying different models, or data-centric, where you're iterating on the data for a specific model.

Challenges with Current AI Models

And this can be used in a lot of really great use cases, such as ChatGPT or Claude, where you interact with a chatbot. About six or seven months ago, our team even built our own private, on-prem chatbot, where we'd upload our documents and ask questions against them.

And as we uploaded more and more files, what we realized is that a lot of the time the answers to these questions were just totally wrong. It would work well if you had two or three files. But once you got to the point where you had hundreds or thousands of files, or your questions were really complex, or the things you were trying to understand required a lot of domain specificity, these models would really not get the answers right. And we were wondering why that was the case.

Was it just that the models weren't good enough? Was it that they weren't actually retrieving the right chunk or citation to provide the right answer? We tried experimenting with a lot of these different models. But no matter what, we noticed there were all these hallucinations.

And these hallucinations were pretty catastrophic in certain cases. For instance, in finance, if you're asking a question about revenue, the model that was supposed to give an answer of $23.7 billion might just say something like $2.37 billion. And you might not even realize it.

Even if you ran different LLM evaluation metrics, you might not really catch this. You're just blindly trusting the output of these models. And furthermore, if you're asking a really specific question about something that requires a lot of training data, a lot of times the models don't really know it, because they're trained on a large corpus of data like Wikipedia or Google Books, but they might not know your specific domain very well.
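One cheap guard against exactly that kind of numeric slip, sketched here on the assumption that you still have the retrieved source text on hand, is to check that every dollar figure in the answer actually appears in the source. This isn't Panacea's evaluation stack, just an illustration of the idea.

```python
# Flag answers whose dollar figures don't appear anywhere in the source text.
import re

def dollar_figures(text: str) -> set[str]:
    # Capture figures like "$23.7 billion" or "$2.37 billion".
    return set(re.findall(r"\$\d+(?:\.\d+)?\s*(?:billion|million)", text, re.I))

source = "Total revenue for fiscal 2023 was $23.7 billion, up 8% year over year."
answer = "The company reported revenue of $2.37 billion."

unsupported = dollar_figures(answer) - dollar_figures(source)
if unsupported:
    print("Possible hallucination, figures not found in source:", unsupported)
```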

Introducing Human-Centered AI

So the real question is, how can we make these models better? How can we make them more accurate and more human-centered, giving you the best output for your data?

Essentially, we thought a lot about this, and our approach is a four-step process that we're calling human-centered AI. Let me go through the details of how this works.

The Four-Step Process

The first step is labeling data. The second step is training your model. The third step is prediction. And the fourth step is evaluation.

So let's say you have a bunch of raw documents, maybe 10 million of them. You might split your data set into training and testing. And on the test set, you might start out by trying models like GPT or Claude or Llama 3.

And you might see how the results look. Then maybe you'd look at them and say, hey, it's going well, I'm really happy. Or no, it's not going that well, and you'd want to figure out how to make it better.

So the first thing you might do is this evaluation process, where for each model you're comparing the results of how they do on this test set. Maybe you have structured labels, or maybe you have unstructured labels and you're doing LLM eval. But now you're trying this, and you're saying, OK, a lot of my models aren't actually doing super well for this specific use case.

So what I might try to do here is train a model. To do this, I take my training data, label it, and train different models. Each of these models will have its own model ID from the training job. And you can take that model ID, run it against the test data, and see how that model performs compared to your zero-shot models on your evaluation set.

And over time, you have a bunch of these different models you're trying on your test data set. And you can choose the best model for your task. And that's just the idea.
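Stripped down to a few lines of Python, that selection loop looks something like the sketch below. The candidate names, the fine-tuned model ID, and the predict functions are placeholders for whatever client you'd actually call; none of them are real.

```python
# Run every candidate model over the test set and keep the best scorer.
from typing import Callable

def accuracy(predict_fn: Callable[[str], str],
             test_set: list[tuple[str, str]]) -> float:
    correct = sum(predict_fn(x) == y for x, y in test_set)
    return correct / len(test_set)

def pick_best_model(candidates: dict[str, Callable[[str], str]],
                    test_set: list[tuple[str, str]]) -> str:
    scores = {name: accuracy(fn, test_set) for name, fn in candidates.items()}
    print(scores)
    return max(scores, key=scores.get)

# Usage sketch: candidates mix zero-shot baselines and your own trained models.
test_set = [("Is an apple a fruit or vegetable?", "fruit")]
candidates = {
    "gpt-zero-shot": lambda q: "fruit",   # stand-in for a zero-shot call
    "ft-model-0042": lambda q: "fruit",   # stand-in for a fine-tuned model ID
}
print("best model:", pick_best_model(candidates, test_set))
```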

So the labeling of data is itself a four-step process: uploading, customizing, annotating, and downloading. After you've labeled different amounts of data, you have different versions of models that you've trained. For each model type for your task, you can run this training process and train your large language model.

You take that model you've trained, and you can use it to make predictions alongside the variety of other models you might be looking at. And from there, once you've made these predictions, you can evaluate and see how each model does on specific metrics. Simple.

And over time, the idea is that as you label more data, the system learns from the user how to get the best model for them. When an annotation, an important feature, or a category is added, a bunch of these models get trained in the background. Then there's an ensemble layer that essentially merges these models. You can compare them against all the zero-shot models, and whether you're calling these models from your chatbot or from your developer kit, you'll just always get the best results for what you care about.

And as your requirements change and as more data is added, it will auto-adapt. So that's the theory behind it.
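Here's a hedged sketch of that "always get the best result" layer: a tiny router that keeps a running score per model, updated whenever new labels arrive, and sends each request to the current leader. The model names and scoring rule are made up for illustration; the real ensemble logic may look quite different.

```python
# A toy adaptive router: track accuracy per model, route to the current best.
from collections import defaultdict

class AdaptiveRouter:
    def __init__(self, models):
        self.models = models                      # name -> callable model
        self.stats = defaultdict(lambda: [0, 0])  # name -> [correct, total]

    def record_feedback(self, name, prediction, label):
        # Update the running score for a model whenever a new label arrives.
        correct, total = self.stats[name]
        self.stats[name] = [correct + (prediction == label), total + 1]

    def best_model(self):
        def score(name):
            correct, total = self.stats[name]
            return correct / total if total else 0.0
        return max(self.models, key=score)

    def __call__(self, query):
        # Every call goes to whichever model currently scores highest.
        return self.models[self.best_model()](query)

router = AdaptiveRouter({
    "zero-shot": lambda q: "fruit",    # stand-in model calls
    "fine-tuned": lambda q: "fruit",
})
router.record_feedback("fine-tuned", "fruit", "fruit")  # new label arrives
print(router("Is a tomato a fruit or a vegetable?"))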

Panacea: Practical Applications and Future Talks

And we've tried to put a lot of this into practice within an actual platform that we're calling Panacea. There are a lot of key technological components to this. Some of it's still a work in progress. Other things we're pretty satisfied with.

But over the course of the next eight days, we'll be talking a little bit about some of these key things that we've been working on.

One, how to improve RAG to more accurately answer questions. Two, how you can label data, train models, and make predictions to evaluate different LLMs. Three, an SDK where you can take these fine-tuned and trained models, or zero-shot models, and easily interact with them within your own workflow. That way, you can use the best model across your end-to-end business processes.
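To give a feel for what that could look like in code, here's a purely hypothetical sketch. The client class and method names below are invented for illustration and are not the real SDK API; the stand-in client is stubbed out so the snippet runs on its own.

```python
# Hypothetical sketch of slotting such an SDK into a business workflow.
class PanaceaClient:
    """Stand-in for a real SDK client; everything here is illustrative."""

    def best_model(self, task: str):
        # In a real SDK this would return the top model for the task,
        # zero-shot or fine-tuned, based on your latest evaluations.
        return lambda text: "fruit"   # dummy model for the sketch

client = PanaceaClient()
model = client.best_model(task="produce-classification")
print(model("Is a mango a fruit or a vegetable?"))
```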

I'll also be sharing some ML research we've done on unsupervised learning: if you want to do this training on your raw documents without actually needing labeled data, how you can do that with masked language models; how you can do different types of fine-tuning, such as RLHF and private or public fine-tuning; how you can do active learning to learn from people and fine-tune that way; how you can use this technology for core business capabilities people care about, like classifying text, extracting entities, or answering questions; and lastly, how you can take these fine-tuned models and put them into your own private chatbot to have your own personalized version of GPT or Claude on your own data.
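As one small example of the active-learning idea, the sketch below trains on whatever is labeled so far and surfaces the unlabeled examples the model is least confident about, so a person labels those first. The data and model here are illustrative stand-ins, not our research code.

```python
# Active learning sketch: send the most uncertain examples to a human labeler.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

labeled_texts = ["Ripe bananas for breakfast", "Roasted carrots with thyme"]
labeled_y = ["fruit", "vegetable"]
unlabeled = ["Dragon fruit smoothie", "Is rhubarb sweet?", "Steamed broccoli"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(labeled_texts, labeled_y)

# Uncertainty = how far the top predicted probability is from full confidence.
probs = model.predict_proba(unlabeled)
uncertainty = 1 - probs.max(axis=1)
for text, u in sorted(zip(unlabeled, uncertainty), key=lambda t: -t[1]):
    print(f"{u:.2f}  {text}")   # label the most uncertain items first
```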

So yeah, we'll be having a lot of these talks. If you're interested, just feel free to reach out to me. Here's my email, or you can scan this and join our Slack channel. And we'll be sharing details then.

Conclusion

I would say none of this would have been possible without all the hard work our team has put into it. A lot of people did a lot of research and a lot of coding. And I think there's still a lot more that we can do. But hopefully, people get excited and use it.

Thank you.
