I'm Purnima.
I've been in technology for 15-plus years, and in product management for about a decade. My first product role was actually at an AI company; that's where I met Brad.
After that, I moved into regulation, risk management, the GRC field that everybody hates. But that's what I really liked.
So I'm one of the people who advocates that security is not something you do at the end. Security is something you do throughout your product lifecycle.
But when the conversation turns to AI, that changes quite a bit, and I wanted to go deeper into it.
My most recent exposure is that I'm working with the Cloud Security Alliance as part of their working group, which is defining a framework for AI organizational responsibilities.
So if you're familiar with the biggest drama that happened at OpenAI, where we saw Sam Altman leave for a day or two, and if you really dug into what was causing it, it was an ethics issue. That's also what gave Anthropic and Claude the new limelight they're getting: they claim they can do this right without compromising our regulations and rights, to be honest.
But yeah, moving ahead.
So today, before I start, question. How many of you are in product management here? Awesome. So that's a decent crowd.
How many people here have worked with product management or are going to be working with product management? Nice. How many times has a product manager walked up to you and said, I'm going to make sure this is a secure product?
Well, you've met the rare breed now. Because when I've led teams, I've told them: guys, your responsibility is not just to ship and deliver and launch. You have to make sure your product is secure and that it complies with privacy regulation.
It's not an afterthought, right? And every time, I've gotten groans: oh, why do I have to do this? This is on top of everything else I have to do.
But when you're talking about Gen AI, the technology moves so fast that if you're not responsible about it, we aren't looking at something that's just going to be a bug; it's going to be a much bigger impact, right? So, for funsies, because we're doing that today, I asked Gen AI to describe the curse of intelligent, curious life forms, because that's how I see Gen AI. It's a programmatic thing that is curious to learn, that is intelligent. It can do a lot of things that we couldn't do.
Like, if you look at drug research, it's an absolutely insane level of what it can do, right? And it said: the curse of intelligent life forms lies in their insatiable curiosity. They unravel mysteries, yet often discover new enigmas.
They create, but also destroy. Their brilliance illuminates the cosmos, yet casts shadows of uncertainty. Perhaps, in their ceaseless pursuit of knowledge, they unwittingly weave their own cosmic paradox.
And it called itself ALMUSE. I was obviously confused. I was like, who's ALMUSE? Is it somebody I don't know?
It was an imaginary character it created, and it gave me this. Honestly, this is terrifying. It's good philosophy, but it keeps saying, I'm going to do good, but I'm also going to kill you, and I'm like, okay, dude, relax. And if we've all been looking at the news, we've seen the amazing things, like hey, it can do video, it can do this, but there are also very clearly documented cases of bias and failures. That's why regulation agencies are stepping up their game now, because this is not good.
A lot of the models that were deployed showed very high bias when it came to ageism and sexism. So I'm not saying the machine's wrong. The machine's not wrong at all. The machine is ingesting the bias we as humans have built into our systems, right?
One AI-run beauty contest actually rated lighter-skinned people higher when it was allowed to judge. That's how crazy it is. We can't blame AI at that point. It's not learning something new.
It's learning from us, right? So it's not, I mean, personally, if you ask me, I probably think an AI system is more reliable than humans because humans are unreasonable. They get something in their head, then they pursue it, and they don't think logically. But AI thinks logically, right?
But still, when things like that happen, why do companies give a shit about this? Because as soon as it happens, their reputation suffers, right? A few weeks back, when ChatGPT went insane and started mixing up its outputs in different languages, yes, that was bad for them.
Obviously, Google Gemini, we've known what a shit show it is. Like, I'm sorry, that's a shit show. If you're a big company, that's where you're coming from.
And to be honest, in that same working group I mentioned, somebody asked them, how is your security and regulation work going? And their official answer, I'm not even fucking kidding, was: you know what? The cybersecurity professionals will catch it.
I am blown away. This is an actual answer on a public forum in front of security professionals. It blows my mind how they're dealing with this, how "responsibly" they act with it.
If you look at OpenAI's response to the New York Times: where do you guys stand? How many of you stand with the New York Times, that they're justified in the case? Well, let's take a vote on the New York Times.
There you go. One vote. What about who's on the side of OpenAI? That they're okay, that they did nothing wrong, that they train models on what's publicly available? Well, none of you, right? But that impacts our lives.
If you've been following the news, the other thing that came up was Nightshade, right? Sora and all of that is great, but what is it going to do? It's going to take work away from the people who created the base all of this was trained on. They're replacing those same artists.
What happens to those artists, right? So Nightshade, it's actually a university project, but it's one of the best, and you totally have to read everything they have. They go after these same models by poisoning image content that's used for training. But you shouldn't need another AI tool.
Obviously, that's one of the ways this is going to play out. If you look at cybersecurity, a lot of cybersecurity companies, because they know Gen AI can start creating a lot more threats than they've been used to, are starting to build tools that automate detecting these threats and fight them from the other side, right?
If I had to put together categories, there are six categories I would put them in. Proliferating models, what does it mean? And I'm gonna cover each of these topics, so maybe, yeah.
Oh. Is this better? Hopefully I don't fall, but yeah.
Proliferating models. So obviously, if you've started working with LLMs, you know you take one of the open source models, you fine-tune it, you save it. Now imagine companies when they're doing this, right? They're using RAG, they're going to pull external information into it, modify it.
They have to keep storing these models, because these models are your proprietary information. That's where the value of the technology you're building lives. But imagine the problem it's creating. There are so many models.
What was used to train? What were the limitations? How did it improve? How did the performance go?
All of that is starting to become a really big problem as large-scale implementations come in. And this is on top of the fact that the computing resources are absolutely insane, because the biggest joke we heard this year was OpenAI's Sam Altman trying to raise, was it $7 billion or $7 trillion? It was $7 trillion, from Saudi investors, to build his own chip infrastructure, right?
So in addition to all of that, the other problem is: how do you track the model information? What's going into the model? What was used to train it? How many times was it trained?
Were there other contributors? How do you keep track of all this model information? And then, obviously, the problem I mentioned, where the biases come in, is workflow control.
You are implementing your workflow controls. You are really excited about sending it to the market. And the one little mistake can kill you right off the bat as soon as you launch to the market because your competitor is not going to make that mistake.
So how do you maintain workflow control? And this is where product really comes in, because product is good at putting all of these pieces together and making sure they work the way they're supposed to, right?
Right now I'm talking about problems; we'll talk about how to solve them. But being able to control your workflows is absolutely essential.
Then testing and validation. I think Brad talked a lot about this topic, so I'm not going to delve into it. But yeah, how do you test it?
Because now you're also generating data with another AI system to feed into your testing data. But that's where the problem of recursion comes in. Financial systems have already flagged that this is a big problem for them.
Because if AI starts dogfooding its own testing process, you don't know where it's going to go. Recently I was watching one of Andrew Ng's videos, and he was talking about this: how they're automating human feedback and creating these reward models, which is pretty typical with other ML systems. How does it learn?
The thing they realized is that some of these models learn to game the reward model. So you're constantly creating this problem for yourself. And obviously, monitoring and observability.
How do I know this is the right thing? I can't actually do 10 prompts. I can't do 100 prompts, because competition will kill me.
How do I know? How do I benchmark? So all of that.
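Just to make that concrete, here is a minimal sketch of what an automated benchmark harness could look like. The `generate` function and the evaluation cases are hypothetical placeholders; the point is that you score every model version against the same fixed set of prompts instead of eyeballing ten outputs.

```python
# A minimal, illustrative evaluation harness. `generate` is a stand-in for
# whatever model call you actually use (an API client, a local pipeline, etc.).

def generate(prompt: str) -> str:
    """Placeholder for your real model call."""
    return "42" if "meaning of life" in prompt else "I don't know"

# Each case pairs a prompt with a simple check on the output.
EVAL_CASES = [
    {"prompt": "What is the meaning of life?", "must_contain": "42"},
    {"prompt": "Summarize our refund policy.", "must_contain": "refund"},
]

def run_eval(cases) -> float:
    passed = 0
    for case in cases:
        output = generate(case["prompt"])
        ok = case["must_contain"].lower() in output.lower()
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case['prompt']!r} -> {output!r}")
    return passed / len(cases)

if __name__ == "__main__":
    score = run_eval(EVAL_CASES)
    print(f"pass rate: {score:.0%}")  # track this per model version, not per run
```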
And all of that, where does it exist? If I'm using an LLM, where do I find it? And obviously, when I was building GRC platforms, I think the biggest chunk of people we dealt with was auditors.
What is the concept of an auditor? You have enough documentation that if something goes wrong, you can recreate it from scratch and know exactly who, what, and why something happened, right? So: is it auditable?
And I can tell you right now, the space is not in great shape when it comes to auditing any Gen AI implementation. It's not easy to audit. Because as soon as AI is used to write code, which is what a lot of Copilot-style programs are bringing in, it gets very hard. When you had humans, you knew who had inserted the little bit of code that crashed the system.
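For the generation side, the bare minimum an auditor will ask for is a trail of who prompted what, which model version answered, and when. Here's a rough sketch of that idea; the model call is stubbed out and every name in it is illustrative.

```python
# A rough sketch of a per-request audit trail for a Gen AI feature: every
# generation logged with who asked, which model version answered, and when.
import json
import hashlib
from datetime import datetime, timezone

AUDIT_LOG = "genai_audit.jsonl"
MODEL_VERSION = "acme-support-llm-v1.3"  # hypothetical internal version tag

def call_model(prompt: str) -> str:
    return "stubbed response"  # replace with your real inference call

def generate_with_audit(prompt: str, user_id: str) -> str:
    response = call_model(prompt)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model_version": MODEL_VERSION,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "prompt": prompt,          # or omit, depending on your privacy policy
        "response": response,
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response

if __name__ == "__main__":
    print(generate_with_audit("Why was invoice #1042 rejected?", user_id="u-889"))
```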
So I was going to talk about responsible gen AI. And by the way, this is just very surface level. This topic is an absolute rabbit hole. You can spend hours because the more you go into it, the more frameworks will come in.
Every framework is going to come with its own examples. But if I had to put together the main principles, responsible Gen AI is pretty much the same as responsible AI: it is the ethical, fair, transparent, and accountable development and deployment of generative artificial intelligence systems.
Responsible AI is not a new topic. Responsible AI has been around forever. With Gen AI, this becomes really important.
And this is where I think product managers need to change their mindset. That's why I keep saying, I'm like, you cannot be just a person who ships things.
It's great, executives will give you a big pat on the back, but that's not going to cut it, because tomorrow that thing breaks somebody's system. For example, the last software team I led was in a payroll solution, right? For them, if they launch an AI system which is not regulated, which does not meet requirements... this is an industry where every little labor regulation matters, where every missing 20 cents can get you sued for anywhere from 2 million to 20 million, right?
One thing you'll notice with Gen AI: it's not the greatest at calculations. Why? Because it wasn't built for calculation. So can payroll systems go in and implement Gen AI, and what use cases do they identify?
So being aware of what we are doing, seeing things from that level of assessment, will make sure that you actually deploy a product with an AI solution integrated that works and is responsible, right? So the concept is it has to be ethical. And I will delve into each of these.
So the first one is ethical considerations. Ethics is how our society views things. Those are things that are out of the purview of, let's say, actual defined laws. It's just societal norms.
It's like, hey, you have to be nice to somebody. You can't just go punch somebody in the crowd because you feel like it, right? Those are ethics. Those are societal standards.
And being able to see your product from that perspective before it's even developed: I want to know who it's going to hurt. I want to know who it's going to benefit. And taking a stand becomes very important.
So these are ethical assessments, right? Ethics guidelines. So basically you have to look. That's why you need to be aware of these frameworks because these frameworks are built for that.
They're like, you don't have to do the thinking. You go in. I think NIST already came out with their AI Risk Management Framework. That's actually a really interesting read.
I know they're still working on it. Then obviously, because I talked, a lot of problems were with fairness and bias. That is the next lens you're looking at.
So making sure that when you're training, you're using diverse training data. Now, I don't know if I'm correct, but I don't think you can use multiple different LLMs on the same data; I think that was somebody's question earlier. But I know LangChain allows it, it's just not the greatest for fine-tuning, right?
You can program anything you want. So make sure about what data you're selecting and that you have high diversity in that data. One of the courses I was doing had us build a toxicity check using Facebook's RoBERTa hate speech model.
And essentially, I don't know if it's visible, but I took screenshots from the paper attached to this model. We'll come back to this, because it's all part of the model card. It shows what the dataset used to train this model looked like: how many examples were labeled hate speech and how many were not, and within hate, the different categories and how many examples of each showed up in each round. So it's a summary of the data collected for the model, and they also show the distribution of that data, right? Because to make sure you don't have bias, how do I know it's varied data?
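A quick sanity check like that is easy to script. Here's a minimal sketch, assuming a hypothetical CSV of training examples with `label` and `category` columns; the file name and the 5% threshold are made up.

```python
# A quick way to sanity-check whether training data is actually varied:
# look at the label and category distribution before you fine-tune on it.
import pandas as pd

df = pd.read_csv("training_data.csv")  # e.g. columns: text, label, category

# Overall split of hate vs. not-hate (or whatever your labels are)
print(df["label"].value_counts(normalize=True))

# Within the positive class, how are the sub-categories distributed?
counts = df[df["label"] == "hate"]["category"].value_counts(normalize=True)
print(counts)

# A crude red flag: any sub-category under 5% of the positive class
for category, share in counts.items():
    if share < 0.05:
        print(f"warning: '{category}' is only {share:.1%} of hate examples")
```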
The next purview is transparency and explainability. When I was developing our software, and it was simple software, it did ML predictions for sales forecasting and basically told you whether a deal was good or bad. As soon as we deployed it, and Rogers was actually one of our first clients on it, they were a POC, the first question they asked was: how do I trust what you're showing me here? And you're like, I mean, it's a score, so it clearly knows better. He's like, no, I want to know how it's done. So I took the problem back to the data scientist I was working with, and he decided to explore it from the explainability side, which is the second point.
You have to look at explainability techniques. What it started showing then was: okay, this deal is going to make 60%, but that 60% is because of all these factors, in this distribution. And by the way, that cycle of using human feedback to refine the model and what we showed took around two to three months at that time with each customer, especially the large-scale customers, but it was a very important part.
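To give a flavor of what "these factors, in this distribution" can look like in practice, here is a generic sketch using SHAP values on a toy tree-based deal scorer. This is one common explainability technique, not necessarily what that team used, and all of the features and data below are made up.

```python
# Per-prediction factor contributions with SHAP on a toy deal-scoring model.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "deal_size_usd": rng.integers(5_000, 500_000, 200),
    "days_in_pipeline": rng.integers(1, 180, 200),
    "num_stakeholders": rng.integers(1, 10, 200),
})
# toy "likelihood to close" target, for illustration only
y = (X["deal_size_usd"] / 500_000) * 0.6 + (1 - X["days_in_pipeline"] / 180) * 0.4

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[0]])  # explain one deal's score

print(f"predicted score: {model.predict(X.iloc[[0]])[0]:.2f}")
for feature, contribution in zip(X.columns, shap_values[0]):
    print(f"{feature}: {contribution:+.3f}")  # how much each factor pushed the score
```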
And right now, that is one of the biggest things a product manager writing requirements should care about. This should be part of my requirements. This is not something I can do later.
Then documentation. Product teams do a really good job at documenting everything and making sure there is transfer of knowledge. That's why model cards are very important right now. I mean, that's my perspective.
That's not what's happening, by the way. Model cards are still being written by the engineering and development teams, and product teams haven't gotten into it. So when I look at product, I'm like, maybe the model documentation should come from product, right?
Anytime you're putting out an LLM of yours, you should be the one saying what's good and what's bad in it. And then obviously making sure the explanation is user-friendly. It should be easy to understand. That, by the way, has been the tricky part.
Everything I've looked at, obviously, still is very technical. And I'll show you a couple of examples of existing model cards that I found, which were good examples. The one that Google does is actually really good, but they have other problems. But at least their model cards look really good.
Obviously, privacy and security, right? The bread and butter of any cybersecurity professional, of anybody in the GRC sector. Privacy and security is your data. And these are not optional.
This is not, hey, I'm compliant. Because one thing I always notice is that a lot of product managers just make sure they're compliant so that they can ship. That's absolutely the wrong way to do it. They don't even understand why they're GDPR compliant.
They don't go into, okay, what kind of data are we actually handling? And I'm not saying everybody; obviously, you've met the rare breed who care about it, and if you ask them, they will tell you: we're dealing with healthcare data, so it's critically important that we make sure all the patient records are protected.
When I worked for the workforce management company, there was so much lack of privacy, and it wasn't just the product team, it was the company as a whole, that it used to blow my mind.
I'm like, you know you're giving out people's geolocations. You're giving out how they drive, where they drive.
So yeah, if a company like that asks, are we ready for Gen AI? As soon as I see it, I'm like, red flag, don't even go there. Because you need to make sure you're privacy compliant and secure before you even start dealing with other people's data.
If I had to talk about some of the regulations, and if this ignites something in you and you want to go look things up, maybe go look at the compliance frameworks that are already there. From my perspective, these are probably the ones you'll hit the most. You don't have to read all of them.
GDPR is totally going to come into play. It's never going to go away. Its whole reason for existing is getting people's consent.
And we know that in the world of Gen AI, that's the biggest problem, right? Then obviously you have the California Consumer Privacy Act; there's also one in Quebec, I think. But mostly, when you're dealing with software that's sold globally, you'll probably be dealing with all of these. And then obviously, if you're in healthcare it's HIPAA, and if you're dealing with children's software or education software you have to maintain COPPA, and so on and so forth.
Then coming down to accountability and oversight. No framework, no idea, succeeds until you have the right team in place who understand what they have to do and know exactly when something has to be escalated and when something has to be audited. This is true across every technology solution, but in the Gen AI world it takes on prime importance.
It cannot take forever to get a response back. You can't have a lot of hierarchy. There's actually also a prediction that a lot of companies will start moving to flatter hierarchies as we go, because that's going to be the requirement.
You need to respond so you cannot have 10 levels who have to look and sign off on everything.
Then data governance and consent. Every time we had a customer from Europe, they would ask me that simple question: how long are you going to retain my data? And we were like, well, it's going to be there forever, which is the most obvious thing you're going to say, and we've been pulled apart for it, rightfully so. These laws, as I said, exist to protect the rights and freedoms people have, and that's why understanding your data retention policies, data provenance, and even your consent matters, right? And then obviously, continuous monitoring and evaluation.
It's basically like anything you deploy: your work isn't done.
As a product manager, you're constantly asking: is my product doing what it should? Is it good enough? Does it still have the same accuracy? I added this data, can I make it better? Those questions come with any product management role, but with any Gen AI application they're even bigger.
Why? Because I can tell you, I think companies will be able to deploy solutions within three to six months. I don't think that cycle of, oh, we'll have two releases a year, is going to happen again.
These tools are good. These tools are way better than us. These tools do not need as many people, do not need so much consent. It's actually amazing what they can do.
But then this becomes important: what are you reporting on? I think I did a post on this a while back, about what kind of metrics you should be looking at. How do you know this is working? That's also really important.
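As a rough illustration, here's the kind of lightweight per-request metric tracking you might report on for a Gen AI feature. All the metric names and the thumbs-up signal are made-up placeholders; a real setup would push these into your monitoring stack instead of keeping them in memory.

```python
# A sketch of lightweight production metrics for a Gen AI feature:
# latency, output length, and explicit user feedback.
import statistics
from collections import defaultdict

metrics = defaultdict(list)

def record_request(latency_s: float, output_tokens: int, thumbs_up: bool | None):
    metrics["latency_s"].append(latency_s)
    metrics["output_tokens"].append(output_tokens)
    if thumbs_up is not None:
        metrics["thumbs_up"].append(1 if thumbs_up else 0)

def daily_report() -> dict:
    return {
        "requests": len(metrics["latency_s"]),
        "p50_latency_s": statistics.median(metrics["latency_s"]),
        "avg_output_tokens": statistics.mean(metrics["output_tokens"]),
        "thumbs_up_rate": statistics.mean(metrics["thumbs_up"]) if metrics["thumbs_up"] else None,
    }

# simulate a few requests
record_request(1.2, 180, True)
record_request(3.4, 920, False)
record_request(0.8, 75, None)
print(daily_report())
```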
Coming down to everything I talked about: model cards are actually the best way to capture all of this right now. I don't know if there's going to be a better way to do it eventually, but right now model cards are pretty much it.
Model cards actually originated, I think, in 2018 at Google. A group of Google researchers came up with the concept that if we're going to build AI and ML models, we should document them, and this is actually a summary of what they created.
So they said it should show model details: what is the version, what is the type, what is the intended use, what factors went in, what evaluation factors and attributes you're looking at, what metrics it's measured on, the training data, qualitative analysis, and ethical considerations, right at the bottom, obviously. But yeah, that was what they put out.
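Just as an illustration, here's a minimal sketch of capturing those fields as a structured sidecar file next to a saved model. Every value in it is a hypothetical placeholder.

```python
# A minimal, hypothetical model card written as a structured sidecar file.
import json

model_card = {
    "model_details": {
        "name": "acme-support-summarizer",
        "version": "1.3.0",
        "type": "fine-tuned LLM (summarization)",
        "date": "2024-03-01",
        "owners": ["product", "ml-platform"],
    },
    "intended_use": {
        "primary_uses": ["summarizing customer support tickets"],
        "out_of_scope": ["legal or medical advice", "payroll calculations"],
    },
    "factors": ["language", "ticket length", "customer region"],
    "metrics": {"rouge_l": 0.41, "human_eval_pass_rate": 0.87},
    "training_data": {
        "sources": ["internal tickets 2021-2023 (consented)", "public FAQ corpus"],
        "known_gaps": ["few non-English tickets"],
    },
    "ethical_considerations": [
        "may expose customer details if prompts are not filtered",
        "quality drops on non-English tickets",
    ],
}

with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```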
I can tell you, I've looked at the models, and nobody does that level of documentation. Actually, most of the model cards, even on Hugging Face, are pretty sparse. We'll look at a few examples; it's fun.
And then, what are the key aspects of model cards? As I said, if you're a product manager, even if you don't understand how Gen AI works, even if you don't have the time, this is the one thing you need to learn. Why? Because you need to take that data and synthesize it for the right stakeholders.
Okay, I'm talking to a security researcher: which part of the model card can I point them to? If somebody asks me, how many times did you fine-tune this model? It should be in the model card, and I should be able to refer to it. And that knowledge, that information, should transfer depending on who's coming in, right?
Because product managers come and go, right? Product teams come and go, the teams that develop come and go, but what we document in these model cards exists irrespective of who's working on it.
And I did a quick slide on just the stakeholders and benefits. So from my perspective, these are the five categories I would probably look at.
AI practitioners, obviously, they have to understand how a model works so that they know if they're given a use case to implement, how are they going to implement it. When it comes to developers, well, I'm integrating this into my software. You're giving me this model. So understanding how many iterations, so they can plan on like, oh, how do I test it, how many times, what would I have to do to maintain my software, so on and so forth.
Policymakers: your auditability, everything, your policymakers will look at your model card. And I can guarantee this is going to be an absolute requirement in the future, because I'm going to put it in the CSA framework. But beyond that, it's basically to assess the impact of models on affected individuals.
Why? I'll tell you, this is one of my biggest pet peeves right now. Of all the regulations that came in, none of them address it, not even the European one, which is groundbreaking. Don't even talk about the Canadian one, because it's probably going to take another eight months.
But even the European one, which is groundbreaking, nowhere does it talk about what happens when you impact human life with this. If a set of people lose their livelihood, what is your impact? You as an organization are completely washing your hands of it. It doesn't freaking matter.
Hallelujah, we're going to raise, I don't know, a billion dollars with this. But what is your impact, and who is responsible for it when it happens? That would be an absolutely essential piece that should be covered.
It shouldn't be at the bottom of the card like the one I showed from the paper, right? Then organizations. Organizations will obviously have to be accountable for their own model cards, but also for the third-party LLM models they use, because that's going to be a big market; they'll need model cards to inform their decision-making and adoption. And obviously, privacy professionals.
Why? Because they need to know what data you're using, whether you have consent for it, and who to go after if something goes wrong, right? So, as I promised, I'm going to open a couple of examples that I really liked.
Fun fact, this is the GPT one. It's archived now. You can actually go in.
It's from when OpenAI was still supposed to be a nonprofit, and this is the GPT model card. Really well written, really well written.
It comes with model details. It comes with the date it was done. It comes with the parameter size of the model. All of these models will link you back to a paper.
That's why I said it's a rabbit hole. You go in, and by the way, now there are much fancier model cards where you can actually test them out. But yeah, model use, then it talks about limitations, performance, so on and so forth.
And this is another one from Hugging Face. It's for universal document processing. So the kinds of things you're looking at right now, what GPT-4 can do, that's part of this. Feel free to look at this.
And this is the Google one. The Google one, as I said, I like; it's very good documentation, I must say: input, output, what the architecture is, a link to the API documentation, a record of its performance, so precision and recall, how the performance was evaluated, and so on and so forth. And they also have object detection, which is another publicly available model card.
If you're looking to implement this, there are tools, obviously; this isn't all manual. You can use a model registry. I've named a couple of them.
There's Google Cloud's Vertex AI. There's SageMaker. I think most of us will end up working with SageMaker because they're the ones funding all the courses.
Basically, your model registry has your version control. So it's version control for model cards along with your GRC requirements.
And this is what Vertex AI looks like, by the way; it's what most model registry cards will look like.
What is a model registry? It's where your company stores its models: every time they modify a model and create a model card, they store it in the registry.
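For a flavor of what that looks like in code, here's a rough sketch of registering a new model version with the google-cloud-aiplatform SDK. The project, bucket, container image, and parent model ID are placeholders, and you'd want to check the current SDK docs before relying on the exact parameters.

```python
# Rough sketch: register a new model version in Vertex AI's Model Registry.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="support-summarizer",
    artifact_uri="gs://my-bucket/models/support-summarizer/v1_3",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
    # uploading against an existing model creates a new version of it
    parent_model="projects/my-gcp-project/locations/us-central1/models/1234567890",
    version_aliases=["v1-3"],
    version_description="Fine-tuned on consented 2023 tickets; see model_card.json",
    labels={"owner": "product"},
)
print(model.resource_name, model.version_id)
```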
So, yeah, that was my quick talk about a very big topic. Thank you.