Today I'm talking about Synthesize Me: inducing persona-guided prompts for personalized reward models.
Introduction: Why Personalization Matters for AI Assistants
So we're in an era of AI assistants. These slides are a little bit dated, but I don't know if anyone remembers the early era of ChatGPT's Operator. Now, of course, there's Claude Cowork, which is a lot better. But these assistants really need to be much more personal.
In the early days of Operator, for instance, someone on Twitter tried to buy groceries on Instacart, and instead of asking them for useful context, like where they lived, the agent immediately began searching for milk in grocery stores in Iowa, which I don't think is where they lived. That's one instance where LLMs have just completely botched context.
Anyone even slightly familiar with Muslim culture knows that one definitely doesn't drink wine or whiskey after a prayer, so this is an example where the model just wasn't attuned to cultural preferences.
So why do we want personal reward models? A lot of context is hidden, and coarse signals like demographic data don't really capture the idea of pluralistic alignment. We want to move toward personal reward models that truly understand the nuances and preferences of a given user. In this context, a reward model just scores how good a given LLM response is for a given user.
For instance, if we understand that you don't really like Starbucks and you're slightly more bougie, then we might recommend that you go to District Coffee instead.
The primary contributions of this paper are as follows. First, we introduce Personal Reward Bench, a way to prepare your preference-pair datasets for personalized reward modeling. Second, we introduce a way to create personas from interaction history. If you've ever seen ChatGPT's A-versus-B responses, you can essentially create personas from all of that preference-pair data. And if you have other kinds of preference data, for instance in an e-commerce setting where you know a user clicked on product A rather than product B, that's another example of pairwise data you can use. Finally, we release a software package for this.
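The A-versus-B votes and the e-commerce clicks both reduce to the same record: a user, a prompt, a preferred response, and a dispreferred one. Below is a minimal sketch of that conversion in Python, assuming a simple tuple format for votes. `PreferencePair`, `pairs_from_ab_votes`, and `pairs_from_clicks` are hypothetical names for illustration, not the API of the released software package.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PreferencePair:
    """One pairwise judgment: for this prompt, the user preferred `chosen` over `rejected`."""
    user_id: str
    prompt: str
    chosen: str
    rejected: str


def pairs_from_ab_votes(votes: Iterable[Tuple[str, str, str, str, str]]) -> List[PreferencePair]:
    """Convert A-versus-B votes (user_id, prompt, response_a, response_b, winner) into pairs.

    `winner` is "a" or "b"; any other value (e.g. "tie") carries no pairwise signal and is skipped.
    """
    pairs: List[PreferencePair] = []
    for user_id, prompt, response_a, response_b, winner in votes:
        if winner == "a":
            pairs.append(PreferencePair(user_id, prompt, response_a, response_b))
        elif winner == "b":
            pairs.append(PreferencePair(user_id, prompt, response_b, response_a))
    return pairs


def pairs_from_clicks(user_id: str, query: str, shown: List[str], clicked: str) -> List[PreferencePair]:
    """E-commerce style (hypothetical convention): the clicked item beats every other item shown with it."""
    return [PreferencePair(user_id, query, clicked, other) for other in shown if other != clicked]
```

Treating a click as a win over every other displayed item is one common convention; it assumes the user actually looked at the alternatives.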
Primary Data Sources and Example Preference Pairs
The primary data in our experiments came from Chatbot Arena and PRISM. Chatbot Arena consists of a large, varied set of preference pairs across a lot of different subjects.
Here's one example: who would be victorious in a battle to the death, a slug or a snail? PaLM 2 says it doesn't recommend violence between organisms, but Claude Instant v1 gives a pretty well-thought-out answer about who would win and why, and the user preferred that one.
There are a lot of problems in these datasets. A lot of the data is just flat-out wrong: for a question like "what's the million-and-first prime number?", both responses are wrong, but one seems slightly less wrong than the other. And there are issues not just with semantic quality but also with how the answers are presented. In this case, for instance, one response is just a bunch of raw divs, so it's unlikely anyone would prefer it over the other one.
To create our dataset, we went through a three-step filtering process: a user filter, a personalizable filter, and a disagreement filter. The user filter narrows the data down to users with five or more preference-pair interactions; we found that fewer than five was just too few to gain any valuable context.
The personalizable filter measures the degree to which a query is personalizable. The million-and-first prime number is objective and verifiable, so that query wouldn't count. Finally, the disagreement filter measures how likely LLMs are to disagree on an answer: a fight to the death between a snail and another organism qualifies, as opposed to "is killing bad?", which hopefully most LLMs agree on.
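The three filters could be chained roughly as follows. This is a sketch under assumptions: the personalizable and disagreement judgments are passed in as callables (in practice they might be LLM judges), the 0.5 disagreement threshold is a placeholder, and applying the user filter last, so that users who drop below five pairs after the query filters are removed too, is our ordering choice rather than something stated in the talk. `three_step_filter` is a hypothetical name.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Pair = Dict[str, str]  # keys: "user", "prompt", "chosen", "rejected"


def three_step_filter(
    pairs: List[Pair],
    is_personalizable: Callable[[str], bool],
    disagreement: Callable[[str], float],
    min_pairs: int = 5,
    disagreement_threshold: float = 0.5,  # placeholder value, not from the paper
) -> Dict[str, List[Pair]]:
    """Return {user: pairs} after the personalizable, disagreement, and user filters."""
    # Personalizable filter: drop objective, verifiable prompts (e.g. "the million-and-first prime").
    # Disagreement filter: keep prompts on which models are likely to disagree.
    kept = [
        p for p in pairs
        if is_personalizable(p["prompt"]) and disagreement(p["prompt"]) >= disagreement_threshold
    ]
    # User filter: keep only users with at least `min_pairs` surviving interactions.
    by_user: Dict[str, List[Pair]] = defaultdict(list)
    for p in kept:
        by_user[p["user"]].append(p)
    return {user: ps for user, ps in by_user.items() if len(ps) >= min_pairs}
```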
After applying these filters, here are some of the statistics (not sure if folks can see them in the back): we had 131 users from Chatbot Arena and 720 from PRISM. We split these interactions into 50% context, 20% validation, and 30% test. Chatbot Arena has very few multi-turn interactions, but PRISM has a lot more.
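A 50/20/30 split applied to one user's pairs might look like this. Splitting per user, and shuffling with a fixed seed rather than splitting chronologically, are assumptions made for illustration; `split_user_pairs` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_user_pairs(
    pairs: Sequence[T],
    fractions: Tuple[float, float, float] = (0.5, 0.2, 0.3),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle one user's pairs reproducibly and split them into (context, validation, test)."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_context = round(n * fractions[0])
    n_val = round(n * fractions[1])
    context = shuffled[:n_context]
    validation = shuffled[n_context:n_context + n_val]
    test = shuffled[n_context + n_val:]  # the remainder (roughly 30%) goes to test
    return context, validation, test
```

The context split is what persona induction sees; validation and test pairs stay held out.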
Method: Inducing Persona-Guided Prompts
Now I'm going to talk a little about the method through this diagram, which gives an impression of how we went about inducing these persona prompts. We start with a set of prior interactions. These were primarily value-based questions, in this case "what do you think about abortion?", with two response variants, and the user selected the first one. Given five to 15 such preference pairs, we infer who the user is likely to be and what their values are.
In this step, we use this bootstrapped reasoning to extract only the preference pairs that agree with it, so as not to confuse the model. We then provide the generated persona, along with the extracted examples, and pass that in to the model.
One challenge here is that personalization is inherently a low-data setting, so we wanted to see how effective the personas we create can be with as few as five preference pairs. Another challenge, and the reason we needed bootstrapped reasoning, is that we don't really know why a user prefers one response over another. Is it the content, or the style in which it's presented?
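One way to read the induction pipeline, sketched in Python: bootstrap reasoning about the user from the context pairs, keep only the pairs whose recorded choice that reasoning reproduces, distill the reasoning into a plain-text persona, and assemble the persona plus the agreeing examples into a judge prompt. The `llm` callable, the prompt wording, and all function names are hypothetical; this is not the authors' implementation. Alternating which side the chosen response appears on is an added precaution against position bias.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical: any text-in, text-out model call
Pair = Dict[str, str]       # keys: "prompt", "chosen", "rejected"


def _present(pair: Pair, swap: bool) -> Tuple[str, str, str]:
    """Return (response_a, response_b, label_of_chosen); swapping sides reduces position bias."""
    if swap:
        return pair["rejected"], pair["chosen"], "B"
    return pair["chosen"], pair["rejected"], "A"


def _ask_preference(llm: LLM, notes: str, prompt: str, a: str, b: str) -> str:
    query = (
        f"Notes about the user:\n{notes}\n\n"
        f"Prompt: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
        "Which response would this user prefer? Answer A or B."
    )
    return llm(query).strip().upper()[:1]  # first character of the answer, e.g. "A"


def induce_persona_prompt(llm: LLM, context: List[Pair]) -> Tuple[str, List[Pair]]:
    # 1. Bootstrap reasoning: hypothesize who the user is from all context pairs.
    shown = "\n\n".join(
        f"Prompt: {p['prompt']}\nPreferred: {p['chosen']}\nNot preferred: {p['rejected']}"
        for p in context
    )
    notes = llm("Reason about who this user is and what they value, based on their choices:\n" + shown)
    # 2. Keep only pairs whose recorded choice the reasoning reproduces.
    agreeing = []
    for i, p in enumerate(context):
        a, b, label = _present(p, swap=(i % 2 == 1))
        if _ask_preference(llm, notes, p["prompt"], a, b) == label:
            agreeing.append(p)
    # 3. Distill the reasoning into a short plain-text persona.
    persona = llm("Summarize this user as a short persona, focusing on their values:\n" + notes)
    return persona, agreeing


def build_judge_prompt(persona: str, examples: List[Pair], prompt: str, a: str, b: str) -> str:
    """Persona + agreeing examples + the new comparison, ready to send to an LLM judge."""
    demos = "\n\n".join(
        f"Prompt: {p['prompt']}\nUser preferred: {p['chosen']}\nOver: {p['rejected']}" for p in examples
    )
    return (
        f"User persona:\n{persona}\n\nPast preferences:\n{demos}\n\n"
        f"Prompt: {prompt}\nResponse A: {a}\nResponse B: {b}\n"
        "Which response would this user prefer? Answer A or B."
    )
```

With a real model, `llm` would wrap an API call; a stub such as `lambda s: "A"` is enough to exercise the control flow.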
Here are some of the results. I know this won't be visible from the back, so I'm going to gloss over some of these tables, but essentially we attained a 4% improvement over the current state of the art in predicting which of two responses a user will prefer. We also found that interaction data is much more helpful than demographic data, and that preferences vary hugely among demographics, which shouldn't be surprising.
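The metric behind that result is pairwise accuracy: over a user's held-out test pairs, how often does the personalized reward model rank the user's chosen response above the rejected one? A minimal sketch, assuming the reward model is exposed as a hypothetical `score(prompt, response)` function; an LLM judge that answers A or B can be scored the same way by checking whether it names the chosen response.

```python
from typing import Callable, Dict, List

Pair = Dict[str, str]  # keys: "prompt", "chosen", "rejected"


def pairwise_accuracy(pairs: List[Pair], score: Callable[[str, str], float]) -> float:
    """Fraction of held-out pairs where the reward model scores the user's choice strictly higher.

    Ties count as misses, so a constant scorer gets 0.0 rather than a free 50%.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(score(p["prompt"], p["chosen"]) > score(p["prompt"], p["rejected"]) for p in pairs)
    return hits / len(pairs)
```

Counting ties as misses is a conservative choice; a coin flip between the two responses would land near 50%.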
We also found that this technique scales well as more interaction pairs are provided: about a 0.8% improvement for every additional example. This graph shows that performance scales roughly linearly as the number of examples increases. These are some of the axes we extracted from the personas generated for users. We also found that these prompts, which consist of the persona we created along with the interaction preference pairs, transfer well across models: prompts learned for small models empower larger ones, and vice versa. The approach is also very interpretable, and much easier than having to fine-tune your model on preference pairs.
These last slides are just supplementary, and I'm not quite sure where this QR code will take you, but I think you should be able to access our GitHub and try it out for yourself. I'll leave it up for a while.
Any questions? Yeah.
Hi. I'm actually very interested in this topic. The examples you gave suggest that the tuning is toward the user's specific preferences, is that right? Oh, I see, gotcha. So when you say persona, you're referring to building something like a theory of mind for what the user wants.
Yeah, let me show you an example of what a persona could look like. This is a persona that we created from the preference-pair data. Okay, great. I just wanted to clarify that it's a persona the AI builds of the user, not vice versa. Yes. Thank you. Yeah.
Persona Stability and Context Drift
Did you find that the personas are stable for a person, or are they context-based?
They tend to be pretty context-based, and that's actually one of the areas of future work: how do we capture that drift and evaluate across different contexts? If we find from some preference pairs that a user is, say, libertarian, that might not necessarily align with their financial values. So it is a little tricky, because we are inferring across contexts.
One of the challenges I have when I use AI is that for certain applications I just want the facts, very factual, and in other applications I want it to be much more out-of-the-box and creative, but it takes so much effort to curate. One thing I don't like about personalization is that what it learns for certain scenarios then gets applied to other scenarios. It costs so much to personalize each scenario separately. Is there a way for it to predict what I want based on how I ask the question?
Yeah, I think if you had different personas, you might be able to add a routing or switching layer that cleverly selects one at inference time. This is relevant only if you're asking for suggestions and recommendations; if you're asking for hard facts, it's not relevant at all, right? That's true. It's mostly for preferences where there isn't necessarily an objective truth. So how is this personal information going to be used?
So when a user asks the LLM a question, it also puts this persona into the prompt? Yes, it adds this in, and with that passed in, the LLM will generate responses that are more tailored to you, based on the preference-pair data you provided.
So format-wise, this persona is plain text. Have you tried structured data, say divided into different aspects, like a JSON or YAML file? Yeah, we found that was too brittle, because the axes differ per user, so this plain-text persona actually captures more nuance than a JSON file would. Thank you.
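The routing or switching layer suggested in the exchange above could start as simply as a keyword match that picks one of several personas, or none at all for factual queries where personalization isn't relevant. A hedged sketch: the route names, keyword lists, and persona texts are invented placeholders, and a real router would more likely be a trained classifier or an LLM call.

```python
from typing import Dict, Optional, Sequence

# Hypothetical persona texts; "facts" deliberately has no persona.
PERSONAS: Dict[str, str] = {
    "creative": "Enjoys unconventional, playful ideas and open-ended exploration.",
    "general": "Prefers balanced, practical answers.",
}

# Hypothetical trigger phrases per route (matched as lowercase substrings).
ROUTES: Dict[str, Sequence[str]] = {
    "creative": ("brainstorm", "ideas", "story", "imagine"),
    "facts": ("what is", "how many", "when did", "boiling point", "define"),
}


def route(query: str, keywords: Dict[str, Sequence[str]], default: str = "general") -> str:
    """Return the route whose trigger phrases appear most often in the query."""
    q = query.lower()
    best_name, best_hits = default, 0
    for name, words in keywords.items():
        hits = sum(word in q for word in words)
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name


def persona_for(query: str) -> Optional[str]:
    """Persona text to prepend to the prompt, or None when the query should not be personalized."""
    return PERSONAS.get(route(query, ROUTES))
```

Returning `None` for the factual route reflects the point made in the discussion that persona-guided prompts matter for preferences, not for hard facts.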
So with AI being personalized for each user individually, is there a way to ensure that it is ethical? Could you elaborate on that a bit more? So the idea is about personalizing the AI for each user, right? So how do you ensure that it is ethical?
I hope you can hear me. Yeah, I can hear you. Gotcha.
For example, if someone is going through depression and the AI is personalized for each user, how do you make sure it doesn't lead them somewhere that's not right? So how do we prevent someone from going down a sad, depressed rabbit hole?
We didn't account for that in this work specifically, but we definitely would want to be careful. If someone thinks killing and terrorism are incredible, we wouldn't want to steer the LLM toward that. I'd say we can get around that by having a constitution, the way Anthropic does, making sure there are really strict guardrails, and if we notice the model veering off that path, bringing it back within those bounds.
Yeah?
Resolving Conflicts Between Persona Preferences and Task Instructions
How would you handle conflicts between the persona you created and, say, an instruction I have for a project or an agent? I might be a person who wants everything concise and crisp, but I might set up a project that says, give me an elaborate report, be elaborate in everything you write. Now these two instructions conflict. How do you manage that conflict?
I think you'd find some clever way to weight your instructions. There are probably preferences that are more macroscopic, and distilling those into a persona makes sense, but something like format is probably not a good candidate. If you read this persona, it's much more about values than about "I want this response in bullet points" or "this is how I want my report structured." The hypothesis is that values-based preferences are much more likely to be stable over time.
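One way to weight the two sources, following the answer above that format belongs to the task while values belong to the persona, is to order them in the system prompt and state the precedence explicitly. A minimal sketch; `compose_system_prompt` and its wording are hypothetical, and prompt-level precedence is a soft control that a model may not always follow.

```python
from typing import List, Optional


def compose_system_prompt(persona: Optional[str], task_instructions: Optional[str]) -> str:
    """Explicit task instructions win on format; the persona contributes values and priorities."""
    sections: List[str] = []
    if task_instructions:
        sections.append(
            "Task instructions (authoritative for format, length, and structure):\n" + task_instructions
        )
    if persona:
        sections.append(
            "User persona (use for values and priorities; where it conflicts with any "
            "task instructions, follow the task instructions):\n" + persona
        )
    return "\n\n".join(sections)
```

For the example in the question, "be elaborate" would govern length, while a persona trait such as valuing privacy would still shape the content.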
Any more questions? Yeah.
Broader Discussion: Clones, Bias, and Better Elicitation
Sorry, I'm not a professional guest. So this is what I've got.
Is your ultimate goal to create something like what they do in Cyberpunk 2077, where you can copy somebody's soul? Basically, you copy this person, and if this person dies, you can revive them into a body, because you have their soul on the chip. Philosophically, I'm not entirely opposed to that, to be honest, and there are a lot of startups out there that will take really thin slices of your brain and attempt to figure out what your neural connections look like at any given point in time. So I think we're not actually that far off from being able to upload at least a frozen state of our consciousness. But this work is probably way too simplistic to get there; maybe eventually, with the right interaction data, though the data we feed it would need to be a lot more complex.
Yeah? Are AI clones and personas the same thing? No. An AI clone, I guess, would be in some sense indistinguishable from a user, whereas this is just a very simple prompt that you can pass in within very limited contexts. For instance, there are AI clones of you that create LinkedIn posts or X (Twitter) posts that sound a lot like you, but beyond those very constrained settings I haven't found anyone who does it very well. Oh, I forget what it's called. They might be in stealth; I don't know if I should have mentioned them. Yeah.
My experience from asking the same questions to different AIs is that they have different data about me. I ask them things like "what should I do for a living?", and they come up with completely different answers because each has very different data. So how do you make sure you're not creating a false persona that is maybe accurate for a specific set of questions, but two days later, when it's asked to do something completely different, creates biases in its output based on that false persona? Does that make sense? I think it's very data dependent, so the onus is a little bit on the creator of the personas to pass in the right examples. If I may follow up, how are you thinking about creating a good, comprehensive set of questions that will draw out something useful, like an MBTI or personality test? Right.
And that's the reason we applied the three-step filter: it was about figuring out which queries LLMs are likely to disagree on a lot, and which queries are likely to be more personalizable. But we could definitely come up with questions that are more representative and broad, to distill core values as opposed to things that might change day to day. For instance, you might be someone who values a very high-income job, and that might be a more stable trait than your interest in, say, crypto versus AI, which might change. Great.
I'm asking a very tentative question because I don't have a technical background. AI responds to whatever you ask, but shouldn't it simulate a real conversation? If you ask a real person such a question, they'll probably hold back a little and ask what you want to get out of it. For example, they might be reluctant to give me a direct answer and instead ask, "Can you elaborate a little on what you want to do? Does it have to align with your professional background or education? Is there a preference for a geographic location, or does it need to be consistent with your hobbies?", something like that. So does it have to provide a response directly, without first narrowing down the question? Oh, yeah.
In this case, the LLM just provides a response to a query, so there wasn't a clarification step or a thinking mode. I'm not sure I understand the question, because the user just picks which of two model responses they prefer, so I don't know if I understand. I think the thinking mode is optional at any point: you could have thinking or not, and that's part of how the model responds. This is more about the synthesized persona of the user that gets fed into the model along with the rest of the conversation context. If it's a model that does thinking, it will think; if not, it simply won't. Would you say that's fair? So, I do have a background in social psychology and cultural studies. Is the data you used to train or fine-tune the model the ideal data, in the sense of representing the best understanding of social or cultural context? I would say yes. PRISM is a pretty widely known values dataset; it's fairly representative across a good sample of individuals and follows sound statistical methods of collection. It's called PRISM, yeah.
I just want to ask: what kind of information is most useful for constructing the persona? Currently you use questions that the user responds to. Have you seen other approaches, such as using raw conversations from this person's daily life? I think that would be more relevant to their personality. So the question is, what information is most useful for constructing a persona, especially if you only have limited data for a person?
Right, we actually use both. Chatbot Arena is an in-the-wild dataset, made up of conversations that users have with chatbots over multiple turns, while PRISM is a more values-aligned dataset. So Chatbot Arena contains more of the day-to-day style conversations you're likely to have.
Any more questions? Yeah.
Yeah, I think it would definitely help if it were multimodal, but this is just text. It's hard to distill... well, I guess you could distill lots of values from videos or images, but yeah, this was text only. Yeah, so this is mostly for interaction. I'm thinking of another use case: say it just looks at all my checked-in code and generalizes a personal setting or whatever, and then we generate code that just follows my existing style. Would that be useful, Alex? Totally. You could train or generate a persona based on code snippets you prefer, and then hopefully have it more tailored to how you write code. I've just been providing my general guidelines as system instructions, but I'm sure you could get much more clever with it.
Conclusion
All right, thanks, everyone.