When AI runs the interview

Introduction

Very nice to be here today. Thanks for the invitation.

The title is a little bit provocative, but I want to walk you through a project very dear to mine that I worked while I was at Maze, a company that is on the space of user research.

I'll explain a little bit better on it. It's a current product that is used by many companies to do AI research at scale.

Background: From Engineering Management to User Research

That being told, I want to just introduce a little bit myself. As was mentioned, and I'm an engineering manager been 15 years on the industry half of it I've been a software developer at the other half managing different sizes of teams

different types of companies many startups now at Amazon where I'm dealing with a whole scope of different problem problems I have two teams that I manage there one focused on search filters and navigation so whenever you're in the store and trying to put filters if it doesn't work very well you can complain to me another team that is focusing on expanding a new category worldwide so in in prescription glasses.

So in the near future, you will be able to go to Amazon and actually put your prescription in and get your prescription glasses at your house by the normal Amazon flow that you have.

I'm explaining a little bit more of that in detail because we'll use this example to illustrate the power of having an AI help you conduct user research.

I wanna also give a disclaimer. I typically do this presentation during 45 minutes. I'm trying to condense it as super much as I can. So any topic here, I can go into super detail.

So afterwards, just grab me to a conversation. I'll be very high level on the things that I will show you just to give you a perception of what this project is and what capabilities that we've built there.

What Maze Does and Why User Research Matters

So Maze, really interesting company, a startup.

If you don't know what user research is, I can convince you later today on how important it is and how should it be baked on all the processes that you have of any product that you're building.

And what maze tries to do is to empower any company any subject to actually do this kind of research at scale to be Having to have confidence on the things that you're building because you have user Intent and ideas before you start even building while you're building and even after you're building for that

It actually provides you multiple tools Okay from surveys typically like for quantitative kind of research Prototype testing where you can actually put a prototype with Figma and put in front of real people for them to test

panel offering which is a very interesting one where you can actually pick up demographics of the people that you want to interview like the persona that you want and give them access to these researchers or studies and have and collect answers for them then mobile testing and web testing like real

Moderated Interviews: Value and Common Pain Points

live testing that you can do and finally moderated interviews which is what we'll be focusing today and explain them in a nutshell is very simple it's a It's a conversation, right, that typically is moderated by an interviewer, typically a researcher that has already some proficiency trying to understand ideas and motivations from a specific person about a specific problem.

This kind of study has a lot of flaws, okay, you have them listed there, but it has also very big cons, like it's very time consuming, imagine doing research with, I don't know, ten people, you have to conduct the interview itself, but before that there's the scheduling, there's the aftermath of the analysis there's a lot of time baked in for you to actually get these actually good nuggets of information of that this is I call it and we call it waha moments like this is exactly what the customer needs

for me to build and another big disadvantage is it can be biased all right and it can be biased for many reasons like it can be confirmation bias because we are looking for the exact thing that we already built can be confirm can be also geographical biases languages a big barrier and it's also part of the skill right if you're talking with somebody that is from Germany in English maybe they're not as able to actually talk to you in English and you will be biased by just what you can capture and the analysis of all of

this imagining having to go through the transcripts of 10 interviews and trying to categorize findings across of all of it the time consumption of this is high high high and again the interview skills really is a bottleneck here if you don't have the right expertise you won't explore the surface of the problem as

Bringing AI Into Moderated Interviews

properly based on these cons what we tried was to mitigate them by introducing AI to help conduct these interviews so if you think about it then it really can mitigate a lot of those things it has other problems that I'll

surface at the end but to better illustrate how we work I'll do a small small demo leaning on my exact example.

Demo Scenario: Amazon Optics in Germany

One of my teams, as I mentioned, is actually building. It's live now with Amazon Optics. We started on the German market space, and the intention is to scale it worldwide after we get some good signals that the UX and UI is actually working.

But before that, it's good that you actually do some user interviews, so you do understand what will be the perception of this audience to your product or what you're trying to build.

So I went into Maze, I set up a study, I'll show you very briefly how that went, and then I targeted people that have prescription glasses needs, so users glasses or contact lenses, and in the purchase age gap, which is typically the kind of people that we will target to use Amazon.

So let me try to see if I can show you the videos, bear with me for a second can we see if we have sound thank you and I'm gonna

Setting Up an AI‑Moderated Study in Maze

select hey so this is a regular project maze and here you can create different study types and I'm gonna select the AI moderated study and I'm gonna walk you through what it shows so here you can actually select two types of structure to create this a I'm already study and the one we will be focusing on is the

kind of setup, but you can also choose the custom if you already have your research goals and questions quite formalized and you have quite a good research plan to act upon. The language settings you can also set here, which is also valuable, but what I want to

focus here is the concept that we went through was to empower the user to actually construct the best context around the problem that they are surfacing to their client or to the person that they're interviewing in this case and so a lot of we need to handhold our users to actually know what kind of context you need to provide to the model for it to be able to perform the best

and that's why we have specific templates many of them that will adjust the prompts and adjust the way you write and what directions you give in the macro level when you're describing the study based on what you want to achieve for instance on problem solving you have many different use cases like problem discovery user workflow mapping decision during mapping and so on

on problem validation you have many of them as well problem validation solution prioritization or price sensitivity exploration these are very specific examples that really are different in terms of engagement of conversation that you can have finally the customer voice which is the one

that correlates the most to the demo that i'm doing today so we're going to analyze the amazon sensitivity to the optics market in germany so so yeah what we want is from all of these um is like a customer experience and narratives and there you go you can then use this

template that is placed hold here we can put Germany and what their prescription glasses purchases experiences online and there we go this already gives me a placeholder the intention here is that we empower them so they actually put a lot of more nuance to this to actually understand a little bit what to go more

closer to what they want to achieve but then we also need to help them there's other things that actually pay our world and actually having good context is knowing about the company knowing where you're coming from when you're doing this interview so here I would do Amazon we also built import AI from search that

can can actually put your site and it's actually scraps for it so it's a little bit more helpful then you need to Skype if you have a product or feature they want to do here we can describe what's the current ecosystem of the online experience that amazon provides in terms of the prescription glasses um as a whole as a whole and assumptions and

hypothesis or open questions that you have that can actually do that the final step of this guided journey is actually ai here is generating your research goals so the things that the like the kind of questions that you want to answer by the end of of your interview cycles and there you go it actually sets a lot of different things you can add and other ones i think one here and

for the sake of time i'm not going to explore the rest it's literally about empowering the user the researcher to provide as much context as possible so the interview can be a success and defining goals so that at the end you can frame them and get some qualitative signals

around them and this was the setup I want to show you next how the AI actually behaves so after this sorry there you go bear with me as well I don't know why the videos go always on mute it's just for tricking me but so

What the AI Interview Experience Looks Like

imagine once I put it live this would be what the person that I'm wanting to interview would see it's me myself again I didn't use any other images I'm I'm pretending I'm lying about having glasses and actually purchasing them online. But this will give you a taste of how this agent actually behaves while interviewing.

Eric, thanks for joining this conversation. Today, we'll be exploring your thoughts and experiences around buying prescription glasses online, particularly in the context of how people in Germany approach this process. Are you ready to get started?

Yes, I am. Great. Let's begin. I appreciate you taking the time to share your insights with me today.

To start, could you tell me a bit about the last time you bought prescription glasses? It was about a month ago. Where did you purchase those glasses, in a physical store or online? In a physical store. What made you choose to buy your glasses in a physical store rather than online?

Just thought it would be better. Like, I like to cry on the glasses. also have a little bit of sense of how they fit my face and online that's very difficult and the risk is high like typically these set of glasses are expensive when you put the lenses and if you buy good frames like it's quite an expensive purchase and yeah the risk is quite high it goes it actually explores

a little bit of surface it goes on the goals at the end I'll have a QR code code that you can actually try it, pretend as well or not, that you're people that purchase glasses and have that interactive conversation with them. You can also have it in different languages.

It's a multitude. I think it's 16 the last time I checked it. Maybe the team already improved that. But you can even try that on your mother tongue, how it goes.

But for the sake of time, again, I'll be going for the next one, which is the analysis, so the last part.

Analysis Workflow: Transcripts, Highlights, and Themes

And this was actually pretty interesting, because I used the panel offering that I mentioned so I used maze to target exactly this demographic Germany folks that are with prescription glasses and I got overnight this was at the Friday I had the talk sorry at first they had the

talk on Friday and and I got about 10 answers in during the night so without having to spend any significant hour I got this we also have you'll see there's as an analysis, we do the transcripts automatically from the videos, we highlight all of the automatically

some highlights and thematics that you can then review and group and actually do your own analysis, but an automatic analysis is done. So this is...

Hey, so this is Maze platform. This is where all the sessions of all the participants that answered the moderated AI come. You probably saw me before.

for you can literally see hi there thanks for joining this conversation today we'll be exploring your thoughts and experiences around buying prescription glasses online you can literally go you can literally go and see it all you can navigate the whole transcript go to specific areas you can also see a summary of the section you can also go to the highlights that are

generated via ai and you can add themes to it like these already pre -populated teams you can You can go and actually see that specific call and so on. You can also add your own highlights. It's pretty powerful.

And then the themes are all aggregated. You can also do it automatically. But it's pretty cool. It generates really good thematics.

For instance, on this use case, the physical try -on repaints essential barrier. You can see multiple participants actually calling that out. you can go through all of them trying to explain the reasoning behind it and on

the other hand price and selection convenience drive online appeal you can also see very many anecdotes of people saying that and yeah it also creates you a nice report you can share with your stakeholders yeah and then open on

on purpose some of the people that were talking for privacy reasons. I am having now trouble to find the presentation again.

Just go. There you go. It's sitting there. I think so.

There you go. Cool. Thank you. Let's go for the slideshow.

Went through all of these and the analysis.

Strengths and Limitations of AI‑Moderated Research

So I hope this gave you a taste of what the product is, how powerful it is, and a lot of our clients have really merits on it, I'll walk through it, but there's things that we are very limited using this tool.

1Namely emotions and nuance, like I don't want to be one of those claimers that AI will substitute the world here, like I do think that human connection is a driving power that will be very difficult to replace and a skilled interviewer will be able to cut nuance, cut emotions as no model today can do.

I don't know about tomorrow end of the year who knows but it is very limited at that level you can also not detect a lot of emotion on people and guide that conversation a skilled interviewer a skilled researcher will tap into that and double down to understand feelings to explore that to understand

where to drive their product it's also awkward okay this is starting to fade away we are all interacting with these agents and feeling more and more comfortable talking to them nonetheless the less you're still talking to a robot and there's many studies to actually

make it more comfortable there's a whole I can go into a whole interest effect in how we actually train the model to feel more natural but one thing is also trying to fake humanity into these models which is very it automatically blocks people from actually being sincere on a conversation like that so

this awkwardness generates intention on actually capturing good good insights insights, and it has limited context. Yes, we have MCPs, we can build all the context in the world, but a person that is working on a company that knows maybe much more of

the tacit knowledge around still isn't replaceable by a kind of tool like this. On the other hand, time constraints are completely disappearing from this. I just told you overnight I was sleeping while I was gathering all of these insights.

Time for value, it's incredible. credible. Language barriers. Many of those interviews were conducted in German.

I have no idea what they said there. The translation was automatically done, and insights were generated automatically.

Time zone challenges. If you need to interview somebody from the United States because you're going for that market, you're going to go and do extra hours while you're using a tool like this. It actually bridges.

Another use case that is much used by our clients is actually to do a pre -screening of who to drive to dive deep and so like you can there are people that you can interview much better that expresses themselves much better so if you have kind of a tool that can actually showcase the power of how people are able to share their insights you'll definitely get

really good insights on how who to actually target and invest on those kinds of persons

How It Was Built (High Level)

so i want to really really briefly because i know we are at time almost go through the gears of how how we built this. Again, upstairs or afterwards, I can go into a lot of the details of how this was built.

I want to just highlight the core requirements that we had, and I hope it resonates with some of what you saw.

Core Product Requirements

And natural conversation flow was something mandatory. Like we could not have something that was flaky or breaking or putting the person to wait. That breaks the conversation, breaks the trust of what is happening.

So no delayed response was one of the key requirements.

Retaining context of previous answers to maintain a through conducting chord of the conversation was also a need this is not as simple as it seems today with a lot of the toolings that we have we started this project initially on the start of 2024 so the tooling at that point was

very different than what it is today even today to actually achieve this I'll share a tool a package that is very good for it but there's another nuance to this that is different models behave better than others and based on the use case you might want to choose one model versus another.

1And finally the ability to eval our conversations to maintain and improve the prompt quality. Like we

need to be sure that what the AI is behaving, how it's behaving, it's improving throughout time and we can have some control of how it evolves. So evaluations of it was a key component as well.

Implementation: Real‑Time Conversation Pipeline with Pipecat

What we used to build this was a whole story on its own, could be a technical talk of its own, but we landed on PipeCat.

PipeCat is an amazing framework, Python framework, if you're technical and you're thinking about experimenting on this realm of real -time conversations with different kind of models, this is built for speed, okay?

And some of the traits that it has, it streams everything and everywhere. It has tiny buffer windows which allows it to be fast lightweight connectors, optimized audio codecs, and LLM side improvements.

So it interacts with these models in an optimized way and it helps us shape kind of a flow like this where we can actually empower the transport and have all the speech to text because imagine on the backstage what is happening here in real time we need to

translate what the person is saying to text then to feed into a model to actually understand and get the context actually get an answer from a model and then use that text again to text to speech so we can then stream that back

it's a lot of things happening at the same time and you have different tools for each one and this is magic the magic I think from from pipe cat is that you can literally choose a cherry pick what model and what tool works best for you

for instance us at the time that this screenshot was taken from our infrastructure speech to touch to text was really good with deep Graham that we used open AI models gpd5 to actually conduct the structure and the context of of the conversation and actually to create the questions,

Claude was unbeatable in terms of the quality of the ability to actually frame these.

And then Azure to Speech was much better to actually put the enunciations and so on on the speech that we wanted to.

Cherry picking this is very important. Like everything is changing so much, being able to iterate and change different models, see which ones are working better from version to version

was essential for us to be able to do that evaluation throughout time.

Prompt Engineering, Evaluations, and Tooling

time prompt engineering was one of the core things that we had to really nail down on it

two more slides and i'm done i promise but i really want to share i'll share the slides if you want there's two articles up there so if you really want to go into deep on the evals and evaluation and have a methodology to actually do this in production at scale these the pragmatic engineer has a really nice article about how to do it and how to implement a good process

process, as well as Maze, which was written by the VP of Maze that really showcases the nuances and the care that you need to put in to do something like this.

And finally, also very good supporting us was the ToolFree tool. It helped us do iterations with it. It also helped us involve expert matter folks on the creation and maintenance of these prompts. So we had designers and PMs and researchers actually directly involved in crafting these prompts.

so also it's a supported feature flags detection on monitor of hallucinations as well and evals to maintain the consistency and trust check it out really can save you a lot of bottlenecks that you will face if you want to

Conclusion and Next Steps

produce something like this and there you go as I promised you have the moderated AI give it a try it's set up to do that use case so pretend that you wear glasses please or else you will get confused and if you want to see the full talk there's much more detail on each ones of these sections there's also

So there, it's available on YouTube. Thank you.

Finished reading?