A New World of Feedback and Continuous Improvements

Introduction

So I do a lot of demos at a bunch of these events. As I said, I travel; we do ten of these every month at this point. But today is going to be the first time I'm going to do this particular one.

So I want double the criticality, please. Anything that you think can be improved in this particular demo, I really want to know about it.

Demonstrating the Superpowers of Large Language Models

I'm going to try and demo what I think are the superpowers of large language models.

The Responsibility and Challenges of Deploying Large Language Models

Now, especially in this room, I think there's a big sense of responsibility in big corporates, where we're thinking about the negative effects: what are the potential downsides of deploying large language models?

We talk about hallucinations, we talk about inaccuracies. There are real effects around bias, and there are a whole bunch of different techniques for trying to temper these a little bit.

Exploring New Applications and Avoiding Common Pitfalls

But another thing is that I think there's a whole area of exploration here. Actually, interestingly enough, coaching is not too far from it: by asking models to ask questions instead of providing answers, you avoid a lot of these problems.

If a question hallucinates, I don't care. The question is either valuable to me or not; it made me think or it didn't. But the ability to ask questions, the ability to play devil's advocate on something that I'm writing, is never bad. Maybe I discard a thing that a large language model points out, maybe I don't, maybe I take it into account. But I am 100% in control of what's happening here.
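To make that pattern concrete, here is a minimal sketch of asking a model for questions rather than answers, assuming the OpenAI Python SDK (openai>=1.0); the model name and prompt wording are illustrative, not Mindstone's actual implementation.

# Ask the model to play devil's advocate on a draft: questions only, no answers.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def devils_advocate_questions(draft: str, n: int = 3) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any chat model works
        messages=[
            {
                "role": "system",
                "content": (
                    f"You are a devil's advocate. Respond with exactly {n} short, "
                    "pointed questions that challenge the draft's assumptions. "
                    "Do NOT provide answers, corrections, or advice."
                ),
            },
            {"role": "user", "content": draft},
        ],
    )
    return resp.choices[0].message.content

print(devils_advocate_questions("Our onboarding flow needs no changes this quarter."))

Even if one of those questions is off base, the failure mode is a question you ignore, not a wrong answer you might act on.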

Innovative Uses in Learning and Development

And there is an entire field, specifically when we think about learning and development, that I think doesn't get enough focus. I'm going to run you through a few ways that we use this, and I've actually never done this before. This is not a Mindstone demo overall, but I'm going to try to run you through where we use this technology, why we do it in specific ways, and how we try to make a learning experience come to life.

Starting with the Basics

I'm going to start with very simple things.

A Look Inside My Mindstone Account

So here I'm on my Mindstone account. This is actually my personal account at the moment.

So I've got my dashboard. I'm on the AI Mastery program, or rather the AI Competency program.

Enhancing the Learning Experience

And the first thing we do: every day on the program, there's a task or a piece of content that we run people through. And we have a quick introduction, or rather an explanation, as to why we think this is something that's worth spending time on today.

Then the second bit we have is a summary of the particular resource. If you don't have the time to go through it, at least make sure you run through the summary.

Both of those are generated. But then we have a multiple-choice question, a priming question, that you answer before you go into the resource. From a learning perspective, it makes you think about the material before you go in, and thinking about a resource before you go into it has been shown quite specifically to improve the learning outcome.

Leveraging Large Language Models for Questions

Creating a question is something that a large language model is great at. So here: "Which aspect..." This is actually my own talk. That was not planned.

I literally just took a random resource here. The time was up as well, so I didn't actually answer the question that it asked. But you can see here that it gives me feedback on the validity of the three different answers that came back.

Personalized Feedback and Learning

Why? The way that we do it is that we try to get the large language model to give one answer that's correct, one that has some elements of correctness, and one that's just wrong. And then we explain: what was wrong with the wrong answer? What was somewhat correct but not totally on point?

And why was the correct answer actually the correct answer? So here, the answer is correct because it uses the GPT for Sheets extension for deep analysis and contact prioritization.
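As a rough sketch of how that one-correct, one-partial, one-wrong scheme could be generated, again assuming the OpenAI Python SDK; the JSON shape and prompt are my illustration, not the platform's real instructions.

import json
from openai import OpenAI

client = OpenAI()

# Generate a priming question with three graded options, each with an explanation.
def priming_question(resource_summary: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # request parseable JSON
        messages=[
            {
                "role": "system",
                "content": (
                    "Write one multiple-choice priming question about the resource "
                    "below. Return JSON with keys: question, options. options is a "
                    "list of three objects with keys text, grade, explanation, where "
                    "exactly one grade is 'correct', one is 'partially_correct', and "
                    "one is 'wrong'; each explanation says why."
                ),
            },
            {"role": "user", "content": resource_summary},
        ],
    )
    return json.loads(resp.choices[0].message.content)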

So here: "Which aspects of Josh's method reflect the best practice of utilizing AI tools for data manipulation in Google Sheets?" This is my talk. So this is the starting point.

Actually, this is not me. This is another Josh. I just got confused.

It's just asking a question before someone goes into a piece of content: a very simple use of a large language model, and very hard to get wrong. I mean, I say that, but we have seen the models go wrong. Sometimes they ask very weird questions, but that's much more the exception than the rule; by far, I'm talking 99 to one.

Diving into Free Text Questions

So then there's a second bit. I wonder... now I have to look here. Okay. Same mechanism; I'm literally just jumping to another program so I can demo another kind of question that comes through. Here: create a free-text question. So I'm going to select that.

And here, reflect on a recent situation where you needed to evaluate the information provided to you and describe how you would approach the situation differently if you were to rely on ChatGPT for assistance. In your reflection, mention any steps you would take to verify the information ChatGPT gives you and how understanding its limitations could influence your evaluation of its responses.

Pretty good question, I would say. So this is a part of the module. As we go through AI Competency, we help people think through bias.

This is probably going to be a pretty long one to answer, so I'm going to do a terrible job of answering this question: "When I ask a question that exposes itself to bias, I just take the answer as is." And hit submit. Unless... well, live demo.

User-Specific Content Generation

I'll take a question while it's loading. [Audience: So this question, is it generated for each user, or is it generated once and then reused?] At this particular point, it's at the program level, so it's not for that particular user. But that is indeed what we are aiming for: that it ends up being 100% personalized. And actually, there are some aspects that I'll show in a second that are exactly that.

OK, so one thing I just realized: we should have a loading screen at that point, because I thought that the system was just way too slow. I thought it was bugging, but it wasn't.

So thanks for the question. Here you can see the actual answer comes through. My answer was terrible.

Using large language models to provide feedback on an interaction, just like with coaching: feedback is a great way of using a large language model, and it is a core component of any learning experience that each of us is building. Actually, when Mika started today with "what do you do when someone comes to you with a question?", I was about to answer "ChatGPT", because that's my default at this point: does ChatGPT have an opinion on the thing?

just because it exposes me to another side of the question. So in this case: "Your response does not seem to fully address the question asked." The question asked me to reflect, and I didn't reflect on a recent situation. None of the key criteria for this question were actually met, and so I get a really bad score.

This is another great way for large language models to provide feedback without exposing you to quite the same risks. Well, bias is the one that's a little bit hard to escape and still sometimes comes through, but hallucinations and inaccuracies are less of a problem when you're looking at this.
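A minimal sketch of that kind of criterion-based grading, assuming the OpenAI Python SDK; the rubric, score scale, and JSON shape are illustrative assumptions rather than Mindstone's actual prompt.

import json
from openai import OpenAI

client = OpenAI()

# Grade a free-text answer against explicit criteria; return score plus feedback.
def grade_reflection(question: str, answer: str, criteria: list[str]) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Grade the learner's answer against each criterion. Return JSON: "
                    '{"score": 0-10, "criteria_met": [true/false per criterion], '
                    '"feedback": string}. Be strict: an answer that ignores the '
                    "question should score low."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\nCriteria: {criteria}\nAnswer: {answer}",
            },
        ],
    )
    return json.loads(resp.choices[0].message.content)

Because the model is judging an answer against stated criteria rather than asserting facts of its own, a hallucination here costs you a slightly-off piece of feedback, not a wrong fact taught as truth.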

Personalization in Learning Resources

Now, the next bit is the first bit of personalization that we're looking at. So let me go through here; I have to download and look at the content.

So the other bit of personalization that we do: every resource we go through... (picked the wrong one; there's something wrong there) ...okay. Every resource we go through is uploaded to the platform based on the environment in which the user lives.

So in this case, I'm in the Mindstone space, which means it's part of the Mindstone company. It knows what Mindstone is doing.

And what we do is we get the AI to automatically start annotating the resource in a way that draws the connection between our company priorities and the content that is being presented. So here we're looking at prompt structures, improving output quality, testing prompts. This was a resource that was recently shared within the company.

So this is a very interesting one: they did a study on prompt structures. Everyone uses a bunch of prompt structures. Actually, it's been somewhat proven...

I mean, it's still somewhat inconclusive, but there's definitely good evidence for the idea that prompt structure doesn't matter that much, or at least the technical structure doesn't. However, I would still suggest you use one, because it forces you to make sure you don't forget something. So no, you don't need to have a template that says persona: this, context: this, here's what is important. None of that is actually required. But if you don't have it, you maybe forget to put a persona in place. You maybe forget to give the right context. You maybe forget to tell it what is important and what is not.

So I would still argue structure is interesting while you're working, but there is a study out there now showing that ultimately, once you know exactly the prompt you want, you can just remove it. And actually, I noticed that this prompt didn't have any of those elements that I would use, and I label them quite a lot. But you can see here there's a question that's generated, and this is GPT.

"Reflecting on the prompt-crafting process for LLMs at, well, in our case, the Mindstone team: what are some strategies you could implement to reduce time spent while still achieving high-quality output, and how could these strategies impact your team's productivity?" This is a question that's very specific to our context, because at the company level we have a target: figure out how each of us can use AI to be more productive. We have set that context, and the question is relevant to it as the individual goes through these different resources. This is a semi-personalized question, personalized to the context of the company in this case. In our case we're 12 people, so it's a small group, but you can see how this could work at a higher level as well.
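The same mechanism that writes those annotations can drive the question itself: feed the resource plus the company's stated priorities into the generation call. A rough sketch, assuming the OpenAI Python SDK; the priority text and function name are hypothetical.

from openai import OpenAI

client = OpenAI()

# Hypothetical company context; in the product this would come from the workspace.
COMPANY_CONTEXT = "Mindstone, 12 people; target: everyone uses AI to be more productive."

def contextual_question(resource_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Write one reflection question about the resource below that "
                    "explicitly connects it to these company priorities:\n"
                    + COMPANY_CONTEXT
                ),
            },
            {"role": "user", "content": resource_text},
        ],
    )
    return resp.choices[0].message.content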

Interactive Learning with the Sandbox

So one thing that's part of the program is what we call the sandbox. The sandbox is a way for anyone on the program to test their prompts. And as they're testing their prompts, they don't just get the result; they actually get an evaluation of how they could have improved their prompt to be more effective in their particular job.

Here, you'll see the feedback. Actually, I forgot to do another thing. I'll come back to that in a second.

"How can I use AI to be more productive?" Here, I'm getting the answer just like you would with ChatGPT. There's nothing magical around that, with the exception that as we send this through, the response we get back is personalized, because your profile data is sent along with the actual prompt that you're providing. But on the right-hand side, you get an evaluation of the prompt itself.
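Conceptually, the sandbox makes two calls: one answers the prompt with the user's profile sent along as context, and one critiques the prompt itself. A sketch under those assumptions, using the OpenAI Python SDK; the profile fields and rubric are illustrative.

from openai import OpenAI

client = OpenAI()
PROFILE = "Role: operations lead at a 12-person startup."  # hypothetical profile data

def sandbox(user_prompt: str) -> tuple[str, str]:
    # 1) Personalized answer: the profile rides along with the user's prompt.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Tailor your answer to this user. {PROFILE}"},
            {"role": "user", "content": user_prompt},
        ],
    ).choices[0].message.content

    # 2) Separate evaluation of the prompt itself, not of the answer.
    evaluation = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Critique the user's prompt for clarity, specificity, and "
                    "context given their job. Score it 1-10 and be critical: "
                    "vague prompts should score low."
                ),
            },
            {"role": "user", "content": f"{PROFILE}\nPrompt: {user_prompt}"},
        ],
    ).choices[0].message.content

    return answer, evaluation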

And this has only recently been released. I must say that I'm actually not personally happy with the feedback it's giving, because it's not critical enough. This was a terrible prompt, and somehow I'm getting a score of 8 out of 10. That shouldn't be the case.

But luckily enough, feedback is only additive in this case. So even though I get an 8, what is important is the feedback you go through. And at the end, you can see the specificity here is actually flagged: pointing at particular areas of productivity, time management, task automation, and so on, would have been better. And then: "your prompt is clear, relevant, and feasible; enhancing specificity by mentioning particular areas where AI could boost productivity is something you can improve."

Less than ideal; again, live demo, and this has only been live for the last week and a half, so it will improve. The quality of that feedback will improve as we iterate on the prompt, but the point is that it's only additive; it only adds to the experience in this particular case. There's another thing, which I forgot to do earlier: as you are interacting with the resource,

Context-Sensitive AI Interaction

you have the ability to ask the AI, in context, a question that you have. So in this case, you'd think it's similar to putting it into ChatGPT, and to a degree it is, but the difference is that it has the context of the article you're on, the context of the profile that you have, and the context of the company that you're in. So if I were to ask: how would this apply to me?

Okay: "For your context, this suggests that you can streamline interactions by focusing on clear, concise prompts that encapsulate your criteria, without over-structuring prompts with excessive detail." I wonder... "with your team, enhancing productivity and efficiency."

Additionally, it's valuable to... okay. From the answer that comes back here, it isn't directly evident; I can't actually show you that it's taking my context into account, I can only tell you that it is. It's not referencing the fact that I'm part of Mindstone; it's not referencing my particular profile. But what is interesting is that you're getting an in-situ, personalized answer coming through.
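The shape of that in-context call is simple enough: the user's question stays as-is, and the article, profile, and company context are prepended as system context. A minimal sketch, assuming the OpenAI Python SDK; the argument names are illustrative.

from openai import OpenAI

client = OpenAI()

def ask_in_context(question: str, article: str, profile: str, company: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the user's question about the article, tailored to "
                    f"their situation.\nCompany: {company}\nProfile: {profile}\n"
                    f"Article:\n{article}"
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content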

Conclusion

And then the last thing, and this is, I think, another area which is totally...

The Underrated Potential of Assessments

underused. So I have to go into another account now, because at the end of the program you go through an assessment. Assessments are another very interesting bit, where you're asking questions and you can evaluate the answers based on the questions that were generated. And so with very little effort, and I'll show you the prompt in a second, you can get a fully personalized assessment from here.

Okay, let's get started.

So here the context is the account. This particular account was configured to be, I think, serial entrepreneur, CEO at Mindstone. So it has that context. It has finished the AI Competency program, so you're at the end of a program. It takes the content you went through and then generates an assessment on the fly.

And it does so, as you'll see in the instructions in a second, by taking a combination of multiple-choice questions and free-text questions. "So what are some potential applications of large language models such as ChatGPT in enhancing business operations, particularly within your organization? Provide at least two examples."
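A rough sketch of that on-the-fly generation, assuming the OpenAI Python SDK; the JSON shape is my illustration, not the real instructions shown in the demo.

import json
from openai import OpenAI

client = OpenAI()

# Build a short assessment from the content a learner completed, mixing
# multiple-choice and free-text questions, tailored to their profile.
def generate_assessment(completed_content: list[str], profile: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    'Create an end-of-program assessment as JSON: {"questions": '
                    '[{"type": "multiple_choice" or "free_text", "question": str, '
                    '"options": list or null}]}. Mix both types and tailor the '
                    f"questions to this learner: {profile}"
                ),
            },
            {"role": "user", "content": "\n\n".join(completed_content)},
        ],
    )
    return json.loads(resp.choices[0].message.content)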

I am not going to run through this full assessment, because it would take 15 minutes on its own; the answers are fairly extensive. So I can say here: customer support query answering, and, for business operations, coming up with personalized use cases for AI in everyone's respective job. Not great as an answer.

Whoa, 10 out of 10? That's not great. Okay, well, I would have said that was not a great answer, to be honest. But okay, we'll see. "Clear understanding of real-world relevance and practical applications." So what is it here? "This demonstrates practical application, tailoring to the various job functions within the organization." I guess to a degree, knowing that Mindstone is specifically about that, maybe, but I still think it is not critical enough.

Second question here: "Which steps are crucial to crafting effective prompts to maximize the quality of the response?" Um, chain of thought, personas, and context. Interesting; I would have thought that better than the first one. Okay: "mentioning personas highlights important elements." What could be improved? "Specify that a clear and straightforward prompt is essential." Yep, I did not mention that; the more straightforward the prompt, the better the answer you get. "Explain the need for detailed background information; specificity in requests gets more effective responses." I mentioned context, but I guess I didn't expand on it, and it considers that a weak answer. I guess overall I'm happy with that.

So I'm going to stop there, because the whole purpose of today was to try and expand how you think about the various application areas of large language models, ones that have less of the downside we might all be a little bit afraid of.

And there are many cases of the type I just showcased: asking a question, whether that question is multiple choice or free text, whether it's feedback, whether it's an assessment or trying to understand if someone understood a particular concept. Those are all areas that have less of a downside, and that we could be looking at for direct implementation today.

And I think they are underexplored, because everyone is trying to work out how to do the first thing, which is helping someone do the full learning journey, and sometimes that gets you into blockers within big companies that then prevent any progress. I think there are many different ways that you can do these things. So hopefully that was useful, hopefully it sparked something, and please let me know how you think I can improve this type of demo, because I am probably going to be doing it another three or four times.

Thank you very much. Any questions, let me know as well.
