Gemini vs OpenAI Deep Research

Introduction

Okay, so as I often tend to do, I'm going to try and do a full-on live demo. Which means that things will go wrong. Because they always do.

But I think you can see this here. Quickly going to go and get my water as well.

Engaging Research

So what I want to do, I want this to be engaging in different ways. So I'm not going to go and just do a thing here on my own. We're going to go and do a piece of research.

So as I said, we're going to use OpenAI's deep research and Gemini's deep research. How many of you have used Gemini deep research before? Not many.

How many of you have used OpenAI's deep research? So those are the ChatGPT Pro users in the room. Great.

How many of you have used NotebookLM? Many more. I might try and throw that in. Okay.

Research Questions

Now, we're going to try and build a report maybe on a piece of research on how AI affects the future of work. So AI... and the future of work.

Now, what are some questions that you would want answers to? Imagine you had access to, like, a consultant from McKinsey and you could ask for specific questions to be answered in this report about AI and the future of work. What questions would you want to see answered or what aspects would you want to see in that report?

Augmentation versus displacement. All right. Anything else? Entry level versus management, what's that? OK. Seniority, yeah. Anything else?

Yes. Okay. Maybe two, three more. Trust. Yeah. Yes, hallucinations.

For a second, I thought that was going to be my WhatsApp. I might actually just kill that. There we go. One more, last one. How do they expect the nature of work going forward to be different than now?

Structuring Thoughts

Nature of work, that's a big one. Okay. Now, you might start a project like this like you normally would. I mean, literally this is just a piece of research on AI and the future of work, but any type of project kind of starts like this. You try and get your thoughts down in a structured way, whether it's a strategy, a plan, an operations element, or, for engineers, when you're building some product of your own. You have to get your thoughts down one way or another, and here we just did that using a mind map.

You can do it in various ways. The way that I like to do it, in order to then structure my thoughts, is I just go into ChatGPT and I ask it: "Imagine you're an experienced executive assistant, expert at taking a random mind map and turning it into a great briefing note. Attached is a mind map on a piece of research. Please turn it into a briefing note for me."

Actually, there's a few things that I will add to this. I forgot. It's not allowing me to edit it at the moment, so I'm just going to relaunch that. Oh, it's because I was in o1 Pro as well, so I'm just going to switch that back. Let me add that again. There you go. I think that was the image. Let me just double check as it's uploading. Nope, that was the wrong image. The one thing that I missed here is "walk me through your thinking before giving me the briefing note."

Anyone know why I would use that? Walk me through your thinking before giving me the briefing note? Chain of thought reasoning indeed. How many of you are familiar with chain of thought reasoning? Good, that's a little bit more than a third of the room. So basically when you ask these models to outline their reasoning before giving you an answer, you get better answers and reduced hallucinations, and it just helps with everything going through.
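As a minimal sketch of the prompt described above, the persona, the task, and the reasoning-first instruction can be assembled as three separate pieces. The function name `build_briefing_prompt` and its parameters are hypothetical, chosen only for illustration; the wording of the prompt itself is taken from the talk. Nothing here calls a model, it only builds the text you would paste in.

```python
def build_briefing_prompt(role: str, task: str, ask_for_reasoning: bool = True) -> str:
    """Assemble a prompt from a persona, a task, and an optional reasoning-first line.

    Hypothetical helper for illustration; it only builds a string.
    """
    parts = [f"Imagine you're {role}.", task]
    if ask_for_reasoning:
        # The chain-of-thought nudge: ask for the reasoning before the answer.
        parts.append("Walk me through your thinking before giving me the briefing note.")
    return " ".join(parts)


prompt = build_briefing_prompt(
    role=(
        "an experienced executive assistant, expert at taking a random mind map "
        "and turning it into a great briefing note"
    ),
    task=(
        "Attached is a mind map on a piece of research. "
        "Please turn it into a briefing note for me."
    ),
)
print(prompt)
```

Keeping the reasoning instruction as a switch makes it easy to run the same request with and without it and compare the two answers.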

Using AI for Research

There's a whole reason behind it, I won't get into that today, but what we had here is we got a walkthrough, so you actually get the reasoning as to how you construct that briefing note, and then here on the right-hand side, I then have a full briefing note. This is quite heavy though, so I wonder, we now have two things.

And again, OpenAI deep research I am discovering along with you as well, because in the UK I need to use a VPN to do this and it didn't quite work that well. So really, when I landed here was the first time I could properly explore it. So I'll have fun at the same time.

But what I do know is on the Gemini deep research side, it works better when you give it a compact paragraph. So what I'm going to do here is now imagine you're an experienced researcher briefing a junior researcher on this piece of research. Give me one simple paragraph that stands alone that I could give to a junior researcher to do all of this research.

And again, I could have used chain of thought reasoning here, but in the live demo I'll go a little bit faster. Okay, so I now have a paragraph.

Research the impact of AI on the future of work, focusing on how AI is reshaping job roles, particularly the distinction between augmentation, displacement, examine how AI affects entry-level versus management jobs, challenge of trust and ethical AI adoption, and so on. So everything we just talked about, the various aspects that were outlined.

We have a much more detailed briefing note here on the right-hand side should we want to use that, and I'm actually going to try and use that from an OpenAI perspective so that we can compare as this goes through. So I'm going to now launch that in Gemini 1.5 Pro with Deep Research. So just copy-pasting. Takes a few seconds just to get a plan.

Gemini Deep Research

Okay, here we go. We now have an eight-step plan.

One, find research papers. Two, find case studies. Three, find cases of how AI is leading to job displacement. The fourth one here is find how AI is used to augment human work. Five, find data and statistics on the projected impact of AI on employment across different sectors and job levels.

So it's basically taken what we have as a research brief and broken it down into these different steps. And that's important because this is really one of the first bits that's an agent that's actually useful because it's going to go through those eight steps in order to go and answer the question that we asked.

I'm going to hit start research. And here in the background, it's going to actually start doing this research for us. The way that this works with Gemini deep research is it goes out and crawls a whole bunch of different websites right now and then uses that in order to start answering the question that we have. So we'll see what comes out in a second.

But because that does take like five minutes, I don't want to just wait there. I'm going to go and do the exact same on the OpenAI side. And here I'm interested because I can now go in the same flow

OpenAI Deep Research

Can you do this research for me? So what you saw here, by the way, I did that very quickly, but I just selected the deep research button here for OpenAI. And let's see if it's doing any of that. It's not actually doing the deep research.

Go ahead. Nope. So let me... Ah, it switched to 4o mini, interesting. Okay, so it's taking a little bit of time.

We'll see here, one thing that you will notice very quickly is the way that both of these tools do their deep research is very different. So Gemini is now at 105 websites that it has already crawled. The way that it does it is it goes and crawls the web and gets a ton of information and then goes through a distillation process to try and get you an answer.

Whilst what you'll see happening here on the right-hand side with OpenAI is it's actually going to go step by step and go after it more like a human would. So it's going to find a few websites, scroll those, figure out what they mean, refine that, and figure out how it does the next search. If it works at all, that is.
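To make the contrast between the two styles concrete, here is a toy sketch. It only mirrors the description given in the talk; it is not how either product is actually implemented, which neither vendor is quoted on here. The in-memory `TOY_WEB`, the keyword filter, and both function names are invented for illustration.

```python
# Toy "web": URL -> (page text, links found on that page). Entirely made up.
TOY_WEB = {
    "a.example/overview": ("AI augments routine knowledge work", ["a.example/displacement"]),
    "a.example/displacement": ("Some customer service roles are displaced", ["a.example/trust"]),
    "a.example/trust": ("Trust and hallucinations slow adoption", []),
    "b.example/unrelated": ("Recipe for sourdough bread", []),
}


def relevant(text: str, keywords: list[str]) -> bool:
    """Crude stand-in for the model judging whether a page is useful."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


def crawl_then_distill(keywords: list[str]) -> tuple[int, list[str]]:
    """Fetch everything first, then filter in one pass."""
    fetched = list(TOY_WEB.items())  # crawl every page up front
    kept = [url for url, (text, _links) in fetched if relevant(text, keywords)]
    return len(fetched), kept


def iterative_search(start: str, keywords: list[str], max_steps: int = 10) -> tuple[int, list[str]]:
    """Read one page at a time and let each useful page decide the next step."""
    visited, kept, frontier = [], [], [start]
    while frontier and len(visited) < max_steps:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.append(url)
        text, links = TOY_WEB[url]
        if relevant(text, keywords):
            kept.append(url)
            frontier.extend(links)  # only follow leads from useful pages
    return len(visited), kept


keywords = ["augment", "displaced", "trust"]
print(crawl_then_distill(keywords))
print(iterative_search("a.example/overview", keywords))
```

In this toy setup both approaches keep the same three pages, but the first one fetches all four pages while the second fetches only three. That is the same shape as the difference in source counts seen during the demo, at a much smaller scale.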

Let me just refresh that again to see if that comes through. Huh. Well, I think everyone in the world might be using deep research.

I'm going to try one more time, but I'm going to actually just take the paragraph. Like this. So outside of the normal flow. Ah, okay, suddenly it's actually responding much better. Interesting.

Comparison of Approaches

So this is the first thing when I tried it earlier as well. Difference again between Gemini deep research and OpenAI deep research. Gemini actually asks you, sorry, OpenAI will ask you questions.

So here, to ensure a thorough analysis, could you specify any particular industries or job sectors you'd like the research to focus on? Additionally, do you have a preferred timeframe for the analysis? Let's look at the next five years and compare it to 10 years from now. Knowledge intensive industries.

There we go. Okay. So it's just summarizing what it's going to research now. Hopefully it'll actually start launching it.

In the meantime, here you can see that Gemini has stopped researching websites and is now analyzing the results. And what I'm going to do is I'm actually going to take a few of these, and whilst we're waiting, I'm going to load them into NotebookLM, as I saw that not that many people have used that either. So NotebookLM is just notebooklm.google.com.

You can add a whole bunch of sources, so here I'm just going to add websites to it. There we go. I know that I'm doing a whole bunch of things and then you're just waiting on the results, but all the results are about to get spun out, so you'll see some great interesting stuff in a few seconds.

Okay. Hit generate there. Now I'm going to go back to Gemini. Gemini has now finished with the research report.

Two things that happened. One is I have the report here. What is interesting with Gemini Deep Research is that it integrates with Google Docs and so any table that it has you can export to Sheets. The report itself you can export to Docs.

If I click here, wait for a second, I have what is now a nine-page report, fully referenced. So everything that it crawled that was interesting that it thought to include in the report is there. It goes into AI and future of work, augmentation, displacement, and the evolving role of humans.

So what do we have on augmentation versus displacement? AI is likely to improve job quality by automating routine tasks. However, this technological shift also raises concerns about job displacement in various sectors.

What do we have here? For example, AI chatbots are already taking over some roles in customer service, handling routine queries and troubleshooting, clearly. All of this, again, being referenced. Now, what is interesting is we have this six, well, six, seven page report (it was really seven pages with two pages of references) that is entirely based on what we asked it.

Now, like a lot of what is happening with these AIs, the report is only as good as the query that you give it. So if you wanted a report on a very specific area or you wanted it to be looking at very specific things, you could have asked that at the start. We were fairly generic.

We looked at a report on the future of work and the various aspects that we wanted to approach. I've had tries with this where it goes up to 800 or so pages crawled and it spits out reports 30, 40, 50 pages long that really go in depth. I use it now multiple times every week: depending on which area I need a specific data set on, I can just go and use Gemini deep research for it. Now, on the right-hand side here, flipping back over to OpenAI, this is where you can see how different it is.

OpenAI is now at 13 sources that it has crawled to answer almost the exact same question. I copy-pasted the same paragraph, but as you saw, it asked me a few questions before it actually launched the research. It has a slightly different focus.

But what it did here is first it decomposes the question. It then searches based on what it thinks is the most important starting point. It comes across a website here, aeaweb.org.

It does that twice. It then goes over to the Oliver Wyman Forum. I'm not sure why that is where it ended up.

But you can see how here it's processing in a very different way. This is in a much more, I'd say, agentic way where it's really looking at step by step and crawling the internet like a human would do in order to start building the report. And we'll switch back to this in a few seconds or a few minutes because it isn't done yet.

It's still going through. It'll finish when it thinks it's there. And NotebookLM is not yet done either.

Questions and Answers

So actually, whilst we wait for both of these to finish, any questions right now already so that we can avoid the awkwardness of me just standing here? Yes.

Depends on the type of report. Most of it is skimming, but it's still 40 pages of interesting stuff, so it ends up being useful enough to skim, and then I zone in on the areas that I really want to double-check. And then also the sources that it comes up with: when it's really important, I go back to the sources and use those. So I don't use it as a full-blown report like I maybe would in other cases. Yep?

If you put in the prompt "show me your thinking process, thought process", is it actually using the basic model in the back end, or using next token prediction?

So, interesting question: when you ask it to outline its thinking process, is this actually thinking or not? That is a very loaded question. Different people will give you a very different answer to exactly that question.

Some people would argue that what people do is next token prediction to begin with. But no, it's not a different model. It is next token prediction, but that doesn't mean it's not reasoning.

So that, yeah. My opinion is it is reasoning, and I have a whole bunch of reasons for that and can go into it, but that would be a very long conversation. I will debate you on that one.

Any more questions? You had a question there.

Can you choose which model you're using to do research, or does it have...

So you can choose it, but it's been playing up for me. So that's why I ended up using 4o here, because it's the one that's least prone to start erroring out.

Once I started using o1 Pro, it just took very long and then it never really got to the end of it. But that might just be right now because they've got so much load and they're trying to figure out what it is. But theoretically speaking, you can use all of them.

Yes. Good question.

It has a search engine underneath. So basically what it does is it figures out what the search query should be, just like if I would have asked ChatGPT for a search query and then taken that and gone to Google and taken the first result or something like that. Don't know what it is using. I wouldn't be surprised if it's using Bing because Microsoft and OpenAI are very, very close.

Yeah, sorry. I'm not sure I understand the question, sorry.

So the question is basically: when he's writing a specific report for a particular disease and puts in that keyword, there are tons of articles. In that case, he's not able to define it. So how can that be used?

I'm still not sure I understand the question. I'm sorry.

You're asking about how we can, or how the AI is able to, find the right resources?

How we can funnel down in terms of what our need is, whether that's clinical or non-clinical.

So that would be in the prompt. It's in the starting prompt: being very clear about what you are looking for and what resources you want included. And then if the prompt is specific enough, it would take that into account.
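One way to picture "being very clear in the starting prompt" is to write the scope down as explicit fields before turning it into text. This is a sketch under assumptions: the `ResearchBrief` class, its field names, and the example source types are all hypothetical, and the disease name is left as a placeholder because the question did not name one.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchBrief:
    """Hypothetical container for the scope of a research request."""

    topic: str
    focus: str  # e.g. "clinical" or "non-clinical"
    include_sources: list[str] = field(default_factory=list)
    exclude_sources: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Turn the scope into a single paragraph to paste into the tool."""
        lines = [f"Research {self.topic}.", f"Focus only on {self.focus} material."]
        if self.include_sources:
            lines.append("Prioritise these source types: " + ", ".join(self.include_sources) + ".")
        if self.exclude_sources:
            lines.append("Leave out: " + ", ".join(self.exclude_sources) + ".")
        return " ".join(lines)


brief = ResearchBrief(
    topic="treatment options for <disease name>",  # placeholder, not from the talk
    focus="clinical",
    include_sources=["peer-reviewed trials"],  # illustrative only
    exclude_sources=["news articles"],  # illustrative only
)
print(brief.to_prompt())
```

The point of the structure is simply that each narrowing decision (clinical versus non-clinical, which sources count) becomes a visible line in the prompt instead of something the tool has to guess.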

Conclusion

So I'm going to look now at the result that we have, because finally it's there. I mean, in this case it actually took eight minutes to go and do the research. Now, it's crazy that eight minutes seems such an amazingly long time, right? Doing this yourself would probably have taken hours to go through and distill all of this. But still, on a stage in a live demo, eight minutes is an eternity.

So here we have another report. So AI's impact on knowledge work, five-year outlook versus 10-year projection. So augmentation amid disruption.

This seems to be a little bit more narrative style. So in the next five years, AI is expected to be widely integrated in knowledge work.

You have a few more numbers. So for example, an in-house generative AI at one firm boosts employee productivity by 20%. And here you have the source as well.

So you can go straight to it; just like with Gemini, you have the same thing. So in this case, I would say both end up in a fairly similar place, but at least on a quick skim, this one just seems a little bit more narrative going through.

Gemini did, however, search 105 websites, and here you had 18. And so it'll be very interesting to look more qualitatively between these two where you end up in the better place, like the more focused bit that OpenAI does versus Gemini going through.

Now, ah, there we go. Okay, this has finished in the meantime. I did want to finish on this one, just because it shows you a different way to explore that same data. Normally in my flow, what I would do is upload the report and the most important sources to NotebookLM and then actually be able to explore this data in a different way, because what you're going to hear here...

Additional Insights

This is pretty cool. Okay, so ready to dive in. Looks like we're tackling AI trustworthiness today. Yeah, should be interesting. You bet.

You know, you sent over a ton of stuff. Academic papers, reports from NIST, even... That's the one. Plus a whole guide on fairness metrics. That's right.

It's a pretty hot topic, all this AI trustworthiness. A lot of folks are... Trying to figure out how to build AI they can actually rely on, right? Yeah, totally.

And it's not just the technical side of things either. Oh, I know. That's what caught my eye. All the social and psychological angles. Definitely.

It's all connected. So maybe we should start with like a baseline. What does trustworthy AI actually mean? One of the things you sent, the NIST AI risk management framework, I think it was.

It's interesting. Whoa, did someone just break the fourth wall? Let's hear it.

Hey there, can we actually focus a little bit more on AI and the future of work very specifically? Absolutely. I hear you.

AI's impact on the future of work is definitely a huge part of this whole trustworthiness conversation. It really is. And it connects to a lot of what we were going to talk about, like bias, job displacement, and how we make sure AI is used ethically. So we can definitely steer this a little.

Let's still talk about what makes AI trustworthy in general, but through the lens of the workplace. Sounds like a plan. So building on the...

So I'm going to stop there. But this is now another tool. So NotebookLM, if you haven't used it yet, if you didn't understand what just happened here, we took those sources. You saw me kind of just take those four websites, and I just put them into NotebookLM.

And then it creates a 20 minute podcast out of it, which is just a summarization, kind of a deep dive into the content that we asked for. You can then join the conversation and you can steer that conversation in the direction that you want it to. So you can have an audio conversation with whatever data set that you're uploading. And that becomes an extremely interesting way to try and structure thoughts in your own head. You can throw thousands of pages at NotebookLM and it comes up with really interesting conversations.

So hopefully that was useful.

I'm now going to hand it over here and we're going to look at the future of AI, if it is AGI or narrow, or in this case actually the case for why it is narrow.

Thank you very much. Hopefully it was useful.
