Bridging Business and Data: Building a Natural Language SQL Assistant with ChatGPT and Metabase

Introduction

I'm Bora, the engineering manager at Waybiller. Waybiller is a transport management system software company; we work mostly in Estonia and are expanding internationally.

Goal: Natural-language access to internal data

Today I'll present how we use some AI tools internally for company purposes, and hopefully give you some ideas on how you can use AI for productivity.

What we needed was to take our internal data, our clients' data, and provide it to the business team in a way that lets them ask questions in natural language and get answers. One restriction in our case: they don't have access to our live database.

So, what kind of solution would you come up with in this case? Any ideas?

A local LLM, yes, could be used. But then you would need something like two instances, one of them running the LLM.

If there are no other ideas, I'll present what we came up with.

Solution Overview

Choosing Metabase for data visualization

We first searched for a product we could use for data visualization, and we landed on Metabase for our use case.

With Metabase, you can query your data with SQL.

Of course, there are also cloud versions with built-in AI that you can question directly, but I haven't tested how useful those are, to be honest.

Bridging SQL skills with a custom GPT assistant

So you can ask questions with SQL, but the business team doesn't speak SQL. I'll show you the direction we took: an example of a GPT that we created.

The instruction starts with "You are a SQL assistant", and it provides answers on how to write the SQL whenever it is asked a question.

You can see the instruction is not really that long; what matters is that it uses this knowledge base.

Building the knowledge base: schema and relationships

In this knowledge base, I can show you what we did. First, a small SQL command that gives us all the tables with all their columns.
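A minimal sketch of such a query, assuming a PostgreSQL database (the exact command wasn't shown, so treat the details as an assumption):

```sql
-- All tables in the public schema, with their columns and data types.
SELECT table_name,
       column_name,
       data_type
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
```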

And another query, which I actually can't run from here because that user is restricted. Let me find the right screen... ah, here.

It's another SQL query which gives us the foreign keys: basically, what the relations between the different tables are.
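A sketch of that foreign-key query, again assuming PostgreSQL (the actual query wasn't shown):

```sql
-- Foreign keys: which column in which table references which column where.
SELECT tc.table_name,
       kcu.column_name,
       ccu.table_name  AS referenced_table,
       ccu.column_name AS referenced_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON kcu.constraint_name = tc.constraint_name
JOIN information_schema.constraint_column_usage ccu
  ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_name;
```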

Defining table metadata and relations

Once you have this information, you can form a file like this (this is just an example).

It basically says: there is this table, it has these fields with these data types; and then there is another section that shows the table relations, i.e. how the columns relate to each other.

That is the file you see attached to this GPT.
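For illustration, an excerpt of such a knowledge-base file could look something like this; the table and column names here are invented, not the real schema:

```
Table: transport_order
  id          bigint
  client_id   bigint
  created_at  timestamp with time zone

Table: waybill
  id                  bigint
  transport_order_id  bigint
  created_at          timestamp with time zone

Table relations:
  waybill.transport_order_id -> transport_order.id
  transport_order.client_id  -> client.id
```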

Demo: Generating SQL and visualizing in Metabase

So now you can ask it anything, and it will give you a SQL query that you can run in Metabase. Let me showcase that.

Let's make it a chart instead of a table. I can ask something like: I want to create a chart of how many waybills were created from transport orders in the last, let's say, five months.

Now we have a SQL query. I can copy this code, go directly to Metabase, and use our staging environment for this demo.
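The generated SQL could look roughly like this; the table and column names are assumptions for illustration, not the actual output:

```sql
-- Waybills created from transport orders, counted per month over the
-- last five months (current month included).
SELECT date_trunc('month', w.created_at) AS month,
       count(*)                          AS waybills_created
FROM waybill w
JOIN transport_order t ON t.id = w.transport_order_id
WHERE w.created_at >= date_trunc('month', now()) - interval '4 months'
GROUP BY 1
ORDER BY 1;
```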

Okay, in staging there is not much data, but we got some numbers, so I can go to the visualization, and it now shows me the months from April to September.

The rest is just how Metabase works, so not much to show there: you can add this to your dashboard, or you can ask any question ad hoc and it will find the data.

Live database reporting with read-only access

So Metabase has access to the live database, and it can generate such reports from just a question.

Any questions so far?


Q&A: Architecture, tooling, and trade-offs

Yes, the Metabase instance only has read access, so it's not affecting database performance. As for your question, why we can't have this inside Metabase: well, there was one tool that claimed to do that, but it wasn't usable.

You would have to get data from your Metabase instance and also have that talk to the LLM, which complicates things.

It could be a nice hobby project, an interesting idea.

But for now, since we're already used to ChatGPT, copy-pasting into Metabase isn't much work.

You had a question? Mm-hmm.

Maintaining the knowledge base

It's just a matter of, remember the two commands I ran to get the database tables and their relationships? You just run those again and update the knowledge base. Doing that once every month or two is good enough for our purposes.

Yeah?

Data security considerations

What about data security?

You uploaded your entire database architecture? Well, it's just a schema.

ChatGPT itself is already able to generate such a schema if you tell it to, and you can pretty much do the same with most coding assistants.

So at least I don't treat it as a secret. Also, for company use, as far as I know, ChatGPT does not train on your data, or that's what they claim. So it should be fine.

Handling errors and hallucinations

Well, for our purposes it's not mission critical. When I was testing it with different queries, it sometimes suggested the wrong field. If you then tell it "no, this is not what I was looking for" and explain your use case a bit better, it can find the right table. And if it can't, the business team can reach out to development and ask for help. That's fine.

But what I meant with the initial case was: when you run the query, it can also fail. It can give errors, which means there is something wrong with the SQL itself, or the LLM is hallucinating a table or a column that doesn't exist.

That can happen, and when it does, you can just try to interrogate it a bit more. As for local models: my local-model testing experience is with Llama; no, I haven't tried anything better.

No, they're not; they can only interpret from the results they get.

Why not local LLMs?

For example, our main database is on a managed instance, so it's not running on an EC2 instance we control.

So there is not much advantage for us, because once you use a local LLM, you also have to maintain it: you have to care for that instance and update the LLM itself from time to time.

So ChatGPT is the easier choice here.

Model choices and updates

Actually, this GPT was using GPT-4o for a long time, and just today, after someone asked why I wasn't using Sonnet 4.5 (was it you?), I changed it to GPT-5. But both should work.

Potential future system: in-house agent with tools

You could consider writing your own MCP server that runs in your own infrastructure, with maybe five to seven tools: one gets the database schema in real time, another translates business requirements into SQL, another runs the SQL, and maybe a few more to create a PDF, graph, or chart and give it to the business team. Then they just write plain English and get really nice PDF reports straight from the database.

And if the development team changes something in the database, they don't have to update anything, because the tools will go to the database and get the actual schema. That can totally be done, though of course you might have to optimize it for production use.

You mentioned the reports and updating the schema; those don't take much time for us. So it's a question of whether I should invest more time into building that setup.

Scalability and context window limits

It is big, but not that big: we are talking about 200 tables, and probably around 3,000 columns. It hasn't hit a context limit yet.

As you expand it might, but context windows are also getting bigger, so maybe it's nothing to worry about for now.

How are we on time? I could present one more thing, if we have time.

Use Case 2: Automating release notes and support articles

So this was one of the use cases.

Another idea I can show you: another task we were spending a lot of time on was creating release notes. We have a release pretty much every week, and each one has maybe six or seven tasks that have some impact on users, so we have to update our release notes.

We have that data in Jira, but it's not really enough, because we also have articles on our customer support platform that we want to update as well.

And a Jira task's description doesn't tell you how to use the feature. So, any ideas how you would approach this? How would you create the release notes and articles?

Okay, so this is what we came up with.

Multi-source workflow: Jira, GitLab, and term mapping

This one has slightly more instructions, but it's basically what I just mentioned: pull data from Jira for the latest release, fetch the tasks and their descriptions, and then, since we're using GitLab, go to GitLab and fetch the merge request notes for those tasks, so that you have the context of what changed and how the user can interact with it.

Then it can provide us with both. There are several interesting parts here.

We have Estonian release notes, while our tasks and descriptions are all in English, so we need some kind of mapping of special Estonian terms to their English counterparts. We're using JSONBin here as a small database to keep that data.
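A hypothetical shape for that term-mapping document in JSONBin; the Estonian translations here are only illustrative:

```json
{
  "waybill": "veoseleht",
  "transport order": "veotellimus",
  "release": "väljalase"
}
```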

Then there is the GitLab action schema. Don't be put off by it; it's smaller than it looks, because you can ask ChatGPT to create the schema for you: say "I want to get GitLab merge requests and search them" and it will give you the schema. So we have that schema with authentication here, and we have Atlassian, as expected, for Jira.
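A trimmed sketch of what such a GPT action schema for GitLab could look like, as an OpenAPI fragment; the path and parameters follow GitLab's public REST API, but treat the exact details as an assumption:

```yaml
openapi: 3.1.0
info:
  title: GitLab Merge Requests
  version: "1.0.0"
servers:
  - url: https://gitlab.com/api/v4
paths:
  /projects/{id}/merge_requests:
    get:
      operationId: listMergeRequests
      summary: Search merge requests of a project
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string }
        - name: search
          in: query
          schema: { type: string }
      responses:
        "200":
          description: Matching merge requests
```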

Demo: Generating multilingual release notes and articles

In the GPT itself, I ask: give me the release notes together with the suggested articles for the last release. Then you wait a bit. (Is it my laptop?)

You can see that, following the instructions, it gave us the overall summary of this release, then improvements and bug fixes. (Ah, it's stuck there. Ah, it's my computer. Okay.)

Then the same in Estonian, and then the suggested articles.

You can see this is the task ID in Jira and this is the merge request ID in GitLab. Then it tells you, for example: you can use this flow, modify this, and then it changes that.

The same for Estonian and Russian, another article for another task, and there you have it. This is currently what we use to generate automated release notes.

Of course, we proofread it, but it's usually good enough.

Any questions?

Under the hood: schemas and connectors

Well, there's the development part.

The schema, you said? Yeah.

If I open... ah, I was going to show you the tasks themselves, but I guess you believe me.

So, if we go to... I need to connect to... do you want Jira or GitLab? Jira, then.

I want to connect to Jira with a GPT to fetch the releases and find the last one. Then I need to fetch the tasks in that release and their descriptions.

Give me the schema I can use in my GPT.

Of course.

Conclusion

Any questions? All right. I guess that's about it.

Thank you for listening.
