Ramping up UI Testing for the AI Age

Introduction

I'm Pete, founder of testnexus.ai, a product I'm building to perform UI testing in an age when a lot of people write their web applications with AI. They should, as long as they do it in a well-structured way. We're looking at how to ramp up UI testing to match that capability, and at what AI can do for us from the other angle: testing.

Why UI Regression Testing Matters

Okay, I don't know if you're all web developers; I am. Before you release code to production, ideally you need a set of UI regression tests. These are tests that simulate how a user would interact with the pages in your web application, and they can sometimes be quite complex.

You want to test as much as you can to make sure things don't break in real production systems, so that you don't feed bugs to your users. Depending on your system, those bugs can be costly.

So we've got to make UI regression tests. And this is the overall aim.

Common Tooling and Approaches

Usually it's a team of people writing a bunch of Playwright or Cypress code that all gets run together. There are various ways to go about it, and we're fairly agnostic about which framework we produce.

Some of these interactions add up to complex end-to-end user flows.

A Simple Banking App Test Flow

As a very basic example, say you have a banking application you want to test. The first step in your test will probably be having the Playwright code perform an action, the user signing in, and then expecting that the next thing you see is the page loading. You'll have these action-and-expectation statements throughout.

Next, you click the Account drop-down button and expect, again, that a menu opens from it, and so on and so forth, through all the steps needed to test your feature.

Then the action clicks through to the current account, you expect that page to load, you check the balance, and perhaps you check that value against the database.
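To make the action-and-expectation pattern concrete, here is a minimal Python sketch. It deliberately avoids Playwright so that it runs on its own: a hypothetical in-memory stub, `FakeBankApp`, stands in for the browser, and `DB_BALANCE` stands in for the value you would fetch from the database. In a real Playwright suite, each action would be a call such as `page.get_by_role(...).click()` and each expectation an `expect(...)` assertion.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FakeBankApp:
    """Hypothetical in-memory stand-in for a browser session on a banking app."""
    page: str = "login"
    menu_open: bool = False
    balances: dict = field(default_factory=lambda: {"current": 1250.00})

    def sign_in(self) -> None:
        self.page = "dashboard"

    def click_account_dropdown(self) -> None:
        self.menu_open = True

    def click_current_account(self) -> None:
        if self.menu_open:  # the link only exists once the menu is open
            self.page = "current_account"

    def displayed_balance(self) -> float:
        return self.balances["current"]


@dataclass
class Step:
    name: str
    action: Callable[[FakeBankApp], None]  # what the user does
    expect: Callable[[FakeBankApp], bool]  # what should then be true


def run_flow(app: FakeBankApp, steps: list[Step]) -> Optional[str]:
    """Perform each action, then check its expectation.

    Returns None if every step passes, otherwise the name of the first failing step.
    """
    for step in steps:
        step.action(app)
        if not step.expect(app):
            return step.name
    return None


DB_BALANCE = 1250.00  # hypothetical value fetched from the bank's database

BANKING_FLOW = [
    Step("sign in", lambda a: a.sign_in(), lambda a: a.page == "dashboard"),
    Step("open account menu", lambda a: a.click_account_dropdown(), lambda a: a.menu_open),
    Step("open current account", lambda a: a.click_current_account(),
         lambda a: a.page == "current_account"),
    Step("check balance", lambda a: None, lambda a: a.displayed_balance() == DB_BALANCE),
]

print(run_flow(FakeBankApp(), BANKING_FLOW))                           # None: all passed
print(run_flow(FakeBankApp(balances={"current": 0.0}), BANKING_FLOW))  # check balance
```

Returning the name of the first failing step, rather than a bare pass/fail, is one simple way to keep a failure useful: it tells you where the flow broke.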

It's a very simple example. You'd have much more complex flows in reality.

In this hypothetical case, everything has passed. We've got good code, so it can go out to production.

Handling Failures and Maximizing Coverage

But what if you do get a failure in your test suite? That's still valid, useful information: you'd rather know about it than not have run these tests at all.

So maximal test coverage is really important for web applications, and the tools that make it possible to scale out UI test automation en masse are on their way.

Challenges Writing Solid UI Tests Today

So how do we write solid UI tests today?

A lot of people spend a lot of time hacking away at Playwright or Cypress code. I've been there and written plenty myself, and it's fairly labor-intensive.

Exactly how a test is structured depends on who writes it. Sure, you've got pull requests, but inconsistencies in how tests are written can eventually surface as bugs you've missed. And because you can only produce as many test files as people can code up, you're limited in how many you can produce.

Trying to Automate Test Authoring with AI

So I thought to myself, great, Claude Code exists, I'll just start getting it to write UI tests for me.

And it came with a few problems.

Where Naive Approaches Fall Short

If you take Claude Code out of the box, point it at your code base, and say, "I'd like some UI tests so I get a better product overall," you're going to run into a few issues; at least I did. The main one is that it goes off track in anything but simple cases.

It might handle the case I just described, but as flows get more complicated, it gives you some strange test results. So on its own, it's not something you want to rely on.

The Importance of Precise Context and Tools

So it really requires explicit context. I think Mark did a great job of talking about context: giving these agents exactly enough to do what they need to do, so they don't get confused by tools and context they don't need. Giving them too many tools and too much context is a problem in itself.

Context Windows for Complex Flows

When an agent is piloting an app, its context window matters, particularly for complex UI flows of, say, 10 steps, which is probably on the longer end.

Optimizing Reasoning vs. UI Description

The AI is going to spend a lot of effort on reasoning, so you want as little of its context window as possible taken up describing what the UI is, and as much as possible spent on how to navigate through that UI in line with the intention of the test and the designs the test represents.
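One crude way to see this trade-off is to compare a verbose description of a page with a terse one that keeps only what navigation needs. The sketch below is an assumption for illustration, not the TestNexus format: the page, its fields, and the `compact` helper are hypothetical, and character count is only a rough proxy for tokens.

```python
import json

# Hypothetical description of one page, as an agent might receive it.
PAGE = {
    "name": "current_account",
    "elements": [
        {"type": "button", "testId": "deposit-open", "label": "Deposit",
         "navigatesTo": "deposit_form"},
        {"type": "text", "testId": "balance", "label": "Balance"},
    ],
}


def compact(page: dict) -> str:
    """Render a page as one terse line per element: type, test id, optional target."""
    lines = [f"{page['name']}:"]
    for el in page["elements"]:
        target = f" -> {el['navigatesTo']}" if "navigatesTo" in el else ""
        lines.append(f"  {el['type']} {el['testId']}{target}")
    return "\n".join(lines)


verbose = json.dumps(PAGE, indent=2)
terse = compact(PAGE)
print(terse)
print(len(verbose), len(terse))  # characters: a rough proxy for tokens
```

Whether labels can be dropped like this is a design choice; the idea is simply that the UI description should cost as little context as the task allows.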

Introducing TestNexus

So I've been working on a system, which I call TestNexus.

It'll be ready in a month or two.

Graph-RAG Powered UI Testing

Essentially, it's Graph-RAG, AI-powered UI testing.

Building a UI Knowledge Graph from Code

First, we crunch the code base to build a graph database that captures all of the possible UI options within your application. We do that in a very structured way so that the test-writing AI further down the line gets the minimum possible context that is still useful for writing its test.
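The talk doesn't show how the code base is crunched, and in TestNexus that work is done by AI agents. Purely as a deterministic stand-in, here is a minimal sketch that scans a hypothetical mini code base for elements tagged with `data-testid` and for `<a href>` links, and turns them into an adjacency list. Regex scanning is a swapped-in simplification; real components would need a proper parser.

```python
import re
from collections import defaultdict

# Hypothetical mini code base: route -> component source (JSX-like markup).
SOURCES = {
    "/": '''
        <input data-testid="subscribe-email" type="email" />
        <button data-testid="subscribe-submit">Submit</button>
        <a href="/login">Log in</a>
    ''',
    "/login": '''
        <input data-testid="username" />
        <input data-testid="password" type="password" />
        <button data-testid="sign-in">Sign in</button>
    ''',
}

ELEMENT_RE = re.compile(r'<(input|button)\b[^>]*data-testid="([^"]+)"')
LINK_RE = re.compile(r'<a\b[^>]*href="([^"]+)"')


def build_ui_graph(sources: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Return an adjacency list: page -> [(edge_label, target_node)].

    Each interactive element becomes a child node named "route#testid";
    each link becomes a "navigates_to" edge to another page.
    """
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for route, src in sources.items():
        for tag, test_id in ELEMENT_RE.findall(src):
            graph[route].append((f"has_{tag}", f"{route}#{test_id}"))
        for target in LINK_RE.findall(src):
            graph[route].append(("navigates_to", target))
    return dict(graph)


graph = build_ui_graph(SOURCES)
for edge in graph["/"]:
    print(edge)
```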

Targeted Context for Each Test

It only needs to know about the areas of the UI involved in that feature. If the system can anticipate what those are and preload the tools that might be relevant, such as tools for accessing databases, then writing those tests goes as efficiently as possible.
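Selecting targeted context can be sketched as a graph search: find the route from the entry page to the feature under test, and load only the nodes and tools on that route. Everything here is hypothetical: the `UI_GRAPH`, the `NODE_TOOLS` mapping, the tool names, and the choice of a breadth-first shortest path as the selection rule.

```python
from collections import deque
from typing import Optional

# Hypothetical UI graph: node -> neighbouring nodes.
UI_GRAPH = {
    "home": ["login", "subscribe_form"],
    "subscribe_form": [],
    "login": ["dashboard"],
    "dashboard": ["account_menu", "settings"],
    "account_menu": ["current_account", "savings_account"],
    "current_account": ["deposit_form"],
    "savings_account": [],
    "settings": [],
    "deposit_form": [],
}

# Hypothetical tools a test-writing agent might need at particular nodes.
NODE_TOOLS = {
    "login": {"test_user_credentials"},
    "current_account": {"query_accounts_db"},
    "deposit_form": {"query_accounts_db", "query_transactions_db"},
}


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search; returns the nodes from start to goal, or None."""
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def context_for_feature(goal: str, start: str = "home") -> tuple[list[str], list[str]]:
    """Collect only the nodes and tools on the route to the feature under test."""
    path = shortest_path(UI_GRAPH, start, goal)
    if path is None:
        raise ValueError(f"no route from {start!r} to {goal!r}")
    tools = set().union(*(NODE_TOOLS.get(n, set()) for n in path))
    return path, sorted(tools)


path, tools = context_for_feature("deposit_form")
print(" -> ".join(path))  # home -> login -> dashboard -> account_menu -> current_account -> deposit_form
print(tools)
```

Pages such as `settings` and `savings_account` never reach the agent's context, which is the point: the test writer sees only the slice of the UI that the feature touches.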

LLM Flexibility and Integration Options

These components can be configured to use different LLMs, whether through Claude Code or directly through APIs and other methods. We'll talk briefly about that.

I think I'm overrunning a little on time, so let's move on.

Modeling the UI as a Graph

This is what I mean when I say that the UI can be represented by a graph database. We have nodes: in this case the root of the application, say the home page, and the possible things you can do there.

If this is a simple page with just an input where you can subscribe with an email address and a Submit button, then you have two very simple options.

As the application grows, you can keep mapping everything in your UI until your UI and UX flows are reduced entirely to graph concepts.

Nodes can also link to other POMs (page object models), which represent other pages, so you eventually build up the whole network: the entire application can be represented this way.
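The home-page example can be written down as a small data model: a page node whose outgoing actions are the edges, some of which lead to other pages. This is a minimal sketch under assumptions; the `Action` and `PageNode` classes, the element identifiers, and the `subscribed` page that Submit leads to are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Action:
    """One thing a user can do on a page: an edge out of the page node."""
    kind: str                       # e.g. "fill" or "click"
    element: str                    # hypothetical element identifier
    leads_to: Optional[str] = None  # name of another page (POM), if any


@dataclass
class PageNode:
    """A page object model reduced to a graph node plus its outgoing actions."""
    name: str
    actions: list[Action] = field(default_factory=list)


# The simple home page: an email input and a Submit button (target page is hypothetical).
PAGES = {
    "home": PageNode("home", [
        Action("fill", "subscribe-email"),
        Action("click", "subscribe-submit", leads_to="subscribed"),
    ]),
    "subscribed": PageNode("subscribed"),
}


def all_options(pages: dict[str, PageNode], root: str) -> list[tuple[str, Action]]:
    """Walk every page reachable from root and list (page, action) pairs."""
    seen: set[str] = set()
    stack, options = [root], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for action in pages[name].actions:
            options.append((name, action))
            if action.leads_to:
                stack.append(action.leads_to)
    return options


for page, action in all_options(PAGES, "home"):
    print(page, action.kind, action.element, "->", action.leads_to)
```

Walking from the root yields exactly the two options on the simple home page; as more pages are added, the same walk covers the whole network.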

Automating Graph Construction with Agents

I tried building the graph manually. It gets tiring pretty fast, so I thought, why not have AI orchestrate that as well?

That has actually become an even bigger component of the system. Doing it manually: wouldn't recommend.

The goal is to orchestrate this process to the point where you can take a code base, point the system at it, do some pre-processing, transform it into a graph database, and get the representations you want out the other end.

Demo: Orchestrating Document Transformation

The part I'm going to show today is a fairly simple example. It demonstrates the orchestration elements: how these agents go through different parts of the code base and perform tasks, each equipped with specific context and specific tools so it can do exactly what that task needs. The task itself is extremely simple for an AI; the point is the orchestration.

Setting Up the Demo Tasks

At the moment the database is completely empty: there are no nodes and no tasks. But I've configured a little demo system so that it can point at a set of tasks.

There we go. What it's going to do is translate a whole bunch of documents in a file system.

But each of those documents needs specific context as well. For example, a legal document has specific terminology that the agent will need to source. This is a brief proof of concept showing that all the tools and context can be loaded automatically.
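Per-document context loading can be sketched as a set of rules matched against each file path, each rule contributing extra instructions and tools. The rule patterns, prompt text, and the `legal_terminology_lookup` tool name below are hypothetical; in the demo the tools are MCP tools loaded for Claude, which this sketch does not call.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class ContextRule:
    pattern: str            # glob matched against the file path
    prompt: str             # extra instructions for matching files
    tools: tuple[str, ...]  # hypothetical tool names to enable


@dataclass(frozen=True)
class TaskPlan:
    path: str
    prompts: tuple[str, ...]
    tools: tuple[str, ...]


BASE_PROMPT = "Translate this document."  # hypothetical base instruction

RULES = [
    ContextRule("legal/*", "Use standard legal terminology; keep defined terms verbatim.",
                ("legal_terminology_lookup",)),
    ContextRule("*.md", "Keep Markdown formatting intact.", ()),
]


def plan_task(path: str) -> TaskPlan:
    """Attach only the prompts and tools whose rules match this file."""
    prompts, tools = [BASE_PROMPT], []
    for rule in RULES:
        if fnmatch(path, rule.pattern):
            prompts.append(rule.prompt)
            for tool in rule.tools:
                if tool not in tools:
                    tools.append(tool)
    return TaskPlan(path, tuple(prompts), tuple(tools))


for p in ["legal/contract.md", "notes/readme.md"]:
    plan = plan_task(p)
    print(p, len(plan.prompts), plan.tools)
```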

Execution Flow and Tool Loading

We're going to see Claude act as the executor. First, it goes through all the files, and Claude pops up once it has found a task.

It goes very fast; it wasn't really built for demoing.

Don't expect to follow everything. You can see it has loaded all the specific prompts, and what I just pressed OK on was it loading specific MCP tools as well, so that it can bring legal terminology into this document.

From Files to Graph: End-to-End Pipeline

So it demonstrates that the system can go through an entire file system, or a code base, and transform documents (or, in the case of UI testing, source code) into representations, put those in a graph database, and build a Graph-RAG system that makes writing your test code much faster.

At the moment I've managed to turn the auto-permissions off, so I'm going to have to allow a few things manually. Usually it just runs through on its own.

We can see it has populated a task database in the background. Let me refresh the data.

There should be a few nodes in here, since it's just a small system. There we go.

All these orange nodes represent tasks. This one is a plug-in, essentially an agent, that tells the system what to do. The orchestration system itself is agnostic; it can do any task.

What it does depends on how I configure the plug-in. Claude Code is the executor that goes ahead and performs the task. This one is still in progress because I haven't clicked through the permissions, but it will complete.
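The split between an agnostic orchestrator, a task-specific plug-in, and an executor can be sketched with two small interfaces. This is a minimal illustration, not the system's code: `Plugin`, `Executor`, `TranslateDocs`, and `EchoExecutor` are hypothetical names, the task list stands in for the task nodes in the graph database, and the executor just prints instead of invoking Claude Code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Task:
    task_id: int
    path: str
    instructions: str
    status: str = "pending"  # pending -> done / failed


class Plugin(Protocol):
    """Decides what, if anything, should be done with a file."""
    def make_instructions(self, path: str) -> Optional[str]: ...


class Executor(Protocol):
    """Performs a task; in the demo this role is played by Claude Code."""
    def run(self, task: Task) -> bool: ...


class Orchestrator:
    """Task-agnostic loop: all domain knowledge lives in the plug-in."""

    def __init__(self, plugin: Plugin, executor: Executor):
        self.plugin = plugin
        self.executor = executor
        self.tasks: list[Task] = []  # stands in for the task nodes in the graph

    def run(self, paths: list[str]) -> list[Task]:
        for path in paths:
            instructions = self.plugin.make_instructions(path)
            if instructions is None:
                continue  # the plug-in isn't interested in this file
            task = Task(len(self.tasks) + 1, path, instructions)
            self.tasks.append(task)
            task.status = "done" if self.executor.run(task) else "failed"
        return self.tasks


# Hypothetical plug-in and executor, for illustration only.
class TranslateDocs:
    def make_instructions(self, path: str) -> Optional[str]:
        return f"Translate {path}" if path.endswith(".txt") else None


class EchoExecutor:
    def run(self, task: Task) -> bool:
        print(f"[executor] {task.instructions}")
        return True


tasks = Orchestrator(TranslateDocs(), EchoExecutor()).run(["a.txt", "logo.png", "b.txt"])
print([(t.task_id, t.path, t.status) for t in tasks])
```

Swapping `TranslateDocs` for a plug-in that emits UI-mapping tasks would change what the system does without touching the orchestrator.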

Orchestrator Recap

The orchestrator is a big part of what's going on, but let's move on from it.

There's one last slide for you.

What the System Delivers

So what is the system?

From Codebase to Test-Ready Knowledge Graph

It's an orchestration system that builds RAG databases representing UI and UX flows, which you can then use to write tests.

So if you have untested code and would like to give it a try, that should be possible in the next month or so.

How Engagement Works

If we're working with you, the system essentially joins your GitHub project as a collaborator. From there, it generates the knowledge graph database of your code, with all the representations we need.

Authoring Tests from Natural Language

We can also ideate a set of tests suited to your system automatically. From there, your QA team and your developers, if they want to be involved in testing, can write descriptive test cases, such as "I want to deposit 500 pounds into my test bank account," which are immediately transformed into Playwright code that builds up into a larger regression test suite.
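As a rough illustration of this last step, here is a sketch that maps a descriptive test case onto a stored graph path and emits Playwright Test source in TypeScript. Keyword matching and a regex for the amount stand in for the LLM; `FEATURE_PATHS` and every `data-testid` value are hypothetical. The emitted code uses real Playwright Test APIs (`test`, `expect`, `page.getByTestId`, `fill`, `click`, `toBeVisible`).

```python
import json
import re

# Hypothetical stored graph path per feature: (action, data-testid) pairs.
FEATURE_PATHS = {
    "deposit": [
        ("click", "account-menu"),
        ("click", "current-account"),
        ("fill", "deposit-amount"),
        ("click", "deposit-submit"),
        ("expect_visible", "deposit-confirmation"),
    ],
}


def to_playwright(description: str) -> str:
    """Turn a descriptive test case into Playwright Test (TypeScript) source.

    Keyword matching stands in for the LLM that would interpret the case.
    """
    feature = next((f for f in FEATURE_PATHS if f in description.lower()), None)
    if feature is None:
        raise ValueError(f"no known feature in: {description!r}")
    amount = re.search(r"\d+(?:\.\d+)?", description)
    if amount is None:
        raise ValueError("test case needs an amount")

    lines = [
        "import { test, expect } from '@playwright/test';",
        "",
        f"test({json.dumps(description)}, async ({{ page }}) => {{",
    ]
    for action, test_id in FEATURE_PATHS[feature]:
        locator = f"page.getByTestId({json.dumps(test_id)})"
        if action == "fill":
            lines.append(f"  await {locator}.fill({json.dumps(amount.group())});")
        elif action == "expect_visible":
            lines.append(f"  await expect({locator}).toBeVisible();")
        else:
            lines.append(f"  await {locator}.{action}();")
    lines.append("});")
    return "\n".join(lines)


print(to_playwright("I want to deposit 500 pounds in my test bank account"))
```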

And there you go. That's the system.

Closing

You can find us at testnexus.ai.

There are a few of us working on it: a couple in Glasgow and a few in Cambridge.

Thanks and Contact

Thank you.