Simulating Interactive User Behaviour via Reinforcement Learning

Introduction

Hi all.

Thank you so much for having me today. It's a great opportunity and a great event, bringing researchers, practitioners, and industry together. So thanks for the invitation.

I will talk a bit about my research today. So my name is Florian Fischer.

I am a postdoc research associate in the Intelligent Interactive Systems group at the University of Cambridge. And my research is mainly about modeling, predicting, and improving human-computer interaction.

Overview of Human-Computer Interaction

So what does that mean?

Human-computer interaction, and that's the more or less official definition by the committee, is the discipline concerned with the design, evaluation, and implementation of interactive computing systems for human use. And that last part, for human use, is the essential bit.

So it's not only about finding more aesthetically appealing designs or layouts and these kinds of things, but mainly about putting the user in the focus. That's also why, as some of you might be familiar with, we have the user-centered design process, both as a main methodology for research and for designers in practice, which puts the user at the beginning and at the center of all the following iterative steps.

So we first need to understand the context in which the system is to be deployed, specify the user requirements, and then come up with design solutions that again need to be evaluated against what the target group or the users really need.

User-Centered Design Process

So for coming up with these design solutions, we have, for example, guidelines and rough ideas of what a good UI or UX design might look like. However, the design space is changing.

So in the traditional device era, we had stationary and mobile devices, desktop computers, tablets, smart watches, which constrained and limited the possibilities the design space offers. For example, the windows and the screen constrained and limited what an application can output, and the input methods were also clearly defined.

But now we're entering the spatial era. That's not to say that devices won't be important anymore, but it's an era which offers many more opportunities, as it allows users to directly interact with virtual objects in the same way as with physical objects: touching them, grasping them, playing around with them, maybe even speaking to them. That opens up a basically infinite design space that is much harder to cope with.

Research Focus

So what we propose, what I think is needed, and what we're currently working on are computational models that allow us to predict and understand how users interact with technology, with interfaces.

And over the last 10 or 20 years, a lot of different kinds of models have been proposed in human-computer interaction, in different areas.

For the sake of time, I won't go into detail here, but I'll focus my talk on another aspect.

Body Movement and HCI

If you have a look at all these different interaction techniques, both traditional ones and AR/VR ones, they all have something in common: they're all fundamentally based on human body movement. And that's an aspect that has traditionally been neglected, at least in HCI, human-computer interaction, research. But it's important to have a closer look at it, because body movement data conveys a lot of meaningful information, not only about the user's intent, but also about their uncertainty in doing a certain task or exercise and about their well-being.

For example, it can be used to infer uncomfortable body postures early on or measure and adapt to physical fatigue. So predicting and understanding these body movements has untapped potential for system design. And the main question is, how can we understand and predict how users move in interaction?

And the method or the approach I'm going to present today consists of two components. The first one is biomechanical body models.

And over the last decades, a lot of body models have been proposed, models of different complexity and different scopes.

The simplest ones basically model users as rigid bodies with simple joints connected to one another; more advanced models are closer to the real physiology. They also differ in scope: some can only predict movement, the kinematics, while others are also able to estimate or predict the underlying neuromusculoskeletal system. And we're focusing on that musculoskeletal level, as it conveys a lot of meaningful information.

Biomechanical Models

So we're using a state-of-the-art model. It's a muscle-actuated model of the upper extremity, that means of the arm: a model of the shoulder, the elbow, and the wrist. More recent, advanced versions also have a model of the head, the neck, or the torso, so in theory it can be extended to a full skeletal model.

It models all the bones, muscles, tendons, and ligaments. And depending on whether we include the wrist and the fingers or only model the arm and the shoulder, which again depends on the task we're about to simulate or to model, this might include up to 50 muscles.

So we end up with a highly complex model that is not easy to use or to solve problems with, as we'll see later.
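To give a rough feeling for what such a model looks like in practice, here is a minimal sketch of loading a muscle-actuated arm model in MuJoCo and counting its muscle actuators. This assumes a MuJoCo version of the model is at hand; the file name is a placeholder.

```python
import mujoco

# Minimal sketch: load a muscle-actuated upper-extremity model and inspect it.
# "mobl_arms.xml" is a placeholder path for whatever model file you have locally.
model = mujoco.MjModel.from_xml_path("mobl_arms.xml")

# Count how many actuators are muscles (as opposed to plain motors).
n_muscles = sum(
    int(model.actuator_gaintype[i] == mujoco.mjtGain.mjGAIN_MUSCLE)
    for i in range(model.nu)
)
print(f"degrees of freedom: {model.nv}, actuators: {model.nu}, of which muscles: {n_muscles}")
```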

Approaches to Simulations

And there are fundamentally two approaches to how these body models can be used for simulation. One is inverse, probably the more traditional approach, and the other is forward.

Inverse simulation is the data-based approach. We usually start with marker data, which could stem from motion capture or be inferred directly from videos using computer vision tools, and we use that marker data to trace back which joint angles might have resulted in those movements and, on a dynamic level, which forces we assume have been applied to the joints, or even which muscle commands must have been actuated.

What we are more interested in, because it allows us to build non-data-based approaches, is the other direction, forward simulation, where we start with a muscle control signal and can then forward-simulate what the resulting body movement will look like.
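To make the forward direction concrete, here is a small sketch, again assuming a MuJoCo version of the arm model; the model path and the constant excitation pattern are placeholder choices for illustration only.

```python
import numpy as np
import mujoco

# Forward simulation sketch: choose muscle excitations, step the dynamics,
# and read off the resulting body movement.
model = mujoco.MjModel.from_xml_path("mobl_arms.xml")   # placeholder path
data = mujoco.MjData(model)

excitations = 0.1 * np.ones(model.nu)    # placeholder muscle control signal in [0, 1]
trajectory = []
for _ in range(1000):                    # simulate 1000 timesteps
    data.ctrl[:] = excitations           # apply the chosen muscle controls
    mujoco.mj_step(model, data)          # forward-simulate one step of the dynamics
    trajectory.append(data.qpos.copy())  # record the resulting joint angles

trajectory = np.array(trajectory)        # shape (time, joints): the predicted movement
```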

Muscle Control Signals

And the main question is which muscle control signals users choose in interaction. That's the second component, where we need models of user behavior, effectively cognitive models, though the granularity or realism of those models might differ. So that's the computational modeling part.

As a basic framework to understand the idea here, we're using optimal feedback control. I won't go into too much detail here either.

That's basically a mathematical framework for time-continuous control problems, but it was also proposed, about 20 years ago, as a theory of how users control their movements. So just to give a quick idea of optimal feedback control, it rests on three assumptions.

The first one is that users are able to continuously control a given system, for example an application on a laptop controlled by a mouse device, through their body movements. The second is that they're also able to continuously observe the effect that their movements have on the system through feedback. That could be visual feedback, and since body movement is involved, proprioception also plays an important role.

And the third is that this continuously obtained feedback is then used to choose the muscle controls such that they optimize task-relevant objectives. And that's the key point.

This brings us to a constrained optimization problem, which might even be harder to set up, to define, than to solve. I will come back to that in a second.
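Just to make that concrete, one generic way to write down this constrained optimization problem is the following; the notation is mine and simplified, not necessarily the exact formulation we use.

```latex
\begin{aligned}
\pi^{*} = \arg\min_{\pi}\; & \mathbb{E}\!\left[\sum_{t=0}^{T} c_{\text{task}}(x_t) + \lambda \lVert u_t \rVert^{2}\right]
  && \text{(task error plus effort)}\\
\text{s.t.}\quad & x_{t+1} = f(x_t, u_t, w_t) && \text{(biomechanics and interface dynamics)}\\
& y_t = h(x_t, v_t) && \text{(noisy visual and proprioceptive feedback)}\\
& u_t = \pi(y_{0:t}) && \text{(muscle controls chosen from feedback)}
\end{aligned}
```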

Biomechanical User Simulations

Combining this with the constraints imposed by the biomechanical model and the constraints imposed by the environment gives us the idea of biomechanical user simulations. We take the model we have, with the shoulder, the elbow, and the wrist, we literally add an eye to it to give it visual perception, and we also have the proprioceptive input. So that's a quite high-dimensional observation space.

We then use reinforcement learning, policy gradient methods such as PPO, to learn a control policy, which effectively is a mapping from the visual and proprioceptive sensory information received at every time step to the muscle control signals, such that it solves a certain task.
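In code, that learning step could look roughly like this, using the PPO implementation from stable-baselines3. PointingEnv is a hypothetical task environment wrapping the biomechanical model (a skeleton follows after the recipe below); the policy type and the number of timesteps are placeholder choices.

```python
from stable_baselines3 import PPO

from pointing_env import PointingEnv   # hypothetical module containing the skeleton below

# Learn a control policy: a mapping from visual/proprioceptive observations
# to muscle control signals that solves the task.
env = PointingEnv()
policy = PPO("MlpPolicy", env, verbose=1)
policy.learn(total_timesteps=5_000_000)   # alternate rollouts and policy updates
policy.save("midair_pointing_policy")
```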

Let me quickly show the basic idea of this biomechanical RL, the basic recipe. First, we need to define the task-specific rewards: what the user actually wants to achieve, what their goals are. Second, we need to come up with an appropriate learning architecture, which basically defines the capabilities that the simulated user has, basically a model of their brain.

Third, we initialize the policy with random weights. I will show you a video of that, which is quite funny: the model just moves around randomly, like a human who has not learned to control their body. And then it's about alternately applying steps four and five.

Step four is interacting with the task by exploring different muscle control policies, so generating rollouts, exploring what is possible and observing which rollouts are more beneficial through the rewards obtained; step five is updating the control strategy, the weights, to learn from that. Doing this alternately until convergence will, if the problem is well posed, result in a task- or user-specific policy.
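To illustrate steps one and two of that recipe, here is a heavily simplified skeleton of such a task environment, written against the Gymnasium and MuJoCo Python APIs. The model path, the reward terms and their weights, the episode length, and the "fingertip" site name are all assumptions for illustration, not our exact setup.

```python
import numpy as np
import gymnasium as gym
import mujoco


class PointingEnv(gym.Env):
    """Simplified mid-air pointing task: reach a random target and stay inside it."""

    def __init__(self, model_path="mobl_arms.xml"):   # placeholder model path
        self.model = mujoco.MjModel.from_xml_path(model_path)
        self.data = mujoco.MjData(self.model)
        # Actions: one excitation per muscle, between 0 and 1.
        self.action_space = gym.spaces.Box(0.0, 1.0, shape=(self.model.nu,), dtype=np.float32)
        # Observations: proprioception (joint angles and velocities) plus the target position.
        obs_dim = self.model.nq + self.model.nv + 3
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
        self.target = np.zeros(3)
        self.steps = 0

    def _obs(self):
        return np.concatenate([self.data.qpos, self.data.qvel, self.target]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        mujoco.mj_resetData(self.model, self.data)
        self.target = self.np_random.uniform(-0.3, 0.3, size=3)   # random mid-air target
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.data.ctrl[:] = np.clip(action, 0.0, 1.0)   # muscle excitations
        mujoco.mj_step(self.model, self.data)
        fingertip = self.data.site("fingertip").xpos    # assumes the model defines such a site
        dist = np.linalg.norm(fingertip - self.target)
        # Step 1 of the recipe, the task-specific reward: get close, stay inside, use little effort.
        reward = -dist + (1.0 if dist < 0.05 else 0.0) - 0.01 * float(np.sum(action ** 2))
        self.steps += 1
        truncated = self.steps >= 1000                  # fixed episode length
        return self._obs(), reward, False, truncated, {}
```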

Practical Applications

So we've tested this, starting with a range of basic visuomotor tasks that are also relevant for interaction, for example in VR and AR. The classical mid-air pointing task, where a target of a given size is presented in mid-air in 3D space and then needs to be reached, with the fingertip kept inside it, was used as a baseline, as a starting point, and we can simulate that fairly well.

But it also works for tracking a randomly moving target, for visuomotor coordination tasks, for example here, where the task is to click the button of the color that is currently presented on the display in front of the user, and also for more complex, high-dimensional interaction tasks, for example here, using a joystick on a gamepad to steer a remote-controlled car into a parking lot and then keep it inside.

And we're not only able to demonstrate that it generates solutions that work, that achieve the desired performance, but also that the movements predicted by this biomechanical RL approach follow well-established movement laws, even though those weren't directly implemented. That's just the result of the constraints imposed by the system and of producing a policy that aligns with the system and its constraints.
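To name one concrete example, assuming Fitts' law is among the movement laws meant here, it relates movement time MT to the distance D to the target and the target width W, with empirically fitted constants a and b; the point is that the simulated movement times follow such a relationship without it being built in.

```latex
MT = a + b \,\log_2\!\left(\frac{D}{W} + 1\right)
```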

Modular Simulation Framework

So what we've developed is a modular simulation framework, called User-in-the-Box, where you can combine different musculoskeletal models, one or multiple perception modules like vision and proprioception, and a definition of the task the user is supposed to solve, including the rewards, and then use reinforcement learning to generate simulated users that can interact with the task.
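Conceptually, composing a simulated user then looks something like this sketch; the keys and component names are purely illustrative, not the framework's actual configuration format.

```python
# Purely illustrative composition of a simulated user from modular pieces;
# none of these names are the framework's actual API.
simulated_user = {
    "biomechanical_model": "muscle_actuated_arm",   # shoulder, elbow, wrist
    "perception_modules": [
        "egocentric_vision",        # the "eye" added to the model
        "proprioception",           # joint angles, velocities, muscle states
    ],
    "task": {
        "environment": "midair_pointing",   # the interactive task to be solved
        "reward": "distance_plus_inside_bonus_minus_effort",
    },
    "rl": {
        "algorithm": "PPO",
        "total_timesteps": 50_000_000,
    },
}
```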

And we've extended this to demonstrate how we can integrate user simulations into VR design. I won't go into too much detail here, but it's called Sim2VR. It basically allows the simulated user to see and control the exact same VR environment as real users would.

So it's really trained on and interacting with the same environment; we just need to augment it with the rewards that we assume a real user would internally use to solve that constrained optimization problem, and then we can predict user movements.
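In spirit, the resulting interaction loop looks roughly like this hypothetical sketch; all object and method names are illustrative, only the overall structure follows what I just described.

```python
# Hypothetical sketch of the Sim2VR idea: the simulated user perceives the
# rendered frames of the unmodified VR application and drives it through the
# forward-simulated body. Every name here is illustrative.
def run_simulated_user(vr_app, biomech_sim, policy, n_steps=10_000):
    frame = vr_app.render()                          # what the simulated eye "sees"
    for _ in range(n_steps):
        observation = {
            "vision": frame,
            "proprioception": biomech_sim.joint_state(),
        }
        muscle_ctrl = policy(observation)            # the learned mapping, as before
        hand_pose = biomech_sim.step(muscle_ctrl)    # forward-simulate the body movement
        vr_app.set_controller_pose(hand_pose)        # drive the same VR app a real user would use
        frame, reward = vr_app.step()                # reward augments the unchanged environment
```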

Sim2VR and User Strategy Prediction

For example, we've used that for this classic whack-a-mole game where the task is to hit all the randomly appearing target boxes on that plane as fast as possible. And this can, for example, be used to test different variants of the game.

So we had a constrained variant, where a minimum velocity was required to successfully hit and destroy the targets, and an unconstrained variant, and we trained and tested two different simulated users, and they learned to use different strategies.

In particular, in the unconstrained variant, where it was not necessary to reach back and gain momentum, the simulated user simply learned to stay inside the target plane, a strategy that some users in a separate user study also adopted, even though we didn't prime them. So by using those simulated users, we are able to predict user strategies, and maybe also flaws in the design of applications and games.

Conclusion

So Sim2VR allows us to run biomechanical simulations directly in VR. Something I have not shown now: we are also able to demonstrate that it can predict differences in user performance and ergonomics, and that it can reveal potential user strategies. It thus constitutes a first key step towards what we call automated biomechanical testing, which might be a useful tool or method, especially in early stages of the design process.

So to summarize, we've seen not only that most interaction is based on movement, but also that biomechanical simulation can help us understand these movements. Of course, there are also some challenges associated with the approach.

Challenges and Future Work

The first one, which I already touched on a few times, is reward function design. A paper that will be published in September is about guidelines that help users of the tool, of that method, come up with reasonable reward functions. But it's still a lot of trial and error at the moment, and we need better approaches to generalize it to more tasks.

Also, the lack of diversity in biomechanical user models is a problem. Or, put the other way around, if we had better or more diverse models, also of elderly people and of people with different impairments, that would open up many more opportunities in terms of design for accessibility.

There's also a lack of robustness and generalizability of learned policies across tasks. Currently, or at least in what I've presented here, we train all policies from scratch; the simulated human has to learn again every time how to move the entire arm or hand.

So we are also looking for better methods to generalize that. And to really propose this as a tool for designers and researchers, we need to lower the entry barrier and make it usable for people without much experience in reinforcement learning and biomechanical simulation.

But on the other hand, these simulations can provide a lot of benefits.

Benefits and Implications

For HCI, from a researcher's perspective, it means we could better understand movement interaction using the simulations, validate and test existing models and theories, and analyze the effects of individual model parameters.

From a designer's perspective, it might allow them to predict movements, ergonomics, and different user strategies for their design parameters, which could also be used interactively or in an automated way to develop customized and adaptive interfaces and, as I mentioned, to increase accessibility and usability.

And beyond HCI, there might be even more benefits of this general approach, which you also see in related fields: for example, in neuroengineering, building assistive devices if we understand how users' motor control works; in sports, analyzing the strategies of athletes; or in rehabilitation, for customized training.

And there might be much more. I'm looking forward to discussing what you foresee in this approach.

So thanks for your attention.
