Lennart Wachowiak contributes to IEEE and Springer journals with research on human–robot interactions

10th April 2025 | Student News


In his PhD, STAI CDT student Lennart Wachowiak is researching how robots can explain their actions to their users. Explainability is a core pillar of safe and trusted AI research, as it helps with understanding why things went wrong and whether decisions were made for the right reasons. This, in turn, informs how a human judges whether or not to trust the system. Lennart's research focuses especially on the moments during human–robot interactions when an explanation is needed, and on how robots can recognize those moments and offer explanations on their own.

In line with this research goal, Lennart developed a technical solution that allows robots to detect good explanation timings, which he published in IEEE Transactions on Affective Computing. In this paper, he trained time series classifiers to detect whether a user is confused or an AI agent has made an error, two situations that would warrant an explanation from the robot.

A time series classifier is a type of machine learning model that looks at data that changes over time (like a video, a heart-rate trace, or someone's eye movements) and tries to recognize patterns or make predictions based on that timeline. In Lennart's case, he used time series classifiers to analyze signals over time (like where someone is looking or how they're behaving during a task) to figure out if something seems off, for example whether a person looks confused or the robot might have made a mistake. Based on those patterns, the system can decide whether it's a good moment for the robot to explain what it's doing.
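To make this concrete, the following is a minimal, hypothetical sketch of such a classifier in Python. The gaze features, window length, and network architecture here are illustrative assumptions, not the classifiers from Lennart's paper.

```python
# Minimal, hypothetical sketch of a time series classifier for gaze signals.
# Shapes, features, and labels are illustrative, not the setup from the paper.
import torch
import torch.nn as nn

class GazeSequenceClassifier(nn.Module):
    """Classifies a window of gaze features as 'explain now' vs. 'no explanation needed'."""

    def __init__(self, n_features: int = 4, hidden_size: int = 32, n_classes: int = 2):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, features), e.g. gaze coordinates and a "looking at agent" flag
        _, last_hidden = self.encoder(x)
        return self.head(last_hidden.squeeze(0))

# Toy usage: 8 windows, each 50 time steps of 4 gaze features.
model = GazeSequenceClassifier()
windows = torch.randn(8, 50, 4)
labels = torch.randint(0, 2, (8,))           # 1 = a confusing moment or agent error
loss = nn.CrossEntropyLoss()(model(windows), labels)
loss.backward()                               # one illustrative training step (optimizer omitted)
```

The classifier's prediction for each window of the interaction is what would tell the robot whether this is a good moment to offer an explanation.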

The collection of the data set used to train the machine learning model was a joint effort with STAI CDT student Peter Tisnikar. As the task environment, they used a virtual, collaborative cooking game. The classifier takes the user's gaze behavior as input. After training, the model picks up on gaze patterns such as users looking more at the agent when it makes mistakes, and looking around the environment when they do not know how to solve the task themselves.
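As an illustration of how such gaze patterns could be turned into model inputs, here is a small hypothetical example; the specific features (share of gaze directed at the agent, how spread out the gaze is) are assumptions made for the sake of the sketch, not the exact features from the study.

```python
# Hypothetical example of summarising raw gaze samples into per-window features
# like those described above; the feature choices are assumptions for illustration.
import numpy as np

def gaze_window_features(gaze_targets: list, gaze_xy: np.ndarray) -> np.ndarray:
    """Summarise one window of gaze data.

    gaze_targets: what the user looked at in each sample, e.g. "agent", "pot", "counter".
    gaze_xy: (n_samples, 2) array of gaze coordinates in the scene.
    """
    frac_on_agent = np.mean([t == "agent" for t in gaze_targets])  # tends to rise when the agent errs
    dispersion = gaze_xy.std(axis=0).mean()                        # scanning around when stuck
    return np.array([frac_on_agent, dispersion])

window = gaze_window_features(
    ["agent", "agent", "pot", "counter"],
    np.array([[0.1, 0.2], [0.1, 0.3], [0.6, 0.4], [0.8, 0.9]]),
)
print(window)  # a 2-feature summary of this toy window
```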

Picture shows STAI CDT student Peter Tisnikar demonstrating the virtual, collaborative cooking game.

As such an approach relies on the labor-intensive process of manually collecting data sets for different domains and users, Lennart also evaluated the zero-shot capabilities of foundation models for deciding how to communicate during critical situations in human–robot interactions. Zero-shot means evaluating how well an AI model performs on a task it was not explicitly trained for.

Foundation models for vision–language tasks are pre-trained on internet-scale data sets and can therefore generate image- or video-related text without needing to be fine-tuned for a specific domain. However, when prompted to choose between different communicative options given videos of human–robot interactions, vision–language models showed a lack of understanding of the social and spatial concepts needed to choose the right style of communication and to know when to explain.
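For a sense of what such a zero-shot evaluation can look like, here is a hypothetical sketch that asks a vision–language model to pick between communicative options for a single frame of an interaction. It assumes an OpenAI-style API, and the model name, prompt wording, and answer options are made up for illustration; the models, prompts, and videos used in Lennart's evaluation will differ.

```python
# Hypothetical zero-shot prompt for a vision-language model; the model name,
# prompt, and answer options are illustrative assumptions, not those from the paper.
import base64
from openai import OpenAI

OPTIONS = [
    "A) Say nothing and keep working.",
    "B) Briefly explain what the robot is doing and why.",
    "C) Ask the user what they would like the robot to do next.",
]

def choose_communication(image_path: str) -> str:
    with open(image_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode()
    prompt = (
        "This frame shows a human collaborating with a robot on a cooking task.\n"
        "Which communication is most appropriate right now?\n"
        + "\n".join(OPTIONS)
        + "\nAnswer with a single letter."
    )
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```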

He presented this project in October at IEEE/RSJ IROS, and you can read the paper on KCL Pure.

Lastly, one cornerstone of doing a PhD is engaging with the existing literature, building on theories, and finding space to innovate. Lennart conducted a scoping literature review on the explanations robots give in human–robot interaction scenarios. Specifically, he focused on the explanation types and timings discussed in the literature and presented them in the form of a taxonomy. He hopes it will make it easier for other researchers to jump into the field of explainable robotics. You can read his study in Springer's International Journal of Social Robotics.

Picture shows Lennart's taxonomy of explanation types and timings.