Browsing by Author "Bulling, Andreas (Prof. Dr.)"
Now showing 1 - 5 of 5
Item Open Access
Analysis and modelling of visual attention on information visualisations (2024)
Wang, Yao; Bulling, Andreas (Prof. Dr.)

Understanding and predicting human visual attention have emerged as key research topics in information visualisation research. Knowing where users might look provides rich information on how users perceive, explore, and recall information from visualisations. However, understanding and predicting human visual attention on information visualisations is still severely limited. First, eye tracking datasets on information visualisations are limited in size and viewing conditions: they typically contain hundreds of stimuli, whereas thousands are usually required to train a well-generalised deep-learning model for attention prediction. Moreover, top-down factors such as tasks strongly influence human visual attention on information visualisations, yet existing eye tracking datasets cover too few viewing conditions for a thorough analysis. Second, computational visual attention models do not perform well on visualisations. Information visualisations are fundamentally different from natural images: they contain more text (e.g. titles, axis labels, or legends) as well as larger areas of uniform colour. Computational visual attention models can predict attention distributions over an image, i.e. saliency maps, without the need for any eye tracking equipment. However, current visual attention models are primarily designed for natural images and do not generalise to information visualisations. This thesis aims to investigate human visual attention on information visualisations and to develop computational models for predicting saliency maps and visual scanpaths. To achieve this goal, the thesis, comprising five scientific publications, progresses through four key stages. First, it addresses the scarcity of visual attention data in the field by collecting two novel datasets.
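The saliency maps mentioned above are attention distributions over an image, and predicted maps are typically scored against ground-truth maps with distribution-comparison metrics. As a minimal illustration of one standard metric (KL divergence) — a toy numpy sketch, not code from the thesis:

```python
import numpy as np

def kl_saliency(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence KL(gt || pred) between two saliency maps.

    Both maps are normalised to probability distributions first;
    lower values indicate a better prediction.
    """
    p = gt / (gt.sum() + eps)
    q = pred / (pred.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# A perfect prediction yields (near-)zero divergence.
gt = np.random.rand(32, 32)
assert kl_saliency(gt, gt) < 1e-4
```

Other common saliency metrics (e.g. correlation coefficient or NSS) follow the same pattern of comparing a predicted map against recorded gaze.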
The SalChartQA dataset contains 6,000 question-driven saliency maps on information visualisations, while the VisRecall++ dataset contains gaze data from 40 participants together with their answers to recallability questions. Second, based on the collected and public visual attention data, the thesis investigates the multi-duration saliency of different visualisation elements and attention behaviour under recallability and question-answering tasks, and proposes two metrics to quantify the impact of gaze uncertainty on AOI-based visual analysis. Third, building upon these insights, two visual attention and scanpath prediction models are proposed. VisSalFormer is the first model to predict question-driven saliency, outperforming existing baselines under all saliency metrics. The Unified Model of Saliency and Scanpaths predicts scanpaths probabilistically, achieving significant improvements in scanpath metrics. Fourth, the thesis proposes a question-answering paradigm to quantify visualisation recallability. It further establishes connections between gaze behaviour and recallability, enabling predictions of recallability from gaze data.

Item Open Access
Bridging cognitive and deep learning models of attention (2025)
Sood, Ekta; Bulling, Andreas (Prof. Dr.)

Neural attention mechanisms, drawing inspiration from the cognitive modeling of human attention, have led to significant advancements in deep learning models across the fields of computer vision (CV) and natural language processing (NLP) (Gupta et al., 2021). Despite these technological strides, AI models still fall short of human performance in tasks demanding nuanced comprehension (e.g., reading comprehension), as well as in unfamiliar data domains and novel modalities (Sarker, 2021). The goal of this dissertation is to bridge human and data-driven models of attention to enhance the performance of neural systems for CV and NLP tasks.
We hypothesize that the human–machine performance gap is due to a lack of adequate human-like attention functionalities in AI systems, given the relationship between attention functionality and task performance in humans (Pashler et al., 2001). To address this gap, we focus on three aspects that currently hamper the performance of attention-based deep neural networks (DNNs) (Kotseruba et al., 2016): first, their lack of interpretability, which obscures our knowledge of how these models process and prioritize information; second, the challenge of generalizability across datasets and domains; and third, their substantial data dependency, which hinders the development and scaling of such models for certain tasks. We explore whether we can mitigate these issues by integrating DNNs with cognitive models of attention, especially for the tasks of reading and scene perception, where human attention has been widely studied and where DNNs fall short of human capabilities (Das et al., 2017; Mathias et al., 2021). Accordingly, the manuscript develops along three research questions. The first is: What is the relationship between neural and human attention? Focusing on reading comprehension tasks, we uncover correlations between neural and human-like attention. Our findings demonstrate that a closer alignment with human attention patterns can in fact significantly improve DNNs' task performance in both mono- and multimodal settings; that there is a trade-off between model complexity and attention-based interpretability; and that text attention in particular is significantly correlated with model accuracy. Second, we ask: How does incorporating cognitive theories of attention into DNNs enhance model generalizability? We illustrate that using cognitive simulations as an inductive bias, along with specialized training, effectively compensates for the absence of human ground-truth attention data in novel domains.
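The idea of supervising neural attention with human-like attention can be pictured as adding an auxiliary alignment term to the training objective. The following is a toy numpy sketch of that idea under simple assumptions (softmax attention, KL alignment term); the names and the weighting scheme are illustrative, not the dissertation's actual method:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_supervised_loss(attn_logits, human_attention, task_loss, lam=0.5):
    """Task loss plus a KL term pulling the model's attention distribution
    towards a human(-like) attention target, e.g. one simulated by a
    cognitive model when no recorded gaze data is available.
    """
    eps = 1e-8
    att = softmax(attn_logits)
    kl = float(np.sum(human_attention * np.log((human_attention + eps) / (att + eps))))
    return task_loss + lam * kl

# When the model's attention already matches the target,
# only the task loss remains.
logits = np.array([2.0, 1.0, 0.5])
target = softmax(logits)
loss = attention_supervised_loss(logits, target, task_loss=1.0)
```

In an end-to-end setting, this extra term would simply be backpropagated alongside the task loss.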
We pioneer a method, known as deep saliency prediction (Wang et al., 2021), that initiates the training of a DNN for visual saliency prediction using cognitive model simulations as an inductive bias. Our text and image saliency models, informed by generalized eye movement behaviors simulated from cognitive models, are further refined with limited eye-tracking data, achieving significant performance improvements comparable to the state of the art across various domains and datasets. Lastly, our third research question is: Can methods informed by cognitive models of attention effectively mitigate data dependency requirements? We apply our saliency prediction model to mono- and multimodal NLP tasks using a novel joint semi-supervised training method: we generate task-specific human-like attention by training our downstream task models while allowing gradient flow into the saliency prediction model. Hence, we supervise the neural attention layers of different downstream DNNs with different saliency predictions from the same model. In this way, by supervising neural attention mechanisms with human-like attention and jointly training both models for a given task end-to-end, we circumvent the need for task-specific human data. Taken together, our studies set forth a structured approach to addressing key limitations of current data-driven deep learning models of attention. This thesis demonstrates that integrating them with cognitive science frameworks of human attention opens up new research possibilities, yielding models that are more efficient, more aligned with human cognitive processes, and better able to perceive and understand the world in a human-like manner.

Item Open Access
From mind to machine: leveraging gaze behaviour and user feedback for mental face reconstruction (2025)
Strohm, Florian; Bulling, Andreas (Prof. Dr.)

With the ever-growing prevalence of intelligent systems, these systems must understand users' mental states to effectively and safely assist and interact with them. While many aspects of modelling human mental states, particularly emotions and cognition, have been extensively studied, the systems' ability to comprehend users' mental imagery - a vital component of human planning and action - remains underexplored. In this thesis, we examine the task of computational mental image reconstruction (MIR), situated at the intersection of artificial intelligence (AI) and human cognition, which aims to decode and recreate image representations held in the mind's eye. Specifically, we focus on mental face reconstruction, which has been extensively studied within forensics and offers a controllable, well-defined image space for investigating this challenging task. This thesis proposes various methodologies for enhancing mental face reconstruction that address the limitations of prior work: (1) reducing reconstruction times and users' mental workload by simplifying the task with interactive AI systems; (2) allowing users to manually adjust and refine reconstructions by providing semantic and fine-grained control over reconstructed images; and (3) investigating implicit behavioural signals, such as human gaze data, for MIR. This work explores two primary concepts for enhancing MIR systems: one that utilises explicit user feedback and control, and another that focuses on gaze-based approaches. The first concept centres on using explicit user feedback in facial reconstruction. We developed an intelligent system in which users rank sets of faces by their similarity to the mental image they envision. Our system uses this ranking information to infer and generate the mental image the user has in mind, significantly reducing the user's workload and reconstruction time.
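Inferring a mental image from similarity rankings can be pictured as aggregating the latent vectors of the ranked candidates. The sketch below is a deliberately simple illustration of that idea (a linear rank weighting over hypothetical latent vectors), not the system's actual estimator:

```python
import numpy as np

def infer_latent_from_ranking(candidates, ranking):
    """Toy rank-weighted estimate of a target latent vector.

    candidates: (n, d) array of latent vectors for the shown faces.
    ranking: candidate indices, ordered from most to least similar
             to the user's mental image.
    Higher-ranked candidates receive larger weights.
    """
    n = len(ranking)
    weights = np.arange(n, 0, -1, dtype=float)  # n, n-1, ..., 1
    weights /= weights.sum()
    return weights @ np.asarray(candidates)[list(ranking)]

# The estimate could seed the next set of candidate faces shown to the user,
# tightening the loop between user feedback and reconstruction.
candidates = np.random.randn(5, 8)
estimate = infer_latent_from_ranking(candidates, ranking=[2, 0, 4, 1, 3])
```

A real system would decode such a latent vector back into a face image with a generative model.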
While this method can produce faces that are visually similar to users' mental images, accurately reconstructing fine facial features remains challenging. To address this, we introduced a novel method called UP-FacE, which gives users fine-grained and semantic control over various facial features. We created a tool with a simple slider interface that allows users to refine and fine-tune faces predicted by mental face reconstruction systems. The second concept explores leveraging human gaze behaviour, which has been shown to encode valuable information about users' mental states but has not been widely investigated for MIR. To study the feasibility of gaze-based image reconstruction methods, we proposed a novel approach operating in a controlled environment with human-like faces. This demonstrated our ability to extract valuable information from users' gaze behaviour and subsequently reconstruct mental face images. Since this method relied on prior knowledge that is unavailable at test time, we extended the work into an interactive system in which users and an AI iteratively collaborate to infer the user's mental face image without prior knowledge. The main limitation to successfully applying gaze-based methods for accurate reconstructions is the lack of task- and user-specific gaze data. A promising solution is to employ user models that simulate gaze behaviour during training. However, existing methods typically predict gaze data for an average user, ignoring individual differences in gaze behaviour. Given the user-specific nature of MIR, we developed a novel method that learns user embeddings from a small amount of gaze data, allowing us to synthesise user-specific visual attention. Further improving user models that simulate gaze behaviour is crucial for training effective gaze-based mental face reconstruction systems. Our proposed reconstruction systems aim to reduce the friction associated with MIR, a critical factor for effective and pervasive human-AI interaction.
Furthermore, our insights into gaze-based MIR suggest potential future methodologies that could further reduce this friction, thereby enhancing the effectiveness of human-AI interactions.

Item Open Access
Learning representations of interactive behaviour (2025)
Zhang, Guanhua; Bulling, Andreas (Prof. Dr.)

Item Open Access
Methods and applications for multimodal conversational models (2025)
Abdessaied, Adnen; Bulling, Andreas (Prof. Dr.)