Analysis and modelling of visual attention on information visualisations
Abstract
Understanding and predicting human visual attention have emerged as key topics in information visualisation research. Knowing where users might look provides rich information on how they perceive, explore, and recall information from visualisations. However, our ability to understand and predict visual attention on information visualisations is still severely limited. First, eye tracking datasets on information visualisations are limited in size and viewing conditions: they typically contain only hundreds of stimuli, whereas thousands are required to train a deep learning model that generalises well. Moreover, top-down factors such as tasks strongly influence visual attention on information visualisations, yet existing datasets cover too few viewing conditions to analyse these influences thoroughly. Second, computational visual attention models do not perform well on visualisations. Such models predict attention distributions over an image, i.e. saliency maps, without the need for any eye tracking equipment, but current models are primarily designed for natural images and do not generalise to information visualisations, which are fundamentally different: they contain more text (e.g. titles, axis labels, or legends) as well as larger areas of uniform colour.

The thesis aims to investigate human visual attention on information visualisations and to develop computational models that predict saliency maps and visual scanpaths. To achieve this goal, the thesis, comprising five scientific publications, progresses through four key stages. First, it addresses the scarcity of visual attention data in the field by collecting two novel datasets: the SalChartQA dataset contains 6,000 question-driven saliency maps on information visualisations, while the VisRecall++ dataset contains gaze data from 40 participants together with their answers to recallability questions. Second, based on the newly collected and publicly available visual attention data, the thesis investigates multi-duration saliency of different visualisation elements and attention behaviour under recallability and question-answering tasks, and proposes two metrics to quantify the impact of gaze uncertainty on AOI-based visual analysis. Third, building on these insights, two visual attention and scanpath prediction models are proposed: VisSalFormer, the first model to predict question-driven saliency, outperforms existing baselines on all saliency metrics, and the Unified Model of Saliency and Scanpaths predicts scanpaths probabilistically, achieving significant improvements in scanpath metrics. Fourth, the thesis proposes a question-answering paradigm to quantify visualisation recallability and establishes connections between gaze behaviour and recallability, enabling the prediction of recallability from gaze data.
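The abstract describes saliency maps as attention distributions over an image and refers to evaluating predictions on standard saliency metrics. The following is a purely illustrative sketch, not code from the thesis, assuming NumPy and two of the metrics commonly used in saliency research (KL divergence and Pearson's correlation coefficient); the maps below are random placeholders standing in for model output and aggregated eye tracking data.

    # Minimal sketch: comparing a predicted saliency map against a
    # ground-truth attention map with two standard saliency metrics.
    import numpy as np

    def kl_divergence(pred, gt, eps=1e-8):
        # Treat both maps as probability distributions over pixels.
        pred = pred / (pred.sum() + eps)
        gt = gt / (gt.sum() + eps)
        # Lower is better: penalises probability mass placed where no attention occurred.
        return float(np.sum(gt * np.log(gt / (pred + eps) + eps)))

    def pearson_cc(pred, gt):
        # Linear correlation between the two maps; higher is better.
        return float(np.corrcoef(pred.ravel(), gt.ravel())[0, 1])

    rng = np.random.default_rng(0)
    pred = rng.random((480, 640))   # hypothetical predicted saliency map
    gt = rng.random((480, 640))     # hypothetical ground-truth attention map

    print(f"KLD: {kl_divergence(pred, gt):.4f}, CC: {pearson_cc(pred, gt):.4f}")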