OPUS - Online Publications of University Stuttgart
Browsing by Author "Ma, Yingpeng"

Item: Open Access
    Analysing human vs. neural attention in VQA
    (2024) Ma, Yingpeng
Visual Question Answering (VQA) has drawn substantial interest in both academic and industrial research in recent years. Driven by Vision Transformers (ViT) and vision-text co-attention mechanisms, VQA models have shown notable performance improvements. Yet the black-box nature of neural attention impedes understanding of its functionality and undermines trust in these models. Drawing inspiration from prior work, this thesis demystifies these mechanisms. We aim to 1) extract the neural attention weights of VQA models, 2) remap the weights to machine attention maps, 3) compare machine attention with human gaze heatmaps, and 4) compute related metrics to provide deeper insights into the attention patterns. First, the MCAN model implementation and machine attention extraction are reproduced on the VQA-MHUG dataset within the MULAN framework; a comparison with the official implementation verifies the accuracy and correctness of the re-implementation. Then, using the toolkit of the MULAN framework, the 1D attention weights are remapped to 2D neural attention maps. Next, these attention maps are compared to the human gaze heatmaps of VQA-MHUG using explainable AI (XAI) metrics. Following the same pipeline, a further experiment on the AiR-D dataset reports the Area Under the ROC Curve (AUC), Spearman's rank correlation coefficient (rho), and Jensen-Shannon divergence (JSD) to compare neural attention with the human gaze heatmaps. Finally, differences between the official and reproduced implementations are discussed, alongside insights into the interpretability of neural attention in VQA models.
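The comparison step described in the abstract can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the thesis's actual code): it assumes two same-shaped 2D maps, a machine attention map and a human gaze heatmap, and computes the three metrics named above (Spearman's rho, JSD, and an AUC in which the top human-gaze pixels are treated as positives). The map sizes and the 90th-percentile threshold are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.spatial.distance import jensenshannon
from sklearn.metrics import roc_auc_score

# Hypothetical example maps; in the thesis these would come from
# MCAN attention weights and VQA-MHUG / AiR-D gaze heatmaps.
rng = np.random.default_rng(0)
machine = rng.random((14, 14))   # 2D neural attention map
human = rng.random((14, 14))     # human gaze heatmap

def normalize(m):
    """Flatten a map and scale it to sum to 1 (a probability distribution)."""
    v = m.ravel().astype(float)
    return v / v.sum()

p, q = normalize(machine), normalize(human)

# Spearman's rank correlation (rho) between the two flattened maps.
rho, _ = spearmanr(p, q)

# Jensen-Shannon divergence; scipy's jensenshannon returns the
# JS *distance* (the square root of the divergence), so we square it.
jsd = jensenshannon(p, q) ** 2

# AUC: label the top 10% of human-gaze pixels as positives (an
# assumed threshold) and score how highly the machine map ranks them.
labels = (q >= np.quantile(q, 0.9)).astype(int)
auc = roc_auc_score(labels, p)

print(f"rho={rho:.3f}, JSD={jsd:.3f}, AUC={auc:.3f}")
```

With random inputs as above, rho hovers near 0 and AUC near 0.5; real machine attention that matches human gaze would push rho and AUC up and JSD down.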