A Systematic Exploration of Uncertainty in Interactive Systems
Miriam Greis

A SYSTEMATIC EXPLORATION OF
UNCERTAINTY IN INTERACTIVE SYSTEMS
Dissertation approved by the Faculty of Computer Science, Electrical Engineering and
Information Technology and the Stuttgart Research Centre for
Simulation Technology (SRC SimTech) of the University of
Stuttgart in fulfilment of the requirements for the degree of
Doctor of Natural Sciences (Dr. rer. nat.)
submitted by
MIRIAM GREIS
from Esslingen am Neckar
Main referee: Prof. Dr. Albrecht Schmidt
Co-referee: Prof. Dr. Orit Shaer
Date of oral examination: 21 December 2017
Institute for Visualization and Interactive Systems
University of Stuttgart
2017

Abstract iii
ABSTRACT
Uncertainty is an inherent part of our everyday life. Humans have to deal with
uncertainty every time they make a decision. In the digital world, uncertainty
becomes even more important. Machine learning and predictive algorithms
introduce statistical uncertainty to digital information. In addition, the rising
number of sensors in our surroundings increases the amount of statistically uncertain
data, as sensor data is prone to measurement errors. Studies in psychology have
revealed that humans prefer to receive information about uncertainty and make
better decisions if uncertainty information is communicated. Furthermore,
adequately communicating uncertainty establishes trust in an application. Hence,
there is an emerging need for practitioners and researchers in Human-Computer
Interaction to explore new concepts and develop interactive systems that are able
to handle uncertainty. Such systems should not only support users in expressing
uncertainty in their input, but also present uncertainty in a comprehensible way.
The main contribution of this thesis is an exploration of the role of uncertainty
in interactive systems and of how novel input and output methods can support
researchers and designers in communicating uncertainty efficiently and clearly.
Using empirical methods of Human-Computer Interaction and a systematic
approach, we present novel input and output methods that support the
comprehensible communication of uncertainty in interactive systems. We further
integrate our results into a simulation tool for end-users.
First, we identify functional and non-functional requirements for interacting
with uncertainty in the context of an end-user simulation tool. We conduct
multiple surveys, focus groups, and design workshops with simulation experts
and end-users. The results of our studies show that two aspects play an important
role: designs that cater to different users and application scenarios, and
transparency of the algorithms and data used. Both can only be achieved by
adequately quantifying and communicating uncertainty.
Based on related work, we create a systematic overview of sources of uncertainty
in interactive systems to support the quantification of uncertainty and identify
relevant research areas. The overview can help practitioners and researchers to
identify uncertainty in interactive systems and either reduce or communicate it.
We then introduce new concepts for the input of uncertain data. We enhance
standard input controls, develop specific slider controls and tangible input controls,
and collect physiological measurements. For each of these explorations, we
conduct surveys and studies to identify and propose the most promising candidates
for future usage. We also compare different representations for the output of
uncertainty to make recommendations for their usage. Furthermore, we analyze
how humans interpret uncertain data and make suggestions on how to avoid
misinterpretation and statistically wrong judgements.
We embed the insights gained in this thesis into an end-user simulation tool
to make them available for future research. The tool is intended as a starting
point for future research on uncertainty in interactive systems and to foster the
communication of uncertainty and trust in the system.
Overall, our work shows that user interfaces can be enhanced to effectively
support users with the input and output of statistically uncertain information.
Zusammenfassung v
SUMMARY
Uncertainty has been and remains an inherent part of everyday life, and people
are confronted with it daily when they make decisions. Uncertainty is also
playing an ever greater role in the digital world. Due to the development of
algorithms that learn and make predictions, much digital information is
statistically uncertain. The growing number of sensors in our surroundings also
leads to an increase in statistically uncertain data, as sensor data is prone to
measurement inaccuracies. Studies in psychology have already shown that people
prefer to be informed about uncertainty and make better decisions when
information about uncertainty is available. Trust in information can also be
increased through adequate communication of uncertainty. This makes it all the
more important that developers and researchers in Human-Computer Interaction
explore concepts and develop interactive systems that can handle uncertainty.
These systems should, on the one hand, allow users to communicate uncertainty in
their input and, on the other hand, present uncertainty in an understandable way.
The main contribution of this thesis lies in exploring the extent to which
uncertainty plays a role in interactive systems and how new input and output
methods can help to communicate uncertainty efficiently and comprehensibly.
Using empirical methods of Human-Computer Interaction and a constructive
approach, we present novel input and output methods that support the
comprehensible communication of uncertainty in interactive systems. We
integrate our findings into a simulation tool for end-users.
First, we identify the necessary functional and non-functional requirements for
interacting with uncertainty in the context of a simulation tool suitable for
end-users. To this end, we conducted several surveys, focus groups, and design
workshops with simulation experts and end-users. The results of our studies show
that flexibility with respect to different users and use cases, as well as
transparency of the algorithms used, play a major role. However, both can only
be achieved through adequate quantification and communication of uncertainty.
To support the quantification of uncertainty and to identify relevant research
areas, we present a systematic overview of sources of uncertainty in interactive
systems based on related work. This overview can help researchers and developers
to identify uncertainty in interactive systems and either reduce or communicate
it. Building on this, we present new concepts for the input of uncertain data.
To this end, we extended standard input methods, developed specific sliders and
tangible input methods, and collected physiological measurements. For each
sub-area, we conducted surveys and studies in order to recommend the most
promising input methods for use. For the output of uncertain data, we also
compare different representations in order to make recommendations for their
use. Furthermore, we investigate how people interpret uncertain data and give
recommendations for preventing misinterpretations and statistically incorrect
interpretations.
The results of this thesis are incorporated into a simulation tool for end-users
and prepared for future research. This tool is intended to serve as a basis for
future research on uncertainty in interactive systems in order to foster the
comprehensible communication of uncertainty and trust in interactive systems.
Overall, this thesis shows that user interfaces can be meaningfully extended so
that users are effectively supported in the input and output of statistically
uncertain information.
Preface vii
PREFACE
This thesis originated from the research that I conducted at the University of
Stuttgart in the context of a project funded via the Excellence Initiative of the
German Research Foundation. My work and decisions were influenced by many
conversations and discussions with colleagues, students, and external researchers
working on the topic of uncertainty in interactive systems. As a research associate
at the University of Stuttgart, I also supervised student projects, including
bachelor's and master's theses. These theses were all related to my research topic
and supported me in realizing my ideas. Throughout my time as a PhD student, I
greatly valued the scientific exchange with other researchers and practitioners
when attending conferences, workshops, or doctoral colloquia. Hence, I decided to
write this thesis using the scientific plural instead of the singular. All figures and
diagrams in this thesis were either made by myself or originated in the context of
theses completed under my supervision. Additionally, parts of the presented work
are based on scientific publications arising from collaborations with colleagues
and students. The respective chapters contain references to these publications in
the introductory part of the chapter.
Acknowledgments ix
ACKNOWLEDGMENTS
While my name is the only one on the front cover of this thesis, many others
supported me throughout my time as a PhD student. I had the unique
opportunity to meet excellent researchers and collaborators without whom I would
not have been able to finish this thesis. I want to thank all of them for influencing
and shaping my research. Moreover, many of them did not only remain collaborators
but became great friends who gave me the personal support I needed to keep
going and continue my research through all its ups and downs. I therefore dedicate
these acknowledgments to them and sincerely apologize to anyone I might have
missed.
First and foremost, I would like to express my special appreciation to my
supervisor Albrecht Schmidt, who trusted me enough to employ me as a research
assistant and offered me the chance to pursue my PhD in his research group. He
inspired my work and always supported me in the best possible way to achieve my
goals. Thank you, Albrecht, for your continuous support and guidance, your faith in
my research despite the many paper rejections I faced during my first two years,
and your precious time whenever I needed it most. I could not have imagined
a better supervisor for my PhD thesis. Besides my supervisor, I would like to
thank my thesis committee, Orit Shaer, Thomas Ertl, and Stefan Funke, for
their insightful comments and questions. Thank you for your feedback, time, and
effort!
I started working as a teaching assistant in Albrecht’s research group when I was
a master's student. I am very grateful that he offered me the opportunity to write
my diploma thesis in his group. I would like to thank Niels Henze and Florian Alt,
who supervised my diploma thesis, for sparking my interest in research. Without
them, I would probably not have joined Albrecht’s group to pursue a PhD.
At the University of Stuttgart, I had many excellent colleagues who supported
my research and my time as a PhD student in different ways. My greatest thanks go to
Paweł W. Woźniak for his constant personal support since he joined Albrecht’s
group. Whenever I struggled or needed someone to talk to, I knew that I could
call him. He mentally supported me in reaching some of the most important
milestones of this thesis and always found the friendliest and most motivating
words to encourage me to keep up my research. I found another great collaborator
in Tonja Machulla, with whom I co-supervised two students, which resulted in an
Honorable Mention Award at CHI’17. Special thanks also to Jakob Karolus,
not only for sharing an office with me during the last one and a half years, but also
for numerous cooking and music sessions, which I will truly miss now that
he has moved to Munich. Thanks go to Yomna Abdelrahman (for Egyptian
sweets and awesome food), Nora Broy, Mariam Hassib, Romina Kettner, and
Alexandra Voit for being the greatest roommates I could possibly have had at
conferences and seminars. It was always a great pleasure to share a hotel room
and a conference experience with you. Thanks as well to all other colleagues
that I met during my time at the University of Stuttgart: Mauro Avila, Patrick
Bader (for randomly picking the same wedding date), Céline Coutrix (for the
collaboration on tangible interfaces), Tilman Dingler (for great music at our
Christmas parties), Passant El.Agroudy (for her endless energy), Markus Funk
(for trusting me as a skiing teacher), Huy Viet Le (for our adventurous
visit to the zoo), Hyunyoung Kim, Francisco Kiss, Pascal Knierim, Oliver
Korn (for realizing that there are women in computer science), Thomas Kosch,
Thomas Kubitza, Lars Lischke (for being a great travel buddy at my first
CHI conference), Sven Mayer (for always fixing the repository of our students
and for great barbecues), Bastian Pfleging (for his invaluable knowledge about
everything and insights on how to write a PhD thesis), Rufat Rzayev (for always
being in the mood for singing a song), Stefan Schneegaß, Valentin Schwind,
Dominik Weber, and Katrin Wolf. Special thanks go to Anja Mebus and
Murielle Naud-Barthelmeß for doing all the administrative work and always
being receptive to problems arising in the group.
Being a member of the graduate school of the Cluster of Excellence in Simulation
Technology introduced me to many people working in different fields. First,
I want to thank my project network, project network 7, for great
interdisciplinary conversations in our lunch meetings and at the status seminars.
Second, my thanks go to Maria Hammer and Christoph Grüninger for sharing
the honor of being PhD spokespersons during the first year of my PhD and for
organizing a great PhD weekend! I further want to thank Mark Dornbach, Dennis
Grunert, and Andreas Schmidt for our informal lunch meetings on Thursdays.
What started as a meeting with PhD students from other faculties led to a great
exchange of experiences and a friendship that I would not want to miss. I
additionally want to thank the whole SimTech management team, especially
Barbara Teutsch, who did a great job in managing the graduate school. Thank
you, Barbara, for always having an open door no matter what questions or
problems I had.
There are many other great people whom I am grateful to have met during
my PhD. First of all, Chris Schmandt and his awesome group at the MIT Media
Lab, who made it possible for me to visit them for three months. I very much
enjoyed my stay and experiencing a different research culture. Thank you, Chris,
for being such a supportive supervisor and for sharing your great knowledge and stories
about the beginnings of the Media Lab. At the numerous conferences I attended,
I met many other great and inspiring researchers. I want to thank: Jonathan
Day (for being the best SV Co-Chair that I could have ever picked) and all our
great SVs of UbiComp’16, Pascal Lessel (for the shared experience of four CHI
conferences and for always being an awesome seatmate on the return flights),
Marion Kölle (for being my roommate at the winter school and at NordiCHI’16
and for the small surprise package after my defense), Thijs Roumen (for the great
organization and success of the German party at CHI’17), Christian Löw (for
being a great motivator at MobileHCI’17), Michael Lahnert (for sharing some
of the painful experiences of being a PhD student), Matthew Kay (for supporting
my idea of a CHI workshop on uncertainty), all organizers and participants who
made the uncertainty workshop at CHI’17 happen, and all the others whose names
do not fit in the limited space of these acknowledgments.
This thesis would not have been possible without the support of great students.
Thanks to my two student research assistants, Hendrik Schuff and Ken Singer,
for their great and reliable work over multiple years. I additionally supervised
multiple bachelor's and master's theses to shape my research. Thanks to Emre
Avci, Velihan Bulut, Patrick Franczak, Marius Kleiner, Andreas Korge, and
Thorsten Ohler for being great students that I really liked to work with. I also had
the chance to supervise two exchange students, Vicky Ziemer and Aditi Joshi,
who did an amazing job during the three months they spent with us doing research.
It was a pleasure to work with you! Thank you, Vicky, for also helping me
find a room in Boston and for being a great friend when I visited the MIT Media
Lab. I am also very thankful for the great conversations with Johannes Knittel
and Dominique Rau, whom I had already supervised as a teaching assistant during
my studies and later met again in their role as startup founders.
I also want to thank some of the friends I met at the university, mostly while I
was still a student. I was always able to discuss questions regarding my PhD
thesis with them to get a different perspective, and I heavily relied on their
personal support. Thanks to Pascal Hirmer, Niklas Kaulitz,
Severin Leonhardt, Sebastian Richter, and Hansjörg Schmauder for great
lunch breaks, walks, and visits.
Last but not least, I want to thank my family for always supporting me in every
possible way to complete this thesis: my parents Marion Greis and Martin Greis,
my sister Madlen Greis, and especially my husband Florian Greis. Without
your support, I would not be the person I am today, and I would not have
finished this thesis.
Thank you!
TABLE OF CONTENTS
List of Figures xix
List of Tables xxi
List of Acronyms xxiii
I INTRODUCTION AND MOTIVATION 1
1 Introduction 3
1.1 Research Questions 4
1.2 Methodology 5
1.2.1 Literature Analysis 6
1.2.2 Requirement Analysis 6
1.2.3 Prototypes 6
1.2.4 Evaluation 6
1.3 Research Context 7
1.4 Contributions 8
1.4.1 Requirements and Sources of Uncertainty 8
1.4.2 Explorations of the Design Space 8
1.4.3 Simulation Tool 9
1.5 Thesis Overview 9
II FOUNDATIONS 15
2 Background 17
2.1 Uncertainty 17
2.2 Types and Sources of Uncertainty 18
2.3 Simulation 20
3 Related Work 21
3.1 Importance of and Challenges for Uncertainty Research 22
3.2 Textual and Iconic Communication of Uncertainty 23
3.2.1 Linguistic Communication 24
3.2.2 Numerical Communication 24
3.2.3 Iconic Communication 25
3.2.4 Comparisons 26
3.3 Uncertainty Visualization 26
3.3.1 Visualization in General 26
3.3.2 Geographic Information Science 28
3.3.3 Weather Forecasting 29
3.3.4 Human-Computer Interaction 30
3.4 Interpretation of Uncertain Data 31
3.4.1 Problems of Understanding 31
3.4.2 Decision-Making under Uncertainty 31
3.4.3 Confidence and Trust 33
3.5 Simulation Tools for Non-Experts 34
3.6 Insights from Related Work 35
4 Understanding Simulation Users 37
4.1 Definitions 38
4.2 Simulation Experts 38
4.2.1 Online Survey 38
4.2.2 Paper Questionnaire 41
4.2.3 Implications from Expert Simulation Usage 43
4.3 Non-Experts 44
4.3.1 Current Simulation Usage in Everyday Life 45
4.3.2 Developing Definitions and Potential Use Cases 49
4.3.3 Future Usage of Simulations in Everyday Life 52
4.3.4 Implications 56
4.4 Insights for Developing an End-User Simulation Tool 59
III UNCERTAINTY IN INTERACTIVE SYSTEMS 61
5 Sources of Uncertainty 63
5.1 The General Interaction Framework 64
5.2 Enhancing the General Interaction Framework 65
5.2.1 User 65
5.2.2 Articulation 66
5.2.3 Input 66
5.2.4 Performance 67
5.2.5 System 67
5.2.6 Presentation 68
5.2.7 Output 68
5.2.8 Observation 68
5.3 Implications for HCI Research 69
6 Input Methods 73
6.1 Enhancing Input Controls 74
6.1.1 Common Input Controls 75
6.1.2 Methods for Entering Uncertainty 78
6.1.3 Design of Non-Functional Prototypes 80
6.1.4 Selection of Promising Designs 82
6.1.5 Evaluation in the Lab 82
6.1.6 Implications 87
6.2 Probability Distribution Sliders 88
6.2.1 Design Process 88
6.2.2 Online Evaluation 91
6.2.3 Evaluation in the Lab 94
6.2.4 Implications 97
6.3 Tangible Shape-Changing Input Controls 98
6.3.1 Prototype Designs and Qualitative Evaluation 99
6.3.2 Evaluation in the Lab 101
6.3.3 Discussion & Implications 103
6.4 Physiological Sensing 104
6.4.1 Question Selection Process 105
6.4.2 Evaluation in the Lab 106
6.4.3 Discussion & Implications 109
6.5 Insights for Quantifying Uncertainty in User Input 110
7 Output Methods 113
7.1 Communication of Uncertainty in Current Mobile Applications 114
7.1.1 Analysis of Mobile Applications 115
7.1.2 Online Survey 117
7.1.3 Discussion & Implications 118
7.2 Uncertainty Visualization for Activity Tracking 118
7.2.1 Online Survey 119
7.2.2 Evaluation in the Wild 124
7.2.3 Implications 128
7.3 Decision-Making under Uncertainty 128
7.3.1 Classification of Representations 129
7.3.2 Online Evaluation 129
7.3.3 Evaluation in the Wild 135
7.3.4 Implications 140
7.4 Insights for Communicating Uncertainty 140
8 Interpretation 143
8.1 Humans’ Predictions 144
8.1.1 Android Application 145
8.1.2 Method 145
8.1.3 Participants 147
8.1.4 Results 147
8.1.5 Discussion & Implications 150
8.2 Aggregating Forecasts from Multiple Sources 151
8.2.1 Design Rationale for Aggregation Mechanisms 151
8.2.2 Detailed Research Questions and Hypotheses 154
8.2.3 Online Evaluation 155
8.2.4 Evaluation in the Wild 162
8.2.5 Implications 165
8.3 Humans’ Internal Models for Aggregating Conflicting Uncertain Measurements 166
8.3.1 Detailed Research Questions and Hypotheses 167
8.3.2 User Study 168
8.3.3 Discussion & Implications 172
8.4 Insights for Supporting the Interpretation 175
IV CONCLUSION AND FUTURE WORK 177
9 Simulation Tool for End-Users 179
9.1 Implementation 179
9.2 User-Centered Process 180
9.3 Functionality 181
9.3.1 User Accounts 181
9.3.2 Modules 183
9.3.3 Models 183
9.3.4 Input Methods 184
9.3.5 Output Methods 184
9.3.6 Simulations 185
9.4 Implications 185
10 Conclusion 187
10.1 Research Contributions 187
10.1.1 Current Simulation Usage 188
10.1.2 Sources of Uncertainty in Interactive Systems 188
10.1.3 Input Methods 189
10.1.4 Output Methods 190
10.1.5 Interpretation 191
10.2 Concluding Remarks 191
11 Future Work 193
11.1 Sources of Uncertainty 193
11.2 Input Methods 194
11.3 Output Methods 194
11.4 Interpretation 195
11.5 End-User Simulation 195
11.6 Concluding Remarks 196
V BIBLIOGRAPHY 197
Bibliography 199
VI APPENDICES 215
A Study Interface for Enhancing Standard Input Controls 217
B Screenshots of SimulaTE 221
LIST OF FIGURES
4.1 Depicted simulation workflow 42
4.2 Activities of our design workshops 53
4.3 Sketches depicting predictive features 55
4.4 Diagram of a product flow for a predictive simulation service 57
5.1 The General Interaction Framework by Dix [2009] 64
5.2 Enhanced version of the General Interaction Framework 71
6.1 Non-functional prototypes for enhanced standard input controls 81
6.2 Boxplot diagrams for the probability percentage input 85
6.3 Boxplot diagrams for the range input 86
6.4 Probability Distribution Sliders 90
6.5 Evaluation results for the probability distribution sliders 92
6.6 Concepts of the split slider and the spring slider 99
6.7 Concepts of the dial-based designs 100
6.8 Non-functional prototype of the split slider 100
6.9 Functional prototype of the split slider 101
6.10 Influence of the explanation on prototype usage 103
6.11 Diagrams for key logging and eye tracking metrics 108
7.1 Felt uncertainty of step counts of activity trackers 121
7.2 Felt uncertainty of covered distance of activity trackers 121
7.3 Felt uncertainty of calorie expenditure of treadmills 122
7.4 Felt uncertainty of covered distance of treadmills 122
7.5 Color palette and range display 124
7.6 Designs of graphical overviews for activity tracking data 125
7.7 Screens of the Pedometer application 126
7.8 Representations with no or aggregated uncertainty 130
7.9 Representations with detailed aggregated or detailed uncertainty 131
7.10 Screenshots of the Facebook game Farm Smart 136
7.11 Selection of representations for Farm Smart 138
8.1 Screens of the Predict application 146
8.2 Average relative error of predictions 148
8.3 Evaluation results for the Predict application 148
8.4 Aggregation mechanisms for uncertain data 152
8.5 Sketches of representations for weather forecasts 156
8.6 Preference and confidence by aggregation mechanism 158
8.7 Screens of the Weather Compare application 163
8.8 Representations used in the experiment 169
8.9 Different distances used in the experiment 169
8.10 Two-way interactions for the mean weight differences 173
8.11 Three-way interaction for the mean weight differences 174
9.1 Architecture of SimulaTE 180
9.2 Use case diagram of SimulaTE 182
A.1 Study interface for multiple-choice questions 218
A.2 Study interface for numerical questions 219
B.1 Introduction page of SimulaTE 221
B.2 Module creator of SimulaTE 222
B.3 Module overview of SimulaTE 223
B.4 Model creator of SimulaTE 223
B.5 Model overview of SimulaTE 224
B.6 Model editor of SimulaTE 224
B.7 Input methods overview of SimulaTE 225
B.8 Output methods overview of SimulaTE 225
B.9 Simulation run of SimulaTE 226
B.10 Simulation results of SimulaTE 226
LIST OF TABLES
1.1 High-level research questions addressed in this thesis 5
4.1 Detailed research questions for expert simulation usage 38
4.2 Detailed research questions for non-expert simulation usage 44
4.3 Diary entry from participant 17 48
4.4 Diary entry from participant 24 48
6.1 Overview of standard input controls 76
6.2 Combinations of standard and uncertainty input methods 80
6.3 Preferences for uncertainty input methods 83
6.4 Degrees of freedom of a probability distribution function 89
6.5 Descriptive statistics for usage metrics 95
6.6 Categorization of developed input controls 110
7.1 Used representation methods in weather applications 115
7.2 Used representation methods in navigation applications 116
7.3 Used representation methods in healthcare applications 117
7.4 Descriptive statistics for statements on activity tracking 120
7.5 Descriptive statistics for statements on measurement errors 123
7.6 Classification of representations by displayed uncertainty 132
7.7 Descriptive statistics for statements on representations 134
7.8 Correlations between statements and displayed uncertainty 134
7.9 Descriptive statistics of in-game survey 139
7.10 Overview of log data collected in Farm Smart 139
8.1 Statements presented in the online questionnaire about Predict 146
8.2 Detailed research questions for the design for comparison 154
8.3 P values for contrasts of visualizations and scenarios 161
8.4 Detailed research questions for interpreting conflicting data 167
8.5 Results of the linear mixed-effects models analysis 171
xxii LIST OF TABLES
xxiii
xxiv LIST OF ACRONYMS
LIST OF ACRONYMS
AMS American Meteorological Society
ASL average sentence-length
ASW average syllables per word
CSUQ Computer System Usability Questionnaire
DoF degrees of freedom
FEM finite element method
FRE Flesch-Reading-Ease
FVM finite volume method
GIS geographic information systems
GIScience Geographic Information Science
GNSS global navigation satellite system
GPS Global Positioning System
HCI Human-Computer Interaction
HEPS Hydrological Ensemble Prediction Systems
IC input control
NIST National Institute of Standards and Technology
NRC National Research Council
RQ research question
SUS System Usability Scale
UMUX Usability Metric for User Experience
I
INTRODUCTION AND MOTIVATION

Chapter 1
Introduction
Nowadays, interactive systems often rely on sensor data or complex mathematical
methods such as simulation and machine learning. This development poses new
challenges for the developers and designers of such systems, because these
mathematical methods are difficult for non-experts, such as non-scientists, to
understand. In recent years, research on user-centered machine learning
and simulations has started to emerge. However, this research mostly focuses
on making these methods easier to use for experts rather than on making
them understandable for the general public.
One of the main challenges connected with simulations and machine learning
is that these methods produce uncertain data. Uncertainty quantification and
visualization are therefore active research fields in many natural sciences,
where the main focus is often on the scientists themselves. Many fields
approach the topic from different perspectives, yet there is no consistent definition of
the term uncertainty, so varying terminologies and taxonomies that concentrate on
different aspects of the topic are in use. In Human-Computer Interaction (HCI),
however, uncertainty is a fairly new topic that has only recently gained
the attention of the research community. Uncertainty is an inherent part of the
world, and users know it from their everyday lives, for example from
weather forecasts. Users, however, expect interactive systems and computers
to work error-free. Communicating uncertainty is therefore not
trivial. Not communicating uncertainty could lead users to lose trust in a system
because they assume it does not work correctly. Communicating uncertainty may,
however, lead users not to start using a system at all because they believe that it
does not work properly.
Multiple studies in meteorology and psychology have shown that humans appreciate
getting information about uncertainty and make better decisions when
they know about the uncertainty of presented data. These findings, however,
have yet to be transferred to interactive systems. Additionally, an interactive system
involves many more steps than displaying data; for example, the system has to
handle user input. To show uncertainty in the output, the uncertainty in the user
input has to be quantified. Current interactive systems mostly do not allow
users to specify that they are uncertain about their input. They provide
standard text or number fields for entering data. One example is an application
for tracking calorie intake, where users have to enter the weight of their food in
grams. Most users do not carry a scale and therefore enter guessed values in
the standard number fields. This distorts the output because the system treats the
guessed values as exact.
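The following minimal sketch illustrates what a system could do instead of treating guesses as exact. It is our own illustration rather than a method from this thesis: it assumes that each guessed weight can be described by a best guess and a standard deviation, that energy densities are exact, and that the errors of different foods are independent, so that variances add. All names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class UncertainValue:
    """A quantity given as a best guess (mean) and a standard deviation."""
    mean: float
    sd: float


def scale(x: UncertainValue, factor: float) -> UncertainValue:
    # Multiplying by an exact constant scales both the mean and the sd.
    return UncertainValue(x.mean * factor, abs(factor) * x.sd)


def add_independent(values: list[UncertainValue]) -> UncertainValue:
    # For independent errors, means add and variances (sd squared) add.
    mean = sum(v.mean for v in values)
    sd = math.sqrt(sum(v.sd ** 2 for v in values))
    return UncertainValue(mean, sd)


# Hypothetical meal: guessed weights in grams with an assumed spread,
# paired with an energy density in kcal per gram (treated as exact).
meal = [
    (UncertainValue(150.0, 30.0), 1.3),  # e.g. pasta
    (UncertainValue(100.0, 20.0), 2.0),  # e.g. chicken
]
total = add_independent([scale(w, kcal_per_g) for w, kcal_per_g in meal])
print(f"{total.mean:.0f} kcal +/- {total.sd:.0f} kcal")
```

Reporting the total as a range rather than a single number keeps the uncertainty of the guessed input visible in the output.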
In addition to the lack of input options for users dealing with uncertain input,
there are no guidelines on which visualizations to use in which situation and how they
influence usage behavior in the context of an interactive system. Furthermore,
it remains to be explored how users can be supported in correctly interpreting
uncertain data. These are some of the core challenges that have to be tackled to
support designers and developers in quantifying and displaying uncertain data in
the best possible way for their target audience. Interactive systems that address
these challenges could support users in making better decisions under uncertainty.
These core challenges motivated the research questions presented
in the next section.
1.1 Research Questions
Humans deal with uncertainty not only in everyday life but increasingly also
in interactive systems. Therefore, it is important to understand
how users deal with uncertainty and how uncertainty presented in interactive
systems affects them. New insights from research can support developers and
designers in effectively quantifying and displaying uncertainty.
This thesis explores five high-level research questions (RQs)
to further the understanding of how developers, designers, and HCI
researchers can make uncertain data understandable to the general public (see
Table 1.1). To answer these questions, we built prototypes and evaluated them with
empirical methods from HCI. We embed our research results in the
application scenario of end-user simulations, which we define as simulations
used by the general public. At the end of this thesis, we present a simulation tool
including a set of input and output methods to foster future research.
Table 1.1: Summary of the five high-level research questions addressed in
this thesis. More detailed questions are outlined in the respective chapters.
Research Question (RQ) Chapter
RQ1 What can we learn from the current usage of simulations? Chapter 4
RQ2 What are the sources of uncertainty in interactive systems? Chapter 5
RQ3 What input controls are suitable for uncertain input? Chapter 6
RQ4 What visualizations are suitable for uncertain output? Chapter 7
RQ5 How do people interpret uncertain data? Chapter 8
Before exploring the topic of uncertainty in interactive systems and building
concrete prototypes, we asked experts and laymen how they currently use
simulations to understand what we could learn from their usage behavior (RQ1).
The findings served as the foundation for this thesis. Additionally, we needed to
identify the sources of all possible types of uncertainty in interactive systems to
paint the full picture of how to deal with uncertainty (RQ2). In the subsequent
part of the thesis, we explore the suitability of different input
methods (RQ3) and output methods (RQ4), as well as users’ interpretations of
uncertain data (RQ5).
The next section of this introduction describes the methodology used in this thesis
to address the outlined research questions.
1.2 Methodology
How to quantify and display uncertainty in interactive systems is a new and
mostly unexplored field of research. However, no new technologies are needed to
do research in this field. Due to the lack of design guidelines for quantifying
and displaying uncertainty in interactive systems, we used a bottom-up approach.
We started by observing current usage and identifying sources of uncertainty
in interactive systems. Based on our findings, we explored the influence and
suitability of different design options that can be realized with current technology. Over
three years, we designed non-functional prototypes and implemented functional
prototypes in projects of different scales and evaluated them with potential users.
1.2.1 Literature Analysis
We reviewed related literature about uncertainty and end-user simulation tools.
We divided this literature into two parts: background information, such
as definitions and classifications, and related work. Based on our literature
analysis, we additionally enhanced the General Interaction Framework [Dix, 2009]
with potential sources of uncertainty in interactive systems.
1.2.2 Requirement Analysis
As a foundation for this thesis, we aimed to learn from the current usage of
simulations by both experts and non-experts. We conducted paper and online
questionnaires, as well as a diary study, focus groups, and design workshops. We
identified common usage patterns of simulation tools and key requirements for
building an end-user simulation tool.
1.2.3 Prototypes
To understand the suitability and the possibilities of quantifying, displaying, and
interpreting uncertain data in the context of interactive systems, we designed
a range of non-functional prototypes, such as digital sketches and 3-D printed
prototypes. Most of these prototypes were developed in a user-centered design
process. We used interviews, focus groups, and online surveys to iterate on our
design and identify potentially promising candidates. Based on the results, we
implemented web-based study environments, a Facebook game, and Android
applications.
1.2.4 Evaluation
We evaluated all our prototypes in lab studies or in-the-wild studies. Users either
used the prototype in a lab environment or, for example, played a
Facebook game or installed an Android application. We then collected their
opinions using standardized questionnaires, questionnaires with Likert
scales, interviews, and qualitative feedback.
1.3 Research Context
The research that led to this thesis was conducted in the HCI group at the
University of Stuttgart over the course of three years. It was additionally part of a project
funded within the Cluster of Excellence in Simulation Technology at the University
of Stuttgart. Collaborations with internal and external colleagues inspired this
research.
Cluster of Excellence in Simulation Technology
The Cluster of Excellence in Simulation Technology at the University of Stuttgart
offers a unique environment of interdisciplinary project networks. This thesis
was conducted in the ongoing exchange of knowledge in the project network
called “Reflexion and Contextualization”. The mid-term presentation of this
thesis was accompanied by Prof. Dr. rer. pol. Dipl.-Ing. Meike Tilebein from the
Institute for Diversity Studies in Engineering. Additionally, a publication in the
context of this thesis was presented at a German conference on Economic and
Social Cybernetics in 2014 [Greis, 2014]. Researchers of the cluster also
participated in our focus groups, questionnaires, and studies as simulation experts
and gave us feedback on the developed simulation tool.
Human-Computer Interaction Group, University of Stuttgart
The HCI group at the University of Stuttgart includes researchers with a broad
range of knowledge. Collaborations with Passant El.Agroudy, Jakob Karolus,
Hyunyoung Kim, Alexandra Voit, Dr. Tonja Machulla, Dr. Paweł W. Woźniak,
Dr. Céline Coutrix, Jun.-Prof. Dr. Niels Henze, and Prof. Dr. Albrecht Schmidt
led to multiple publications [Greis et al., 2015, 2016, 2017a,d] and submissions
currently under review, all of which are within the scope of this thesis. Two papers
were particularly successful. The first, written in cooperation with Tonja Machulla,
won an Honorable Mention Award at the CHI’17 conference [Greis et al., 2017a].
The second, written in cooperation with students and Niels Henze, won the ACM
Best Student Paper Award at the EICS’17 conference [Greis et al., 2017d].
External Collaborations
In the context of the Cluster of Excellence in Simulation Technology, I had the
opportunity to join the Living Mobile Group at the MIT Media Lab under the
supervision of Chris Schmandt for three months. This collaboration led to a
late-breaking work publication at MobileHCI’17 [Greis et al., 2017b]. Networking
at conferences also led to a workshop called “Designing for Uncertainty in
HCI”, which was accepted and conducted at the CHI’17 conference [Greis et al.,
2017c]. Co-organizers of this workshop were Jessica Hullman (University of
Washington), Matthew Kay (University of Michigan), Michael Correll (University
of Washington), and Orit Shaer (Wellesley College).
1.4 Contributions
This thesis makes three main contributions to research on uncertainty in HCI:
1. We identify and present user requirements and sources of uncertainty in
interactive systems.
2. We explore possible input methods, output methods, and interpretation
strategies in the context of interactive systems.
3. We provide a simulation tool for future exploration of methods to enter and
display uncertainty.
In the following, we present more details for each of the main contributions.
1.4.1 Requirements and Sources of Uncertainty
Based on a literature review and observations of users, we identified important
requirements and sources of uncertainty. The related implications form the
foundation of this thesis and of future work in the field. The enhanced
General Interaction Framework includes possible sources of uncertainty that have
to be considered and quantified when developing interactive systems. We
identified three main areas of interest for the HCI research community: the input, the
output, and the interpretation of uncertainty in the context of interactive systems.
1.4.2 Explorations of the Design Space
With the help of a series of non-functional and functional prototypes, we explored
the key aspects identified based on the sources of uncertainty: the input, the
output, and the interpretation of uncertainty in the context of interactive systems.
We contribute novel methods for the explicit and implicit input of uncertainty.
We explored how to design standard input controls, specialized slider controls,
and tangible input controls for entering uncertain data. For each exploration, we
contribute the most promising design. We additionally showed the feasibility of
using behavioral and physiological measurements to implicitly capture uncertainty.
On the output side, we identified a lack of communication of uncertainty in
current mobile applications although our participants voiced a preference for
it. We therefore contribute concrete designs that improve the communication of
uncertainty for activity tracking data. We further contribute a classification of
representations based on the amount of uncertainty information they include,
as well as findings on how different amounts of uncertainty information
impact decision-making. Regarding the interpretation, our research indicates
that user-made predictions can improve users’ reasoning about uncertainty and
predictions. We also contribute concrete design recommendations for showing
conflicting information to either increase users’ confidence or improve their
internal models of reasoning about conflicting data.
1.4.3 Simulation Tool
We further contribute a simulation tool that includes the novel input and output
methods developed and compared in our studies. The tool serves as a platform
for future research in the area of uncertainty in HCI to gather insights into how
users might use new input controls and visualizations.
1.5 Thesis Overview
The body of this thesis consists of four parts, which contain eleven chapters. After
this part, Introduction and Motivation, the part Foundations presents an in-depth
literature review of related work about uncertainty in different research areas.
Literature only relevant for one chapter is discussed in the introduction of the
respective chapter. Additionally, this part contains research on current simulation usage
and key requirements for an end-user simulation tool. This is followed by the
main part of this thesis: Uncertainty in Interactive Systems. This part contains the
identification of sources of uncertainty in interactive systems and explorations of
input methods, output methods, and the interpretation of uncertainty. The thesis
closes with the part Conclusion and Future Work, which introduces the simulation
tool, summarizes and discusses the research contributions of the whole thesis,
and presents future work.
Part II: Foundations
This part creates the foundations for the thesis. It contains background
information, an in-depth literature review, and smaller research probes for identifying key
requirements for an end-user simulation tool.
Chapter 2: Background
In this chapter, we introduce and define the terms uncertainty and simulation. We
additionally provide an overview of classifications and taxonomies for uncertainty
from different research areas, including information visualization, medical
visualization, and Geographic Information Science (GIScience). The chapter provides
the background knowledge for the topics discussed in this thesis.
Chapter 3: Related Work
Chapter 3 introduces related work relevant for all chapters of this thesis. In
the first section, we highlight the importance and the challenges of uncertainty
research including perspectives from different research areas. The following
section introduces work on textual and iconic communication of uncertainty
and discusses advantages and disadvantages of linguistic and numerical
communication of uncertainty. We further introduce related work that compares
numerical, verbal, and iconic representations for uncertainty. The third section
briefly introduces uncertainty visualization in the visualization community, then
focuses on the visualization of uncertainty for the general public. We mainly
focus on four subtopics: visualizations in general, visualizations for
weather forecasting, visualizations for geographic data, and the specific use of
uncertainty visualization in HCI. We introduce different visualization possibilities
and discuss their advantages and disadvantages. In the fourth section, we discuss
the interpretation of uncertain data, focusing on the problems of judgements
under uncertainty, decision-making, and concepts such as confidence and trust.
In the next section, we discuss the topic of end-user simulations and present
some related work on simulation tools, and in the last section we summarize the
implications of the presented related work for this thesis.
Chapter 4: Understanding Simulation Users
In Chapter 4, we present smaller research probes on the current usage of
simulations by simulation experts and non-experts. In the first section, we provide
definitions for these terms to distinguish between user groups, then present the
method and results of an online survey and a paper questionnaire conducted with
simulation experts. Furthermore, we present the methods and results of a diary
study, focus groups, and design workshops conducted with non-experts, in which
we explore the current usage of simulations in everyday life, potential use cases,
and the potential future usage of simulations. Finally, we summarize our insights
for the development of an end-user simulation tool.
Part III: Uncertainty in Interactive Systems
Part III contains the main content of this thesis. It contains four chapters that
focus on one high-level research question each. The identification of sources of
uncertainty in Chapter 5 is based on related work. All other chapters contain
different explorations of uncertainty in interactive systems. All explorations
contain an evaluation.
Chapter 5: Sources of Uncertainty
In this chapter, we first introduce the General Interaction Framework by Dix
[2009], then enhance the General Interaction Framework to contain sources of
uncertainty in interactive systems. We identify these sources based on related
work and provide an overview of the implications for HCI research.
Chapter 6: Input Methods
Chapter 6 contains four explorations of input methods for entering uncertainty.
In the first section, we explore how to enhance standard input controls to allow
users to additionally specify uncertainty. Based on a taxonomy for common input
controls and methods for entering uncertainty, we built a set of sketches that were
refined in a pre-study. We then conducted an evaluation in the lab to identify the
most promising user interface. In the second section, we present our designs for
probability distribution sliders. These are specialized slider controls that offer
more transparency and flexibility than standard input controls. We evaluated the
sliders in an online survey and a lab study and provide implications for their use.
The third section introduces designs for tangible shape-changing input controls.
We conducted focus groups to identify the most promising design, Split Slider,
and evaluated this design in a lab study. We further discuss why shape-changing
input controls are a promising area for the input of uncertainty. The next section
introduces physiological sensing as an implicit method for measuring uncertainty.
We present our methods and evaluation in the lab. The last section of the chapter
summarizes our insights into quantifying uncertainty in user input.
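To make the idea of a probability distribution slider more concrete, the sketch below assumes a hypothetical slider with two degrees of freedom, a center value and a spread, and maps its state to a discretized normal distribution over the slider's range. The function name, the grid of values, and the choice of a normal distribution are our assumptions for illustration; the designs described in this chapter may expose other degrees of freedom.

```python
import math


def slider_to_distribution(center: float, spread: float,
                           lo: float, hi: float,
                           steps: int = 11) -> list[tuple[float, float]]:
    """Map a hypothetical (center, spread) slider state to probabilities
    over `steps` evenly spaced values in [lo, hi]."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    # Unnormalized normal density at each grid point.
    weights = [math.exp(-0.5 * ((x - center) / spread) ** 2) for x in xs]
    total = sum(weights)
    # Normalize so the discretized probabilities sum to 1 on the slider range.
    return [(x, w / total) for x, w in zip(xs, weights)]


dist = slider_to_distribution(center=5.0, spread=1.5, lo=0.0, hi=10.0)
for x, p in dist:
    # Simple text histogram of the resulting distribution.
    print(f"{x:4.1f} {'#' * round(p * 100)}")
```

Normalizing over the slider range, rather than over the whole real line, keeps the probabilities consistent with what the control can actually display.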
Chapter 7: Output Methods
To understand the state of output methods for uncertainty communication, we
analyzed current mobile applications on how they communicate uncertain data.
The first section of this chapter describes our analysis and contains the results
of an online survey on users’ expectations for uncertainty communication. We
further designed three different graphical overviews that communicate uncertainty
for activity tracking data that we introduce in the next section. We conducted an
online survey to inspire our designs and then evaluated the designs in an in-the-
wild study. In the third section, we introduce a classification of representations
according to the amount of uncertainty information. We additionally present the
results of an online survey and an in-the-wild evaluation of different visualizations
for uncertainty to understand their influence on decision-making. We complete
the chapter with a section describing our insights on communicating uncertainty
in interactive systems.
Chapter 8: Interpretation
In this chapter, we present three research probes on the interpretation of uncertain
data. The first section presents an Android application, which leveraged user-made
predictions to teach users behavior patterns, and its evaluation.
In the second section, we present the method and results of an online survey and
an in-the-wild study which compared different aggregation mechanisms for data
from multiple weather data providers. In the following section, we present the
methods and results for an experiment in the lab where participants had to indicate
the true value for two conflicting sensor measurements. These measurements
were depicted with different visualizations to learn about humans’ internal models
of aggregating conflicting data when using uncertainty visualization. We outline
our insights of the chapter in the last section.
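As reference points for how conflicting values can be combined, the sketch below shows two common aggregation strategies: a plain mean with a min-max range, as one might use for forecasts from several providers, and an inverse-variance weighted mean, a standard statistical way to combine independent measurements with known uncertainty. Neither is claimed to be the mechanism used in these studies or the internal model of the participants; the function names and values are hypothetical.

```python
import math


def mean_and_range(values: list[float]) -> tuple[float, float, float]:
    """Plain average of all sources plus the spread between them."""
    return sum(values) / len(values), min(values), max(values)


def inverse_variance_mean(values: list[float],
                          sds: list[float]) -> tuple[float, float]:
    """Weight each measurement by 1/sd^2 so that more certain values count more.
    Returns the combined estimate and its standard deviation, assuming
    independent, normally distributed errors."""
    weights = [1.0 / sd ** 2 for sd in sds]
    total_w = sum(weights)
    estimate = sum(w * v for w, v in zip(weights, values)) / total_w
    return estimate, math.sqrt(1.0 / total_w)


# Hypothetical temperature forecasts (degrees Celsius) from three providers.
print(mean_and_range([21.0, 23.0, 25.0]))

# Hypothetical conflicting sensor readings: a precise one and an imprecise one.
est, sd = inverse_variance_mean([20.0, 26.0], sds=[1.0, 2.0])
print(f"{est:.1f} +/- {sd:.2f}")
```

With these values, the combined estimate lies much closer to the more precise reading, which is what a plain average would not capture.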
Part IV: Conclusion and Future Work
This part consists of three chapters that summarize and conclude the work
presented in the previous chapters. Based on our research, we implemented a
simulation tool for end-users. We additionally present the conclusion and potential
directions for future research.
Chapter 9: Simulation Tool for End-Users
Based on the research presented in Part III, we present SimulaTE, a simulation
tool for end-users. The tool includes input methods and output methods for
uncertain data and was built based on the key requirements identified in Chapter 4.
It is intended to serve as a basis for future research on uncertainty in interactive
systems.
Chapter 10: Conclusion
In this chapter, we summarize and present the main contributions and conclusions
from this thesis. The discussion focuses on the research questions identified in
the Introduction chapter of this thesis.
Chapter 11: Future Work
Chapter 11 focuses on future work. We identify and discuss potential directions
for future research and follow-up projects.
II
FOUNDATIONS

Chapter 2
Background
In this chapter, we present background information, definitions, and classifications
relevant for this thesis. We focus on definitions and classifications of uncertainty
and on simulation as a specific application area in which uncertainty plays an important
role.
2.1 Uncertainty
Uncertainty quantification and visualization are objects of research in many
different domains. As most research about uncertainty is domain-specific, manifold
definitions and different uses of the term uncertainty exist. Additionally, the
distinction between uncertainty and related terms such as data quality, reliability,
accuracy, and error mostly remains unclear [MacEachren et al., 2005].
Pang et al. [1997] define uncertainty “to include statistical variation or spread,
error and differences, minimum-maximum range values, noisy, or missing data”,
covering all possible types and sources of uncertainty. This definition is widely
used in the visualization community. Other definitions are similarly open, such
as that by Bonneau et al. [2014], who refer to uncertainty as “the lack of
information”. Similar to Potter et al. [2012], they differentiate between epistemic
and aleatoric uncertainty: epistemic uncertainty could, for example, be
introduced by wrong measurements or models, while aleatoric uncertainty is
inherent random uncertainty that can be calculated statistically. The National
Institute of Standards and Technology (NIST) guidelines also point to two types
of measurement uncertainty [Taylor and Kuyatt, 1994]. Besides these broad
definitions, much narrower ones exist. Gershon [1998], for example, sees uncertainty as one
singular factor of imperfect knowledge, where the information is actually known
but the user is unsure about its accuracy.
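The NIST guidelines [Taylor and Kuyatt, 1994] distinguish Type A evaluations of standard uncertainty, which rely on statistical analysis of series of observations, from Type B evaluations, which rely on other means such as manufacturer specifications or prior knowledge. As a minimal sketch of a Type A evaluation, assuming n independent repeated readings of the same quantity, the standard uncertainty of their mean can be estimated as the sample standard deviation divided by the square root of n. The function name and readings below are hypothetical.

```python
import math
import statistics


def type_a_standard_uncertainty(readings: list[float]) -> tuple[float, float]:
    """Return the mean of repeated readings and the standard uncertainty
    of that mean (sample standard deviation / sqrt(n))."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.fmean(readings)
    s = statistics.stdev(readings)  # sample standard deviation (n - 1 denominator)
    return mean, s / math.sqrt(len(readings))


# Hypothetical repeated length measurements in millimetres.
readings = [10.1, 9.9, 10.0, 10.2, 9.8]
mean, u = type_a_standard_uncertainty(readings)
print(f"{mean:.2f} mm, u = {u:.3f} mm")
```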
Researchers also disagree on the definition of uncertainty in the area of
model-based decision support. While Walker et al.
[2003] define uncertainty “as being any deviation from the unachievable ideal of
completely deterministic knowledge of the relevant system”, Norton et al. [2006]
strongly disagree with this definition as they consider it to be problematic from a
social science perspective.
In this thesis, we follow the definition of Pang et al. [1997] which includes all
possible types and sources of uncertainty. The main reason for following Pang’s
definition is that we are not interested in one specific type of uncertainty, but
rather in a holistic view on uncertainty in interactive systems. We therefore want
to incorporate as many understandings of uncertainty as possible.
2.2 Types and Sources of Uncertainty
In addition to the multiple definitions of the term uncertainty, several taxonomies and
typologies exist in different domains and with different foci. They refer to
the sources of uncertainty, the classification of visualization methods, or the
dimensions or types of uncertainty.
Skeels et al. [2008] identified types of uncertainty commonly discussed in the
literature. They found three levels of uncertainty: measurement precision,
completeness, and inference. They additionally found two categories that span levels:
disagreement and credibility. Disagreement refers to data either measured
multiple times or provided by disagreeing sources. Credibility, in contrast, refers to
a data source producing unreliable data. In this thesis, we do not focus on any
specific type of uncertainty as all of the types identified by Skeels et al. might be
present in interactive systems and therefore relevant for the HCI community.
In their work on advanced navigation systems, Andre and Cutler [1998] identified
three dimensions of uncertainty: accuracy (e.g., biases in measurements),
precision (e.g., data measured with a low level of precision), and time (e.g., delivering
information with a time lag). They developed these dimensions in the application
scenario of information displays for pilots. Their dimensions largely correspond to
some of the types of uncertainty identified by Skeels et al. [2008]; however, they
are transferred to a specific application scenario and contain the time lag as a very
specific dimension of uncertainty in information displays.
Focusing on uncertainty visualization, Pang et al. [1997] reviewed sources of
uncertainty in the context of the visualization pipeline. They further classified
uncertainty visualization methods based on their value, location, data, extent,
visualization extent, and axis mapping. Potter et al. [2012] refined existing
classifications and focused on only two qualities: data dimension and data uncertainty
dimension, providing examples for all combinations. Although these
classifications provide a good overview of uncertainty visualization, they are rather complex
and apply to visualizations used by visualization experts. They might therefore be less
suitable for the HCI community.
Ristovski et al. [2014] present a taxonomy of uncertainty types in medical
visualization based on spatial location, dimensionality, type of events, and sources.
They identified sources of uncertainty such as noise, discretization, interpolation,
or human interpretation. Of these, human interpretation as a source of uncertainty
might be of particular relevance for HCI research.
Walker et al. [2003] describe their conceptual framework for understanding
uncertainty in decision-making, proposing to distinguish between three dimensions
of uncertainty: location (where the uncertainty manifests in the model), level
(from known to completely unknown), and the nature of uncertainty (type of
uncertainty). The locations of uncertainty could also be described as sources of
uncertainty. However, their work only focuses on the model, not on the role of
the user in the decision-making process.
Further specialized typologies exist in other domains. One example is the typology
developed by Thomson et al. [2005] for geospatial data and intelligence analysis.
Some categories of their typology are related to the types of uncertainty discussed
by Skeels et al. [2008]; however, their typology is much more detailed and
contains a timing category which may refer to the time lag described by Andre
and Cutler [1998]. One drawback of the typology is that it does not relate the
identified categories to sources of uncertainty.
Although multiple taxonomies and typologies for uncertainty exist, there is no
agreement on which of these taxonomies or typologies should be used in HCI.
As none of them refers to a system with user input, we argue that a theoretical
examination of sources of uncertainty in HCI is needed, which we provide in
Chapter 5.
2.3 Simulation
Simulation is a very powerful technique used in different fields to explore the
behavior of complex systems such as the flow of groundwater,
human walking behavior, and the world’s climate. Due to its applicability to
many problems, simulation is one of the most widely used techniques in research and
management sciences [Law, 2015]. In this thesis, we use a definition of simulation
following Shannon [1998]: Simulation is the development of a mathematical
model imitating a real system to help understand the system or develop management
strategies for it, because adapting the real system is, for example,
impossible or too expensive.
What has so far mainly hindered simulations from gaining more importance
in everyday life is that their execution takes a lot of computing time [Law,
2015]. However, computing power is steadily increasing; thus, the computation
of simulations will increasingly become feasible on mobile phones and similar
small personal devices.
Chapter 3
Related Work
Related work from many different research areas, mainly HCI and
visualization, is of interest for this thesis. In this chapter, we present related
work on the importance of uncertainty as a research topic, uncertainty
visualization, uncertainty communication in HCI and beyond, and
one application scenario of uncertainty communication: end-user simulations.
Uncertainty visualization and communication to experts and the general public is
relevant for many application scenarios such as ensemble weather forecasts [Gneit-
ing and Raftery, 2005], context-aware systems [Antifakos et al., 2004; Lim and
Dey, 2009], navigation systems and maps [Andre and Cutler, 1998; MacEachren,
1992], object classification [Bisantz et al., 2011], machine learning [Kay et al.,
2015], data analysis [Ferreira et al., 2014], and information fusion [Riveiro, 2007].
Application scenarios can already be very concrete, such as interactive tools to
explore personal genomics [Shaer et al., 2016], public transport predictions [Kay
et al., 2016], or the potential range of an electric car [Jung et al., 2015]. We
argue that this breadth makes uncertainty an interesting and important research topic,
especially as no general understanding of, or guidelines for, uncertainty exist so far.
3.1 Importance of and Challenges for Uncertainty Research
As Couclelis [2003] stresses in her work, uncertainty is inherent in complex
knowledge, no matter what scientists do about it. She argues that scientists
often present information as fact when in reality it is affected by uncertainty.
Uncertainty communication to the public is therefore very important, as nowadays
many people can provide information on the internet, where it is unclear
how certain or uncertain the sources are. Although Couclelis mainly refers to
geographic information systems (GIS), we argue that this also holds true for
other systems today. With the huge amount of information freely available on
the internet, it is more difficult for the public to judge whether data sources
are reliable or whether the presented information is uncertain.
We argue that uncertainty communication practice follows the same phases as the risk
communication practice described by Leiss [1996]. According to Leiss, in phase 1,
uncertainty has to become manageable and quantifiable. In phase 2, the communication
needs to gain attention, and in phase 3, responsibility for both dimensions has
to become part of normal practice. Five years later, Lipkus et al. [2001] still
identified the communication of uncertainty as an important open issue in graphical
risk communication. Even today, communicating and visualizing uncertainty
remains a difficult problem with no clear solution at hand.
Brodlie et al. [2012] presented several reasons why it is so hard to deal with
uncertainty in visualizations. First of all, uncertainty itself is very complex and
can be represented in various ways, for example as bounded
data or as a probability distribution function. Uncertainty can be introduced at the
modeling step and then accumulates (e.g., through linearity in the computer hardware)
as it propagates through the visualization pipeline. In the visualization itself,
uncertainty adds a further dimension, as it has to be shown
in addition to the deterministic values, and it often visually dominates the certain
information, for example when error bars are very long. Uncertainty also makes a topic
interdisciplinary, as many different groups of people have to work together to
correctly quantify, propagate, and visualize it. This shows that the
nature of uncertainty itself turns it into a complex research area with
many pitfalls. As HCI is an interdisciplinary field, it should include uncertainty research
to understand how interactive systems are affected by uncertainty.
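Brodlie et al.'s observation that uncertainty introduced at one step accumulates as it propagates can be made concrete with a small Monte Carlo sketch. The two-step pipeline, the Gaussian error model, and all numbers are hypothetical assumptions chosen for illustration. Monte Carlo sampling is one common propagation technique and not necessarily the one Brodlie et al. describe.

```python
import random
import statistics


def propagate(n_samples=100_000, seed=42):
    """Monte Carlo propagation of uncertainty through a hypothetical two-step pipeline."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        measured = rng.gauss(10.0, 1.0)            # step 1: uncertain input (mean 10, sd 1)
        modeled = measured + rng.gauss(0.0, 2.0)   # step 2: the model adds its own error (sd 2)
        outputs.append(modeled)
    return statistics.mean(outputs), statistics.stdev(outputs)


mean, sd = propagate()
# For independent Gaussian errors, sd approaches sqrt(1**2 + 2**2), i.e. about 2.24.
print(f"output: {mean:.2f} +/- {sd:.2f}")
```

The output spread is larger than the spread of either error source on its own.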
MacEachren et al. [2005] identified future research challenges for the visualization
of uncertainty in geospatial data. Some of these are highly interesting for HCI,
such as identifying the components of uncertainty and how they relate
to users’ needs, as well as understanding the usability of visualizations and tools
that help capture, represent, or interact with uncertain data. The American
Meteorological Society (AMS) [2008] outlined very similar challenges for
probability information in weather forecasts. These challenges also focused
on tools for communicating uncertainty information, mainly on how users
can easily understand uncertainty information and how they can be supported
in actually interpreting it. As these challenges span different
domains and reach far into HCI, we see uncertainty communication in
these domains as a promising application area for the field.
Boukhelifa and Duke [2009] summarized the challenges in three points that are
crucial for the visualization of uncertain data: First, the quality and scope of
the uncertain data have to be good, which requires knowledge of the sources
of uncertainty and a correct quantification of the uncertainty contained
in the data. Second, limited confidence in the data is a major problem. Third,
the visualization itself can mislead people into wrong assumptions or
interpretations of the data. HCI can help tackle, and possibly
solve, some of the current problems around uncertainty communication to both
experts and the general public.
3.2 Textual and Iconic Communication of Uncertainty
Uncertainty communication can take various forms and have various consequences.
Systems communicating uncertainty might fail to convince users that they actually
work. However, in automated systems that do not communicate uncertainty, the data
might be perceived by users as more reliable or accurate than it really is. When
this affects decisions, users may lose trust in an automated system that does not
communicate uncertainty [Andre and Cutler, 1998]. This indicates that interactive
systems need to find the optimal compromise when presenting uncertainty
information, without overwhelming or confusing users.
For the textual communication of uncertain data, either numerical or linguistic
expressions or a combination of both can be used. In the following, we present
research using both methods as well as icons to communicate uncertainty.
3.2.1 Linguistic Communication
Early research on this question assigned probabilities to linguistic expressions
such as “almost certain” or “probable” [Kent, 1964], but humans have different
perceptions of terms such as “low risk” and “low uncertainty” [Wallsten et al.,
1986]. Budescu et al. [2009] conducted a study where participants assigned
numerical values to linguistic expressions. The judgements varied considerably
across subjects, and the authors recommended specific guidelines to make linguistic
communication of uncertainty easier: First, the uncertainty of the specific event
should be distinguished from general ambiguity in its description
(e.g., “a big storm”). Second, specifying the sources of the uncertainty and
stating their nature and magnitude could help people to better understand the
information. Third, combining linguistic and numerical communication seems to
be most promising. Fourth, the ranges of uncertainty classes should not be fixed
but should adapt to the event. These guidelines could also be applied to
interactive systems using linguistic expressions to communicate uncertainty.
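The third and fourth guidelines can be sketched as a lookup that always pairs a verbal phrase with its number and accepts an event-specific scale. The default thresholds below are loosely modeled on the likelihood scale of the IPCC reports and should be read as illustrative assumptions. The function name `describe` is hypothetical.

```python
# Illustrative verbal-numerical scale as (lower bound, phrase), highest bound first.
# The thresholds are assumptions loosely modeled on the IPCC likelihood scale.
DEFAULT_SCALE = [
    (0.99, "virtually certain"),
    (0.90, "very likely"),
    (0.66, "likely"),
    (0.33, "about as likely as not"),
    (0.10, "unlikely"),
    (0.01, "very unlikely"),
    (0.00, "exceptionally unlikely"),
]


def describe(p, scale=DEFAULT_SCALE):
    """Return a combined verbal and numerical expression for probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    for lower, phrase in scale:
        if p >= lower:
            return f"{phrase} ({p:.0%})"
    raise ValueError("scale does not cover p")


print(describe(0.93))
```

If a different `scale` is passed, the same code can use wording adapted to a particular event, in line with the fourth guideline.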
Another factor that plays a role in the linguistic communication of uncertainty is
framing. Spiegelhalter et al. [2011] present examples of the influence of positive
and negative framing on the understanding of uncertainty information.
3.2.2 Numerical Communication
Like linguistic communication, communicating risk numerically also has
disadvantages. In their studies, Lipkus et al. [2001] found that one-fifth of their
participants, although highly educated, were not able to answer simple numerical
questions such as “Which represents the larger risk: 1%, 5%, or 10%?” This
may also affect uncertainty communication, as uncertainty is often
communicated as a probability or percentage. It is therefore important to better understand
how people interpret such values. The authors propose training as one option for
improving humans’ understanding of numerical expressions and probabilities.
One specific problem with weather forecasts is the communication of the probability
of precipitation, which is often misinterpreted [Gigerenzer et al., 2005].
The main problem is that the reference class for the probability is missing; thus,
forecast users do not know what the probability refers to. Given a probability of
precipitation of 30%, for example, they often assume that it will rain 30% of
the time or in 30% of the area, instead of understanding that it will rain on 3 out
of 10 days that are like the forecasted day.
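The missing reference class can be stated explicitly in the wording itself. The following sketch uses hypothetical function names. It rephrases a probability of precipitation as a frequency over "days like this one", following the interpretation described by Gigerenzer et al. [2005], and runs a simple simulation to check that the frequency matches the probability. It deliberately ignores how a real forecast defines its area and time window.

```python
import random


def pop_as_frequency(prob, reference_days=10):
    """Restate a probability of precipitation with its reference class made explicit."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    rainy = round(prob * reference_days)
    return f"It rains on {rainy} out of {reference_days} days like this one."


def simulate_days_like_this(prob, n_days=100_000, seed=1):
    """Fraction of simulated 'days like this one' on which it rains."""
    rng = random.Random(seed)
    return sum(rng.random() < prob for _ in range(n_days)) / n_days


print(pop_as_frequency(0.3))         # It rains on 3 out of 10 days like this one.
print(simulate_days_like_this(0.3))  # close to 0.3
```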
Gigerenzer and Hoffrage [1995] conducted two studies comparing frequency
and probability formats for communicating uncertainty in well-known problems
such as the mammography problem. They found that frequency formats
improved participants’ Bayesian reasoning. They also proposed teaching
students with different levels of statistics education how to convert probabilities
into frequency formats to increase understanding. However, this finding seems
to depend on the task and problem presented to participants. Frequency
formats seem to be less suitable for communicating uncertainty in weather forecasts
and instead confused participants [Joslyn and Nichols, 2009]. Weather forecast
users actually seem to prefer numerical probabilities in weather forecasts and are
able to understand them [Murphy et al., 1980]. Whether to use frequency formats
or numerical probabilities should therefore be carefully evaluated based on the
application area and the presented information.
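The advantage of frequency formats is easiest to see in the mammography problem. The sketch below uses the figures commonly quoted for this problem: 1% prevalence, 80% sensitivity, and a 9.6% false-positive rate. Treat these as illustrative assumptions. In the probability format, the reader has to apply Bayes' rule. The natural-frequency format turns the same question into a comparison of two counts.

```python
def posterior_probability_format(prior, sensitivity, false_pos_rate):
    """Bayes' rule with single-event probabilities: P(cancer | positive test)."""
    true_pos = prior * sensitivity
    false_pos = (1 - prior) * false_pos_rate
    return true_pos / (true_pos + false_pos)


def natural_frequencies(population, prior, sensitivity, false_pos_rate):
    """The same question as counts: (sick and positive, all positive)."""
    sick = round(population * prior)                                # e.g. 10 of 1000
    sick_positive = round(sick * sensitivity)                       # e.g. 8 of those 10
    healthy_positive = round((population - sick) * false_pos_rate)  # e.g. 95 of the other 990
    return sick_positive, sick_positive + healthy_positive


p = posterior_probability_format(0.01, 0.80, 0.096)
hits, positives = natural_frequencies(1000, 0.01, 0.80, 0.096)
print(f"probability format: {p:.3f}")
print(f"natural frequencies: {hits} of {positives} positives have cancer")
```

The natural-frequency version rounds to whole people, so its answer (8 of 103) differs slightly from the exact probability.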
One important aspect of uncertainty communication is the formulation of
uncertainty. Joslyn et al. [2009] found that mismatches between a given piece of
information and a given task could lead to confusion, e.g., when a wind warning has
to be issued if the wind exceeds a certain limit, but the information provided is the
probability that the wind stays below that limit. They conclude that the provided
uncertainty information has to be in line with the goal and the task.
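Often, aligning the information with the task only requires reporting the complement of the given probability, phrased in terms of the decision. The function names, the 30 kn limit, and the knot unit are hypothetical. The sketch shows only the reframing and does not reproduce the materials Joslyn et al. used.

```python
def exceedance_from_below(p_below_limit):
    """Convert P(wind below limit) into P(wind above limit).

    Treats wind speed as continuous, so P(wind == limit) is taken as zero.
    """
    if not 0.0 <= p_below_limit <= 1.0:
        raise ValueError("p_below_limit must be in [0, 1]")
    return 1.0 - p_below_limit


def warning_message(p_below_limit, limit_kn):
    """Phrase the forecast in terms of the warning decision the user has to make."""
    p_exceed = exceedance_from_below(p_below_limit)
    return f"Chance that wind exceeds {limit_kn} kn: {p_exceed:.0%}"


print(warning_message(0.8, 30))
```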
Teigen and Jørgensen [2005] combined two ways of expressing uncertainty, probabilistic
modifiers (such as “90% probable”) and interval estimates (e.g., “3 to 4
weeks”), into intervals with probability estimates. They ran multiple experiments
to study how credible such confidence intervals were. They showed that
participants’ confidence estimates differed considerably from the actual confidence
of the intervals and that the estimated confidence depended on the interval
size.
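When the underlying distribution is known, the link between interval width and confidence level can be computed exactly. Assume, purely for illustration, that a task duration is normally distributed with a mean of 3.5 weeks and a standard deviation of 0.3 weeks. Under this assumption, the 90% central interval is roughly "3 to 4 weeks", and any higher confidence level produces a wider interval. The helper name `central_interval` is hypothetical.

```python
from statistics import NormalDist


def central_interval(dist, confidence):
    """Central interval containing `confidence` of the distribution's probability mass."""
    tail = (1.0 - confidence) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


duration = NormalDist(mu=3.5, sigma=0.3)  # hypothetical task duration in weeks
for level in (0.5, 0.9, 0.99):
    low, high = central_interval(duration, level)
    print(f"{level:.0%} interval: {low:.1f} to {high:.1f} weeks")
```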
3.2.3 Iconic Communication
Bisantz et al. [2011] also explored the use of transparent icons in situations
of decision-making related to object classification. In one study they compared
three different visualizations: a solid icon of the most probable classification, a
transparent icon of the most probable classification whose transparency matched
the probability, and a transparent icon of a missile, the object participants had
to react to. They found that a toggle mode allowing participants to switch
between the different representations worked best. Interactive systems could
likewise support multiple representations and provide a
toggle functionality for users.
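The toggle idea can be sketched as a view that cycles through several renderers of the same classification result. All names are hypothetical. The sketch models only two of the three representations from the study and reduces drawing to an opacity value. It illustrates the interaction pattern and is not Bisantz et al.'s implementation.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class IconSpec:
    label: str
    alpha: float  # 0.0 = fully transparent, 1.0 = fully opaque


def solid_icon(label, prob):
    """Most probable class drawn opaque; the probability itself is not shown."""
    return IconSpec(label, 1.0)


def transparent_icon(label, prob):
    """Most probable class drawn with opacity equal to its probability."""
    return IconSpec(label, max(0.0, min(1.0, prob)))


class ToggleView:
    """Lets the user cycle through alternative renderings of the same classification."""

    def __init__(self, renderers):
        self._renderers = cycle(renderers)
        self.current = next(self._renderers)

    def toggle(self):
        self.current = next(self._renderers)

    def render(self, label, prob):
        return self.current(label, prob)


view = ToggleView([solid_icon, transparent_icon])
print(view.render("missile", 0.7))  # solid rendering
view.toggle()
print(view.render("missile", 0.7))  # transparency now encodes the probability
```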
3.2.4 Comparisons
Bisantz et al. [2005] compared numerical, verbal, and iconic representations for
uncertainty information. All of them worked well to communicate uncertainty
information. More fine-grained steps in the representations and icons helped
participants to make more informed decisions and take fewer risks. However,
this might also depend on the given task and information.
Lipkus [2007] identified research directions for the verbal, numerical, and graphical
communication of risk. For graphical displays, more work needs to focus
on understanding the impact of these visualizations on risk perception and on the
actual interpretation of the information by users. Interestingly, the
author proposes engaging users interactively in exploring the presented data,
which might sharpen their mental representation of the data and the associated
risk. This indicates that interactive systems might be a good tool to educate users
about uncertainty.
3.3 Uncertainty Visualization
Uncertainty visualization is a well-explored topic in the visualization community.
Research focuses on different visualizations, user groups, and data dimensions.
Additionally, many other research fields evaluate uncertainty visualizations for
specific application scenarios. In this thesis, we mainly focus on work relevant
for communicating uncertainty to the general public. However, techniques such as
shading [Jackson, 2008], animation [Ehlschlaeger et al., 1997], simulations with
pixel mixing [Hengl and Toomanian, 2006], summary plots [Potter et al., 2010],
or glyphs [Wittenbrink et al., 1996] and other complex methods for uncertainty
visualization have been explored and used in the visualization community.
In the following, we present work from different research fields such as visualiza-
tion, GIScience, weather forecasting, and HCI.
3.3.1 Visualization in General
Olston and Mackinlay [2002] propose to use different representations for two
different types of uncertainty: statistical and bounded uncertainty. For statistical
uncertainty, they propose to use error bars, whilst for bounded uncertainty, they
propose to use ambiguation. However, as they did not evaluate the visualizations
with users, it is difficult to assess whether non-experts in statistics would be able
to understand the differences between them. Additionally, ambiguation might
often be used for confidence intervals, which could lead to confusion about whether
statistical or bounded uncertainty is being visualized.
Tak and Toet [2014] conducted a study with non-experts comparing seven variations
of a line chart. The uncertainty information was added through
components such as solid borders, a gradient, or a confidence interval added to
the line chart. The results showed that the representation influenced how certainty
was perceived by the participants. Nevertheless, all representations communicated
uncertainty, as participants perceived the borders as less certain than the middle
of the representation. According to their findings, dashed borders, random lines,
and gradients were most suitable, as participants’ certainty ratings
fitted a normal distribution quite well. This work provides interesting insights
into how well these techniques match a given certainty; however, it is
unclear how this affects users’ decision-making and whether they can use the
representations for reasoning.
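A gradient band like the ones Tak and Toet found suitable can be built by letting opacity decrease across the band according to a normal density. The sketch below assumes this mapping, with opacity proportional to the density and normalized to 1 at the center. It is an illustration and not the stimulus used in the study. The function name `gradient_alphas` is hypothetical.

```python
from statistics import NormalDist


def gradient_alphas(center, sigma, positions):
    """Opacity at each position, proportional to a normal density with peak opacity 1."""
    dist = NormalDist(center, sigma)
    peak = dist.pdf(center)
    return [dist.pdf(x) / peak for x in positions]


alphas = gradient_alphas(center=0.0, sigma=1.0, positions=[-2, -1, 0, 1, 2])
print([round(a, 2) for a in alphas])  # [0.14, 0.61, 1.0, 0.61, 0.14]
```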
Error bars are a common tool to communicate uncertainty for different chart
types, mainly in scientific or expert contexts. Sanyal et al. [2009] ran
a user study to compare four uncertainty visualizations for 1D and 2D datasets:
error bars, scaled glyphs, colored glyphs, and color-mapping on the data surface.
Participants had to, for example, search for or count the most uncertain or certain
data points. The study did not lead to clear results, but error bars performed worst.
The authors speculate that the effectiveness of the visualizations depends heavily
on the particular task. Correll and Gleicher [2014] also found that bar charts
with error bars may lead to reasoning biases such as the “within-the-bar bias”: a
value inside the bar is seen as more likely than a value outside the bar, although both
may have the same probability. Visually symmetric and continuous plots such
as gradient or violin plots seem more promising in avoiding such biases. Error
bars are additionally misleading, as they can stand for the standard error,
the standard deviation, or a confidence interval [Belia et al., 2005]. Belia et al.
also found that researchers have problems relating error bars to statistical
significance and do not understand the impact of a within-group or between-group
design. We can conclude that although error bars are widely used, they are
not an optimal tool for communicating uncertainty, even in a scientific context.
Uncertainty visualization techniques that are already adopted by a community
do not necessarily lead to a better understanding of presented data. New ways
of communicating uncertainty are still needed, because they may outperform
established methods.
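Belia et al.'s point is easy to check with numbers. For the same sample, the three common meanings of an error bar give very different half-widths. The sketch below uses hypothetical data and a normal approximation (z of about 1.96) for the 95% confidence interval. For a sample this small, a t-based interval would be somewhat wider. The helper name is hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def error_bar_half_widths(sample, confidence=0.95):
    """Half-widths of three common error-bar conventions for the same sample."""
    n = len(sample)
    sd = statistics.stdev(sample)                   # spread of individual values
    se = sd / math.sqrt(n)                          # uncertainty of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # normal approximation, about 1.96
    return {"SD": sd, "SE": se, f"{confidence:.0%} CI": z * se}


sample = [4.0, 5.0, 6.0, 5.0, 5.0, 7.0, 3.0, 5.0]  # hypothetical measurements
for name, half in error_bar_half_widths(sample).items():
    print(f"mean +/- {name}: {statistics.mean(sample):.2f} +/- {half:.2f}")
```

The same bar length can therefore imply very different statements about the data, depending on which convention a chart uses.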
Gschwandtner et al. [2016] compared six different representations for communicating
the uncertainty of temporal data: three representations showing statistical
uncertainty and three showing bounded uncertainty. However, in contrast to Olston
and Mackinlay [2002], they classify error bars as bounded uncertainty instead
of statistical uncertainty. Participants in the study had to specify start and end
times, interval durations, probabilities, and personal preferences in relation to
the representations. The results show that ambiguation or error bars work best
for start and end times as well as minimum and maximum times; however, for
estimating probabilities, a gradient worked best, although participants did not prefer
the gradient representation. This shows that representations encoding bounded
uncertainty are suitable for tasks that do not require probability values. If,
however, the probability of the points in time plays an important role, then more
complex representations are needed.
3.3.2 Geographic Information Science
MacEachren [1992] proposed several ways to show uncertainty on maps,
for example, by reducing the clarity, adding fog, or reducing the resolution of
parts of a map. The uncertainty information could be shown on a second, separate
map, presented sequentially (e.g., by toggling between the two maps), or
incorporated with the data into a bivariate map. The author stresses the importance of evaluating
such alternatives to identify the most promising candidates. Empirical research
methods from HCI could be used to evaluate interactive interfaces showing maps
with uncertain data.
Aerts et al. [2003] conducted a web-based survey with maps that showed uncertainty
by varying the color intensity of the map. They provided two different
versions: one in which participants saw the model result next to a map
showing the uncertainty, and a second in which participants were able to
toggle between the model result and the uncertainty information. They found that
neither experts nor novices perceived the display of uncertainty as more complex;
instead, participants found that embedding uncertainty clarified the information. However,
this only reflects people’s preference for uncertainty displays and does not
yet allow any conclusions on how uncertainty information on maps influences
decision-making.
MacEachren et al. [2012] also explored how to display uncertainty for single
objects (e.g., locations on a map). They evaluated a broad range of visual variables
such as fuzziness, location, and saturation. Participants preferred fuzziness and
location, while saturation was ranked very low. They also tested iconic signs,
which can sometimes be more intuitive than visual variables if the metaphor of
the icon is easy to grasp. The work provides a good overview of visual variables
and could be used to explore these in different application areas.
3.3.3 Weather Forecasting
As early as 2006, the National Research Council (NRC) raised the need for uncertainty
communication in weather forecasting: “Uncertainty is thus a fundamental
characteristic of weather, seasonal climate, and hydrological prediction,
and no forecast is complete without a description of its uncertainty.” [National
Research Council, 2006]. Therefore, much work on communicating uncertainty
to the public has been conducted in the field of weather forecasting.
Joslyn and Savelli [2010] ran a survey showing that weather forecast users
are aware of the uncertainty and of factors (such as the time) that increase the
uncertainty in a forecast. However, their judgements of this uncertainty vary widely
and show unjustified biases. The authors suspect that
humans may have a bias towards a climatological norm, which results in them
perceiving extremely high or low values as less probable than what they consider
to be normal values. Explicit uncertainty visualization could help to better convey
the information and reduce biases.
Ibrekk and Morgan [1987] examined a wider variety of representations for
communicating uncertainty to non-experts in the weather context. They compared
representations such as confidence intervals, box plots, pie charts, gradients, and
probability density functions. In their study, participants had to solve
concrete tasks such as estimating the mean snowfall, the probability that a
specific amount of snowfall is exceeded, or the probability that the snowfall
lies within a certain range. Based on their results, the authors propose using a
probability density function in combination with a cumulative distribution
function.
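Each of the three tasks from Ibrekk and Morgan's study corresponds to reading a probability density function (PDF) or a cumulative distribution function (CDF). Purely for illustration, the sketch assumes a normally distributed snowfall forecast with a mean of 10 cm and a standard deviation of 3 cm. A real snowfall distribution would be bounded at zero and usually skewed. The rough text output stands in for the combined PDF and CDF display the authors recommend.

```python
from statistics import NormalDist

snowfall = NormalDist(mu=10.0, sigma=3.0)  # hypothetical forecast in cm

mean_snowfall = snowfall.mean                       # task 1: the mean
p_exceed_15 = 1.0 - snowfall.cdf(15.0)              # task 2: P(snowfall > 15 cm)
p_8_to_12 = snowfall.cdf(12.0) - snowfall.cdf(8.0)  # task 3: P(8 cm <= snowfall <= 12 cm)

print(f"mean {mean_snowfall:.1f} cm, P(>15 cm) = {p_exceed_15:.2f}, "
      f"P(8-12 cm) = {p_8_to_12:.2f}")

# Crude text version of a PDF (bar length) shown next to the CDF (number).
for x in range(4, 17, 2):
    bar = "#" * round(40 * snowfall.pdf(x))
    print(f"{x:>3} cm  pdf {bar:<6} cdf {snowfall.cdf(x):.2f}")
```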
Frick and Hegg [2011] conducted a longitudinal study with a visualization plat-
form for meteorological and hydrological information. The platform did not
provide an interpreted weather forecast, but instead the outputs of multiple en-
semble models. They found that users of the platform preferred to have detailed
information about uncertainty to make their own judgements, but that new users
might need some time to adapt and get comfortable with the system. Nevertheless,
the display of the ensemble models increased users’ confidence and may also
support decision-making. However, the authors identified one challenge: users
might see uncertainty as a shortcoming rather than a feature.
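Raw ensemble outputs, like those shown on Frick and Hegg's platform, can be summarized while keeping their spread visible. The member values below are hypothetical. The share of members above a threshold is only a rough, uncalibrated estimate of the corresponding probability, and calibrated probabilities may require further statistical post-processing. The function name is hypothetical.

```python
import statistics


def summarize_ensemble(members, threshold):
    """Summarize ensemble members without hiding their spread."""
    return {
        "mean": statistics.mean(members),
        "min": min(members),
        "max": max(members),
        "share_above_threshold": sum(m > threshold for m in members) / len(members),
    }


precip_mm = [12.0, 15.5, 9.0, 14.0, 11.5]  # hypothetical ensemble members
print(summarize_ensemble(precip_mm, threshold=12.0))
```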
In contrast to studies on how the general public interprets weather forecasts,
other studies worked with meteorologists to understand their needs and
opinions on weather forecasts. Morrow et al. [2008] ran focus groups with
broadcast meteorologists to understand their view. They raised concerns about how
uncertainty communication could fit into the limited time span of broadcasts and
feared that it might affect their competition with other broadcasters. The
broadcast meteorologists were rather convinced that the general public wanted
definitive answers and that uncertainty could most likely be communicated
through the delivery of the forecast, without using numerical expressions. In
contrast to their view, forecasters are convinced that forecasts should be displayed
as probabilities [Murphy and Winkler, 1971]. This shows that
different groups and stakeholders might disagree on whether uncertainty should
be presented in a specific application scenario.
Pappenberger et al. [2013] ran a study with expert users of Hydrological Ensemble
Prediction Systems (HEPS). Their participants stated that uncertainty is often
not released to the general public as it is unclear how to best communicate the
uncertainty to make it understandable for non-experts. However, they agreed that
for experts, it is very important to have the information about the uncertainty.
3.3.4 Human-Computer Interaction
Kay et al. [2016] explored how to present public transport predictions on mobile
devices. They compared density plots, stripe plots, and dotplots, a novel
discrete representation that can be used instead of continuous density plots. In a
user study, they found that dotplots improved participants’ probability estimates
and confidence. The dotplot seems to be a promising alternative to probability
density plots for communicating uncertainty to non-experts.
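A quantile dotplot, as we understand the technique described by Kay et al. [2016], draws a fixed number of evenly spaced quantiles of the predictive distribution as dots, so that a probability can be read off by counting dots. The normal arrival-time distribution, the 20 dots, the one-minute bins, and all function names in the sketch below are assumptions.

```python
from statistics import NormalDist


def quantile_dots(dist, n_dots=20):
    """Return n_dots evenly spaced quantiles of a predictive distribution."""
    return [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]


def text_dotplot(dots, bin_width=1.0):
    """Stack dots into bins of bin_width and render one text row per bin."""
    counts = {}
    for d in dots:
        key = int(d // bin_width)
        counts[key] = counts.get(key, 0) + 1
    return [f"{k * bin_width:5.1f} min | {'o' * counts[k]}" for k in sorted(counts)]


arrival = NormalDist(mu=12.0, sigma=3.0)  # hypothetical minutes until the bus arrives
dots = quantile_dots(arrival)
print("\n".join(text_dotplot(dots)))

# Reading the plot: the share of dots at or below 10 minutes estimates P(arrival <= 10).
print(sum(d <= 10 for d in dots) / len(dots))
```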
A special application area for uncertainty communication is the exploration of
personal genomics data. The area is special because the uncertainty does not come
from the data itself, but from its interpretation and the implications drawn from
new research in the area [Shaer et al., 2017]. Shaer et al. developed GenomiX,
which allows users to interact with their personal genomics data. They use three
categories to reflect the uncertainty, ranging from “uncertain” to “well-established”. The
data is then shown in a matrix of certainty versus the severity of the health effect.
Color-coding, size, and borders of the displayed gene variants are also used
to communicate additional properties. This shows that HCI has to deal with
different types and sources of uncertainty depending on the application scenario.
3.4 Interpretation of Uncertain Data
Many problems in communicating uncertain data arise from missing knowledge
or misinterpretation of the presented data. In the following, we present related
work focusing on the understanding of uncertain data, the influence of com-
municating uncertain data on decision-making, and how presenting uncertainty
information impacts confidence and trust.
3.4.1 Problems of Understanding
Very early on, Tversky and Kahneman [1975] identified a set of heuristics and biases
that people rely on when making judgements under uncertainty. Humans
are, for example, insensitive to the prior probability of outcomes and to the size of
the sample. The authors also argue that humans misunderstand regression toward the mean
and judge events as more likely the more easily they can imagine them. Such biases
and heuristics have to be taken into account when communicating uncertain data.
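Insensitivity to sample size can be shown with the hospital problem that Tversky and Kahneman use: one hospital has about 45 births per day and another about 15, and the question is which one records more days on which more than 60% of the babies are boys. Assume that each birth is independently a boy with probability 0.5. Exact binomial probabilities then show that the small hospital records such days considerably more often, even though many people judge both hospitals to be equally likely. The function name is hypothetical.

```python
from math import comb


def prob_more_than_percent_boys(n_births, percent=60, p_boy=0.5):
    """Exact binomial probability that more than `percent`% of n_births are boys."""
    threshold = n_births * percent // 100  # integer arithmetic avoids float edge cases
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(threshold + 1, n_births + 1)
    )


for n in (15, 45):
    print(f"{n} births per day: {prob_more_than_percent_boys(n):.3f}")
```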
Besides statistics and uncertainty information, humans also have
problems understanding accumulation in stock-flow systems. Sweeney and
Sterman [2000] conducted experiments to examine the general systems-thinking
ability of participants, focusing on stocks and flows (e.g., the inflow, outflow, and
stock of water in a bathtub). Even well-educated participants had great difficulty
understanding the relationship between the net inflow and the slope of the stock.
Humans probably match the behavior of the stock to the pattern of the flow, a
tendency that even seems to be independent of other factors such as knowledge and
motivation [Cronin et al., 2009].
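The relationship that participants found difficult can be written as a one-line update rule: at each step, the stock changes by the net inflow. In the hypothetical bathtub numbers below, the stock keeps rising after the inflow has peaked and only stops rising once the net inflow reaches zero. This contradicts the intuition that the stock follows the shape of the inflow. The function name is hypothetical.

```python
def simulate_stock(initial, inflow, outflow):
    """Stock after each step: stock[t] = stock[t-1] + inflow[t] - outflow[t]."""
    stock, level = [], initial
    for q_in, q_out in zip(inflow, outflow):
        level += q_in - q_out  # the net inflow is the slope of the stock
        stock.append(level)
    return stock


inflow = [1, 3, 5, 7, 5, 3, 1]  # hypothetical liters per minute into the bathtub
outflow = [3] * len(inflow)     # constant drain
stock = simulate_stock(10, inflow, outflow)
print(stock)                                             # [8, 8, 10, 14, 16, 16, 14]
print("inflow peaks at step", inflow.index(max(inflow)))  # 3
print("stock peaks at step", stock.index(max(stock)))     # 4
```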
3.4.2 Decision-Making under Uncertainty
In Psychology, multiple studies have shown that providing uncertainty information
improves decision-making. Roulston et al. [2006] conducted a study where
participants played a game in which they had to decide whether to salt a road based on
an overnight minimum temperature forecast. Groups of participants received
different information to play the game: a point estimate, a point estimate and its
standard error, or an additional probability of freezing. The results indicate that
uncertainty information (in this case the standard error) helped participants to
increase profit and make better decisions. Joslyn and LeClerc [2012] ran a very
similar study in which people performed significantly better on all measures when
they had uncertainty information to decide whether to salt a road. Providing
a decision aid without uncertainty information did not help participants
make better decisions. In contrast, forecasts providing uncertainty information
seemed less wrong to participants and overall more reliable. This shows that
showing uncertainty can actually increase trust and have positive consequences.
Additionally, people like to make their own decisions and do not necessarily
follow decision aids but rather their own judgement, especially if forecasts
seem to be wrong.
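The classic cost-loss decision rule illustrates why a standard error or a probability of freezing can improve such decisions. Under this rule, protective action pays off whenever the probability of the event exceeds the ratio of the cost of acting to the loss avoided. We assume this rule, a normal forecast error, and the hypothetical cost and loss values below. These are not the payoffs used by Roulston et al. [2006] or Joslyn and LeClerc [2012]. A point forecast of 2 °C on its own suggests no frost, but once the forecast uncertainty is large enough, the decision changes.

```python
from statistics import NormalDist


def freeze_probability(forecast_min_c, standard_error_c):
    """P(overnight minimum <= 0 °C), assuming a normally distributed forecast error."""
    return NormalDist(forecast_min_c, standard_error_c).cdf(0.0)


def should_salt(p_freeze, cost_of_salting, loss_if_unsalted_freeze):
    """Cost-loss rule: act when the freeze probability exceeds the cost/loss ratio."""
    return p_freeze > cost_of_salting / loss_if_unsalted_freeze


# Hypothetical values: salting costs 1 unit, an unsalted freeze costs 6 units.
for se in (2.0, 3.0):
    p = freeze_probability(2.0, se)
    print(f"forecast 2 °C, error sd {se}: P(freeze) = {p:.2f}, "
          f"salt = {should_salt(p, 1.0, 6.0)}")
```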
Nadav-Greenberg and Joslyn [2009] ran two experiments to examine whether people
make better decisions if confronted with uncertainty information in weather
forecasts. They used several verbal and numerical representations and one visual
representation to present the forecast. The results showed that presenting uncertainty
does not lead to perfect decisions, but to better decisions than without uncertainty information.
However, more information was not necessarily better: depending on the format
of presentation, people performed better or worse. In the experiment, participants
also received feedback, so learning to use the additional information might require
some training.
Savelli and Joslyn [2013] conducted three experiments in which they showed
that forecasts using predictive intervals lead to better decisions, as they support
the identification of unreliable forecasts and help participants to gain a better
understanding of the range of possible outcomes. In their experiments, they compared a
deterministic forecast with a verbal presentation, a plus/minus presentation, and a
bracket representation. In one experiment, they also added a gradient visualization.
One main finding was that participants interpreted some information as diurnal
fluctuation instead of predictive intervals. This indicates that predictive intervals
might be better communicated in text. However, the visualization might simply have
been too similar to the displays users commonly know for diurnal
fluctuation and was therefore misread.
Ramos et al. [2013] conducted an experiment where participants acted as decision-makers
in a flood forecast scenario. They had to decide whether
to open a flood gate to avoid flooding a town. The flood forecast was provided
with different levels of uncertainty information. The authors found that
uncertainty information increased the number of optimal decisions and led to a
smaller variance in all decisions. Additionally, the higher the uncertainty was, the
more participants avoided taking a risk and opened the gate.
Roulston and Kaplan [2009] present another study where participants received a five-day
weather forecast and had to decide which of two criteria would most likely
occur during the following days. One group had to make the decision based on a
point estimate, while the other received uncertainty information. The latter group
performed better than the first group, but did not need significantly more time
for their decision. The authors speculate that participants without uncertainty
information may have estimated it mentally, as they were probably aware
of the uncertainty of the forecast.
All studies presented in this subsection show that participants made better deci-
sions when uncertainty information was presented to them. This indicates that
uncertainty should be communicated in interactive systems to support users in making optimal decisions.
3.4.3 Confidence and Trust
Morss et al. [2008] conducted a nationwide survey of the U.S. public about forecast uncertainty. They found that forecast users were well aware of the uncertain nature of forecasts, expected forecast errors, and stated a preference for forecasts that explicitly communicate uncertainty. Users' confidence in forecasts is mainly influenced by two factors: first, the time span of the forecast, as users are less confident in long-term forecasts; and second, the forecasted information, for example, users generally seem more confident in temperature forecasts than in forecasts of the probability of precipitation. In a further survey including scenarios in which participants had to decide whether to take proactive steps, Morss et al. [2010] found that participants used different criteria to determine whether proactive steps were necessary. The authors conclude that forecasts should not show concrete decision criteria, so that users can apply their own criteria to make individual judgements.
A similar finding is reported by Jung et al. [2015]: displaying uncertainty in an electric car reduces users' range anxiety. Although single numbers are easier to read and understand, hidden uncertainty can negatively affect the user experience and increase anxiety. The same principle applies to bathroom scales, where gaps in users' understanding of measurement accuracy and body weight fluctuations might decrease their trust in the scales. Kay et al. [2013] therefore proposed redesigning the interface of scales to account for the uncertainty caused by measurement error and weight fluctuations, instead of focusing on accuracy alone.
Regarding context-aware systems, it is not clear whether uncertainty information improves user experience and helps to generate trust. Antifakos et al. [2004] found that displaying uncertainty can actually have positive effects on task performance and that uncertainty information helps users to better understand the system state and how well the system works. Additionally, Lim and Dey [2009] identified certainty as important information that should be provided to the users of context-aware systems, although this may depend on the context or kind of information provided. In a further piece of work, Lim and Dey [2011] conducted a survey with usage scenarios for context-aware systems. They found that showing uncertainty for applications with high certainty increases users' trust and makes it easier for them to forgive errors. However, if the uncertainty of an application is too high, users might lose their trust in the application and perceive it as not good enough to be used. Rukzio et al. [2006] visualized uncertainty in a user study of an application that autofilled forms on websites. Uncertain fields were highlighted with colors. Participants did not trust the visualized uncertainty and checked each field for correctness anyway instead of taking the visualized probability into account. Thus, in this specific use case, displaying uncertainty was not advantageous. For context-aware applications, whether to display uncertainty might therefore heavily depend on the importance of the task and the goals of the users.
In machine learning, classifier accuracy is a common measure. Kay et al. [2015] developed a method for evaluating how acceptable a given classifier accuracy is to users. This is an important step towards creating machine learning algorithms that increase users' confidence and trust.
3.5 Simulation Tools for Non-Experts
For educational purpose, simulation tools for non-experts already exist. These
tools were especially created to allow children or students to play with simulations.
One of the first tools in the educational context was the Alternate Reality Kit
[Smith, 1986] which allowed students to build physics simulations. Another
early tool was Playground [Fenton and Beck, 1989] that allows children to
add rules to graphical objects to construct simulations. Most of the existing
simulation tools for non-experts use agent-based modeling approaches [Railsback
and Grimm, 2012] and try to substitute or simplify programming. KidSim
[Cypher and Smith, 1995; Smith et al., 1994] uses the technique of programming
3.6 Insights from Related Work 35
by example/demonstration, whilst users of StarLogo [Resnick, 1996] have to
attach puzzle pieces to each other to create a simulation. NetLogo [Tisue and
Wilensky, 2004] in contrast uses an own programming language, but provides
finished models that can be loaded and explored in a graphical user interface.
3.6 Insights from Related Work
Related work shows that uncertainty communication is relevant for many different application scenarios inside and outside of HCI. However, uncertainty communication and visualization are challenging because, for example, uncertainty has many sources, adds an additional dimension to both the problem and the visualization, and requires interdisciplinary work. The field of HCI, which has only recently started to address uncertainty in interactive systems, can help to tackle some of these core challenges.
Uncertainty can be communicated in various ways, for example with linguistic expressions,
numerical probabilities, or complex visualizations. Each form of communication
has potential disadvantages, which need to be taken into account when designing
a system. In past studies, visualizations such as line charts, dotplots, and proba-
bility distribution functions have proved to be promising. One good reason for
communicating uncertainty is that users might lose trust in an application if it
does not communicate its uncertainty. However, they might also see uncertainty
as a shortcoming of an application.
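As an illustration of how a dotplot can encode uncertainty, the sketch below computes one common variant, the quantile dotplot, in which each dot stands for an equal share of probability. This is our own minimal sketch, not a method taken from the cited studies. It assumes a normal predictive distribution and uses Python's standard `statistics.NormalDist`; the function names and the text rendering are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def dotplot_quantiles(mean: float, sd: float, n_dots: int = 20) -> list[float]:
    """Return n_dots quantiles of a normal predictive distribution.

    Each dot represents a probability of 1/n_dots. Quantiles are taken at
    the midpoints (i + 0.5) / n_dots, so no dot lands at +/- infinity.
    """
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]


def stack_dots(quantiles: list[float], bin_width: float) -> dict[int, int]:
    """Group quantiles into bins of bin_width; each count is a stack height."""
    counts = Counter(int(q // bin_width) for q in quantiles)
    return dict(sorted(counts.items()))


def render(stacks: dict[int, int], bin_width: float) -> str:
    """Minimal text rendering: one row per bin, one 'o' per dot."""
    return "\n".join(
        f"{b * bin_width:>6.1f} | " + "o" * height for b, height in stacks.items()
    )
```

For example, `print(render(stack_dots(dotplot_quantiles(20.0, 3.0), 1.0), 1.0))` draws 20 dots centred around 20. A reader can then estimate a probability by counting dots, e.g., 2 of 20 dots correspond to roughly 10%.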
Further studies showed that forecast users are aware of uncertainty and make better decisions if uncertainty information is presented. However, the information should not be overwhelming or confusing. These positive effects of communicating uncertainty indicate that interactive systems should communicate uncertainty to better support humans in decision-making. The previous studies and visualizations can serve as a starting point to explore the communication of uncertainty in interactive systems.
As this work was carried out in the context of the Cluster of Excellence in
Simulation Technology at the University of Stuttgart, we decided to embed our
work into the development of an end-user simulation tool. Current simulation tools for end-users are mainly used in education. Rather than supporting the simulation of personal application scenarios, they help users understand general principles. A more general simulation tool focusing on the communication of uncertainty could serve as a vehicle for future research in this area. In the next chapter, we therefore identify functional and non-functional requirements as another foundation for our work.
Chapter 4
Understanding Simulation Users
In this chapter, we present five connected pieces of research intended to understand simulation users. On the one hand, we explored how simulation experts work and what tools they use (see Section 4.2). On the other hand, we collected real usage examples and ideas from the general public (see Section 4.3). We mainly collected qualitative data using questionnaires, a diary study, focus groups, and design workshops.
The main goal of the research described in this chapter is to lay the foundation
for the rest of this thesis by understanding prerequisites of simulation usage
and potential users. Studying expert users of simulation tools helps to learn
about common processes, workflows, pitfalls, and difficulties of using simulation
tools. Studying the current usage patterns, understanding, and expectations of the general public helps to identify key requirements and features and lays the groundwork for developing a simulation tool for end-users.
Parts of this chapter are planned to be published as follows:
• M. Greis, V. Zeamer, N. Henze, and A. Schmidt. Predictive Simulation
Services for Everyday Life Usage.
Table 4.1: Detailed research questions for our research on understanding expert simulation users.
Research Question | Subsection
What steps of the simulation process do experts work in? | 4.2.1
What modeling approaches and tools do experts use? | 4.2.1
What is the workflow of simulation experts? | 4.2.2
Do experts have and need programming experience? | 4.2.2
4.1 Definitions
In this chapter, we distinguish between two potential user groups of simulations:
experts and the general public. We define simulation experts as people who
regularly create or execute models to run simulations or build simulation tools
as part of their work. This includes people working in research and industry.
Conversely, we define the general public as people who do not regularly create or use models or simulations in a professional or work-related context, but who mostly come into contact with simulations by consulting those created by experts, for example, weather forecasts. We refer to the general public as non-experts.
4.2 Simulation Experts
We conducted an online survey and a follow-up paper questionnaire with simu-
lation experts to gain a better understanding of how they work. Specifically, we
were interested in their processes, workflows and requirements for their work. We
additionally aimed to understand what tools they use and what knowledge they
are required to have for using these tools. The detailed research questions are
summarized in Table 4.1. The first inquiry, the online survey, consisted of a more general questionnaire to generate initial insights. These insights served as input for the more specific paper questionnaire, which aimed to understand relevant details.
4.2.1 Online Survey
To get a first understanding of how simulation experts work, how they create models, and how they use simulation tools to run simulations, we created an online survey and shared it with members of the Cluster of Excellence in Simulation Technology at the University of Stuttgart and further simulation experts in research.
Method
We conducted an online survey in which we first asked participants which steps of the simulation process they work on (data gathering, modeling, simulation, visualization, and interpretation) and what exactly they do in these steps. Additionally, we asked them to describe and classify the modeling and simulation method they use during work. We furthermore asked which simulation tools they know and use. At the end of the questionnaire, participants were able to rate up to five tools using the Computer System Usability Questionnaire (CSUQ) [Lewis, 1995].
Participants
Our survey was completed by 48 participants who worked with simulations
in a research context. Most of these were PhD students at the University of Stuttgart, but there were also postdocs, professors, and other researchers, including some from other universities. Participants had backgrounds in a variety of disciplines such as Engineering, Natural Sciences, Computer Science, Mathematics, Social Science, and Economics.
Results
We found that 12 participants worked in only one step of the simulation process
(mostly simulation), while the rest worked in two or more steps. Most of these
worked in either modeling (27 participants) or simulation (33 participants), which
was also the most frequent combination (13 participants). Participants specified
that their work in these steps includes creating new models, running simulations,
improving the simulation infrastructure, or analyzing the impact of simulations.
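The counts above (participants working in a single step, per-step totals, and the most frequent combination) can be tallied from the survey answers as follows. This is a minimal Python sketch, assuming each answer is the set of steps one participant selected and that a "combination" means the exact set of steps chosen by a multi-step participant. The function name and the toy data are hypothetical.

```python
from collections import Counter

# The five steps offered in the survey.
STEPS = frozenset({"data gathering", "modeling", "simulation",
                   "visualization", "interpretation"})


def summarize_steps(responses: list[set[str]]):
    """Tally answers, where each answer is the set of steps one participant works in.

    Returns (number of single-step participants, per-step counts,
    most frequent exact multi-step combination with its count, or None).
    """
    for r in responses:
        unknown = set(r) - STEPS
        if unknown:
            raise ValueError(f"unknown step(s): {sorted(unknown)}")
    single_step = sum(1 for r in responses if len(r) == 1)
    per_step = Counter(step for r in responses for step in r)
    # Only participants who work in two or more steps form a combination.
    combos = Counter(frozenset(r) for r in responses if len(r) > 1)
    most_common_combo = combos.most_common(1)[0] if combos else None
    return single_step, per_step, most_common_combo
```

Because a participant can select several steps, the per-step counts (e.g., 27 for modeling and 33 for simulation) can add up to more than the number of participants.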
Regarding the modeling and simulation method, most participants specified that they worked with differential equations. Some of them additionally specified the method that they used for solving these equations, mostly the finite element method (FEM), the finite volume method (FVM), molecular dynamics, atomistic modeling, or Monte Carlo methods.
Participants specified a wide range of tools that they use for creating and running models. Out of 70 mentioned tools, 58 were named only once. The most frequently named tool was Matlab with 13 mentions; all other tools had fewer than five mentions. Additionally, 68.8% of the participants used self-written programs. The CSUQ was filled out for 42 tools, and the overall average across all questions was positive, indicating that most of the participants were satisfied with their tools. Nevertheless, according to our participants, 57% of the tools would benefit from improvements such as a better interface (10), advanced features (8), or better documentation (7).
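The text reports only that the overall CSUQ average was positive. The Python sketch below shows one way such an average could be computed. It assumes the 7-point item format of the original CSUQ with 1 = "strongly agree" and 7 = "strongly disagree", N/A answers stored as `None` and skipped, and per-rating means averaged across all rated tools. All of these choices are assumptions, and if the scale was reverse-coded, the comparison with the neutral midpoint flips.

```python
from statistics import mean

# Assumption: 7-point items, 1 = "strongly agree" ... 7 = "strongly disagree";
# None marks an "N/A" answer. The midpoint 4 is treated as neutral.
SCALE_MIN, SCALE_MAX = 1, 7
NEUTRAL = 4


def csuq_score(answers):
    """Mean of the answered items of one tool rating; N/A (None) items are skipped."""
    valid = [a for a in answers if a is not None]
    if not valid:
        raise ValueError("rating contains no answered items")
    if any(not SCALE_MIN <= a <= SCALE_MAX for a in valid):
        raise ValueError("answer outside the 1-7 scale")
    return mean(valid)


def overall(ratings):
    """Average of the per-rating scores, and whether it lies on the agreeing side."""
    scores = [csuq_score(r) for r in ratings]
    avg = mean(scores)
    # Under this coding, lower values mean more agreement (more positive).
    return avg, avg < NEUTRAL
```

Averaging per-rating scores weights every rated tool equally. Pooling all item answers instead would give more weight to ratings with fewer N/A answers.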
Discussion
We found that simulation experts rarely work in all steps of the simulation process. They often collaborate with other experts who work in the adjacent steps. This also implies that different tools are used for each step. As each step seems to be highly specialized, it is unlikely that the general public will be capable of completing all steps of the simulation process. The modeling step in particular needs expert knowledge and mostly requires programming experience. A tool that supports the general public in running simulations therefore has to clearly separate the steps of the simulation process. Other steps, such as the modeling step, might still be conducted by experts and hidden from non-experts.
Simulation experts also work on improving the infrastructure for their simulations.
The general public is not able to improve the infrastructure, as they probably do not have the programming skills to do so. Models targeted at the general public therefore need to run on devices and infrastructure that the general public already owns, e.g., smartphones and laptops. Alternatively, web-based tools could run simulations on a server and send only the results to the user.
Regarding the modeling and simulation method, experts mainly reported using differential equations, but some had difficulty specifying their method at all. Differential equations are generally not suitable for the general public, so ways to reduce the mathematical complexity or hide this complexity from the non-expert user are necessary to make a simulation tool easy to use.
We also found that the tools experts work with are highly specialized and often
self-written. Most of them address one specific problem. For the general public, it would be ideal to have one tool that supports multiple simulations. This would reduce the time needed to learn the tool, as concepts could be transferred each time the tool is used.
4.2.2 Paper Questionnaire
Based on the results of the online survey, we constructed a paper questionnaire to collect more detailed feedback. We handed out the paper questionnaire at a status seminar of the Cluster of Excellence in Simulation Technology at the University of Stuttgart. All participants of the status seminar were simulation experts working with simulations on an almost daily basis at the university.
Method
The questionnaire consisted of a two-sided sheet of paper. We first asked participants for their current position, their field of work, and the application area of modeling and simulation they were working in. As in the online survey, we asked for the tools that participants use, but this time divided them by the steps in the simulation process. Additionally, we asked them to describe or draw their workflow when working with models and simulations with the help of the tools that they mentioned. We then asked which programming languages they use and how much they agree with the following statements on a five-point Likert scale:
• I am an experienced programmer.
• I need programming experience to complete my tasks.
• I have enough programming experience to complete my tasks.
Participants
In total, 68 participants filled out the paper questionnaire at the status seminar. Most of the participants were PhD students in different fields such as Engineering, Biology, Computer Science, Mathematics, Chemistry, and Physics. They worked on a range of models and simulations with different applications, e.g., in water management, chemical reactions, cancer cell populations, blood flow, or cracks in materials.
Results
As already found in our first survey, participants generally indicated that they worked in more than one, but not all steps of the simulation process. Most used different tools for different steps in the process. One participant described this as a tool chain approach where the output of one tool in one step would directly be used as the input of the next tool. However, not all participants used such an approach, as not all tools can be used sequentially. Figure 4.1 shows a drawing by one participant that closely matches the workflow described by many participants working in multiple steps of the simulation process. Another participant described an important aspect, feedback cycles: “Choose appropriate available model, modify model (*), implement model, run simulation, analyze results + visualization, interpret results, repeat from step (*) if needed”. Steps of the simulation process might be repeated if the outcome is not satisfactory.
Figure 4.1: Depicted simulation workflow of one participant on the paper questionnaire.
For the analysis of programming experience, we converted the Likert scale items to numbers, with 1 corresponding to “totally disagree” and 5 corresponding to “totally agree”. Participants named 21 different programming languages that they regularly use. The most frequently named languages were C++, Matlab, Python, Fortran, and C. The majority of the participants agreed with the statements that they are an experienced programmer (M = 3.8, SD = 1.0), need programming experience to complete their tasks (M = 4.0, SD = 1.1), and have enough programming experience to complete their tasks (M = 4.2, SD = 0.8).
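The conversion of Likert answers to numbers and the reported M and SD values can be reproduced with a few lines of Python. This is a minimal sketch: only the two end labels come from the text, and the three middle labels are assumed. The thesis does not state whether the sample or the population standard deviation was used, so the use of the sample SD (`statistics.stdev`, n − 1 denominator) is also an assumption.

```python
from statistics import mean, stdev

# 1 = "totally disagree" ... 5 = "totally agree" (end labels from the text;
# the middle labels are assumptions).
LIKERT = {
    "totally disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "totally agree": 5,
}


def describe(answers):
    """Convert verbal Likert answers to numbers and return (M, SD), rounded to one decimal.

    stdev() is the sample standard deviation (n - 1 denominator).
    """
    values = [LIKERT[a] for a in answers]
    return round(mean(values), 1), round(stdev(values), 1)
```

For example, the answers "agree", "totally agree", "neutral", and "agree" map to 4, 5, 3, and 4, which gives M = 4.0 and SD = 0.8.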
Discussion
From the workflows provided by participants, we learned that each step of the
simulation process has a specific set of tools associated with it. Sometimes tool
chains exist that allow a tool's outputs to serve as input for the next tool, but most of the time extra conversion or programming was necessary. We additionally learned that simulation experts work in cycles and go back to earlier steps if they realize that they made a mistake in the model or the output is not as expected. Such feedback cycles need to be taken into account and supported.
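The feedback cycle quoted by one participant ("repeat from step (*) if needed") can be sketched as a small workflow runner. The runner executes the steps in order and, if the result is not satisfactory, jumps back to a designated step instead of starting over. This is a minimal Python sketch under stated assumptions. The runner, the shared state dictionary, and the toy step functions are hypothetical, and only the step names are taken from the quote.

```python
from typing import Callable

# Hypothetical step signature: each step takes and returns a shared state dict.
Step = Callable[[dict], dict]


def run_workflow(steps: list[tuple[str, Step]], restart_at: str,
                 is_satisfactory: Callable[[dict], bool],
                 max_iterations: int = 10) -> dict:
    """Run the steps in order; if the result is unsatisfactory, repeat from restart_at.

    This mirrors the participant's loop "repeat from step (*) if needed".
    """
    names = [name for name, _ in steps]
    restart_index = names.index(restart_at)  # raises ValueError if unknown
    state: dict = {"iteration": 0}
    start = 0
    for _ in range(max_iterations):
        for _, step in steps[start:]:
            state = step(state)
        state["iteration"] += 1
        if is_satisfactory(state):
            return state
        start = restart_index  # feedback cycle: jump back, keep earlier results
    raise RuntimeError("no satisfactory result within max_iterations")
```

A toy usage with hypothetical steps: "choose model" runs only once, while "modify model" and "run simulation" are repeated until the result is acceptable.

```python
def choose(s): s["model_version"] = 0; return s
def modify(s): s["model_version"] += 1; return s
def simulate(s): s["result"] = s["model_version"] * 10; return s

steps = [("choose model", choose), ("modify model", modify),
         ("run simulation", simulate)]
state = run_workflow(steps, "modify model", lambda s: s["result"] >= 30)
```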
Programming experience is widespread among simulation experts, but it cannot be expected from the general public. However, experts' programming experience is mostly highly specialized to the tools that they use, and some experts also stated that they did not have much programming experience. In general, a tool designed for the general public should not rely on extensive programming knowledge, as even experts might not know the specific language used by the tool. Additionally, this would help domain experts without programming experience to create models as well.
4.2.3 Implications from Expert Simulation Usage
Based on the results of the online survey and the paper questionnaire, we developed implications for the future design of simulation tools for the general public by looking at common usage patterns of experts and their difficulties in using simulation tools. In the following, we present the six main requirements.
Adaptation for Available Infrastructure. A simulation tool for the general public should run on a device that the general public uses regularly, e.g., a mobile phone or a laptop computer. As non-experts normally do not have the ability to improve the infrastructure of their simulations (besides buying a new device), the models have to be developed in a way that takes this into account. Alternatively, web-based tools could compute simulations on servers and only show the results to the users.
Generality. Most simulation tools are specifically built for one use case and
one step in the simulation process. For the general public, it would be easier to
have a general tool that supports all steps and different contexts. Even experts
could profit from such a tool as knowledge transfer would be easier and learning
time would be reduced. For non-experts, this knowledge transfer is even more important, as they may have more difficulty understanding different tools and cannot create a tool chain to ease the use of multiple tools.
Separate Steps. Non-experts might not have the ability or knowledge to create
models. Therefore, the model-building step should be left to experts. A potential
tool needs to carefully separate the steps of the simulation process. Experts could,
for example, develop models and then share them with the general public to run
their own simulations with the models.
Hide Mathematics. Experts mainly use differential equations to build models.
However, the general public usually has no experience in working with differential
equations. A method for explaining the models to non-experts has to be found
that allows for hiding the mathematics while still getting across the purpose and
functionality of the model.
Table 4.2: Detailed research questions for our research on understanding non-expert simulation users.
Research Question | Subsection
How does the general public currently use simulations? | 4.3.1
What understanding does the general public have of simulations? | 4.3.2
What use cases are promising for future simulations? | 4.3.2
How do future simulation services have to be designed? | 4.3.3
Minimize Programming Knowledge. Simulation experts on average have a
high level of programming experience. A tool designed for non-experts should
minimize the amount of programming knowledge needed. This would also
benefit experts without programming knowledge and experts with very specialized
programming knowledge.
Support Feedback Cycles. When experts work on the simulation process, the
work conducted is not completely linear. They often use feedback cycles or
iterations to go a step back and start again from an earlier point. This happens if,
for example, the model is erroneous or the output differs from what was expected. A simulation tool needs to support such feedback cycles to be effective.
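The requirements "Separate Steps" and "Hide Mathematics" can be illustrated together. In this scenario, an expert authors a model once, including its equation and solver, and a non-expert only answers plain-language questions. The Python sketch below is purely hypothetical. The cooling-drink model, its differential equation dT/dt = -k (T - T_room), the explicit Euler integration, and all names are illustrative assumptions and do not come from the studies in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedModel:
    """Hypothetical expert-authored model exposed only through plain-language inputs.

    Internally: dT/dt = -k * (T - T_room), integrated with explicit Euler steps.
    The non-expert never sees the equation or the solver settings.
    """
    title: str
    cooling_rate_per_min: float  # k, chosen by the expert, not by the end user

    def _simulate(self, start_temp: float, room_temp: float,
                  minutes: float, dt: float = 0.01) -> float:
        # Hidden from the end user: numerical integration of the ODE.
        temp = start_temp
        for _ in range(round(minutes / dt)):
            temp += dt * (-self.cooling_rate_per_min * (temp - room_temp))
        return temp

    def ask(self, drink_temperature: float, room_temperature: float,
            minutes_from_now: float) -> str:
        """The only method a non-expert interface would call."""
        temp = self._simulate(drink_temperature, room_temperature, minutes_from_now)
        return f"In {minutes_from_now:g} minutes your drink will be about {temp:.0f} °C."
```

For example, `SharedModel("Cooling drink", 0.05).ask(80, 20, 10)` returns "In 10 minutes your drink will be about 56 °C." In this design, experts can refine the equation or the solver without changing what non-experts see.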
4.3 Non-Experts
As one of the first steps towards building a simulation tool for the general public,
we wanted to understand the current situation and the potential of the usage
of simulations in everyday life. We additionally aimed to understand design
requirements for simulation services. We therefore conducted a diary study, focus groups, and design workshops focused on predictive simulations. We mainly focused on predictive simulations, as these are already used by the general public, e.g., in the form of weather forecasts. Our detailed research questions are outlined in Table 4.2. The diary study mostly addressed the first question on how the general public currently uses simulations. The focus groups addressed the understanding of simulations and what use cases could be promising for future simulation services. In the design workshops, we then collected concrete features and prerequisites that are needed for designing such services.
4.3.1 Current Simulation Usage in Everyday Life
We first wanted to get a better understanding of when, why and how the general
public already uses simulations in everyday life. We mainly focused on predictive
simulations (forecasts) as their occurrences are easy to spot for non-experts. We
collected everyday life examples by conducting a diary study and follow-up
interviews.
Method
We created DIN A5 leaflets that included all information about the study, a form for demographic information, four examples of potential diary entries, and 22 empty spaces for participants to note down any occurrences of when they used predictive simulations. In initial sessions with up to six participants, we gave a short oral introduction to the study, defined what predictive simulations are, talked participants through the explanation and the examples, and answered questions. Every participant received a leaflet, a pen, and candy as a reward for participation. We asked participants to carry the leaflet with them for one full week to note every occasion in which they used predictive simulations. We called these occasions “information about future occasions” or “forecast” to make the concept more understandable.
To give participants a structure for their diary entries, we asked them to answer
the following four questions per entry:
1. At which point in time and in which situation did you need information
about the future?
2. Which information did you need?
3. Why did you need this information?
4. How did you access the information?
We provided the following examples not to limit participants' thinking, but to help them understand the task:
• The weather forecast was watched on TV in the evening to know what
clothing to wear and whether an umbrella would be needed on the next day.
• The potential driving time from home to a friend’s house was checked on a
navigation website to know when to leave home.
• Before going to work, the website of the local public transportation com-
pany was consulted for information about train delays to find out if I would
be in time for an important meeting.
• Forecasts of the results of the soccer world championship were needed after work because colleagues wanted to bet on the results. With the help of a search engine, different websites were compared.
All four examples were provided in the way participants were instructed to take
their notes. After one week, participants had to return the diary and participate in
a short interview. The interview was held in an open form and was used to talk
about unclear situations and blanks in the diary. After completing the interview,
participants received a mobile phone cleaner as a thank-you gift.
Participants
We recruited 38 participants, of whom 25 (18 male, 7 female) handed their diaries back to us. Two of the participants who did not hand it back reported that they did not have any occasion to enter in the diary. In the following, we only report on participants who handed their diaries back to us. The average age of our participants was 31.0 (SD = 11.8). Regarding the highest level of education, three of them had completed a secondary school diploma, six a high school diploma, five had received a Bachelor degree, ten had received a Master or equivalent degree, and one participant had completed a PhD. Eleven participants were students of different subjects such as Computer Science, Electrical Engineering, Biology, Chemistry, and Management Sciences. The other participants were employees in different fields, including secretaries, teachers, postmen, and logisticians. On average, each participant noted 7.2 (SD = 4.7) entries in the diary, with between one and 21 entries per person.
Results
In total, we received 178 diary entries, but had to exclude 32 from the analysis. The excluded entries either did not cover a situation in which a predictive simulation was used or described occasions in which participants did not receive the information that they needed. In the following, we therefore analyze 146 diary entries. We coded the diary entries according to the following five categories, which we derived from the questions we asked: context, type of information, future point in time, reason, and device.
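The exclusion criteria and the five coding categories can be represented as a simple record type. The Python sketch below is a hypothetical illustration, not the coding tool used in the study. The field names, the two boolean exclusion flags, and the use of `None` for "not mentioned in this entry" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The five coding categories named in the text.
CATEGORIES = ("context", "information_type", "future_point_in_time", "reason", "device")


@dataclass
class DiaryEntry:
    participant: int
    text: str
    uses_predictive_simulation: bool  # exclusion criterion 1
    information_received: bool        # exclusion criterion 2
    # Coded categories; None means the entry did not mention this aspect.
    context: Optional[str] = None
    information_type: Optional[str] = None
    future_point_in_time: Optional[str] = None
    reason: Optional[str] = None
    device: Optional[str] = None


def analyzable(entries):
    """Keep only entries that describe a predictive simulation whose information was received."""
    return [e for e in entries if e.uses_predictive_simulation and e.information_received]


def coverage(entries, category):
    """Number of entries in which the given category could be coded."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return sum(1 for e in entries if getattr(e, category) is not None)
```

Counting coverage per category explains why the category totals reported below (e.g., 78 entries with a point in time, 114 with a future point in time) differ from the 146 analyzed entries.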
Context. In 78 entries, participants included a point in time when they needed
the information. Besides predictive simulations that were used regularly, specific
ones were intensively used during a short period of time. Participant 10 (m, 26 y.), for example, looked at the forecasted day of delivery for his mobile phone several times a day until it arrived, because he wanted to give his old mobile phone to a friend.
Participants used forecasts after activities in 18 situations. In most of these situations, the forecast was integrated into the daily routine and attached to an activity such as getting up, having breakfast, or arriving at a specific location (e.g., workplace, holiday destination). Participants also referred to very specific situations, e.g., after cutting trees, after a new renter moved into a flat next to them, or after their internet connection stopped working. In these situations, the need for the predictive simulation was triggered by the activity itself. Forty entries contained information about an activity that followed the use of a forecast, such as going to a specific location (e.g., workplace, bed, gym), traveling (e.g., by car or train), or having a barbecue. On these occasions, the forecast was mostly used to ensure that future activities could be done or to estimate how long they would take. In 48 entries, participants reported that they used forecasts while, for example, having or preparing a meal; working; planning for weekends, cinema visits, or vacations; sitting in the train or bus; or watching TV.
Type of Information. In total, participants needed 38 distinct types of information during the diary study. In 33% of the situations, they needed weather information. Other uses included predictive simulations about the delay of public transport (6%), the traffic situation (5%), and opening hours (5%), and 12% of the predictive simulations showed the availability of objects (e.g., appliances, ingredients, free seats in a train) or persons. Most types of information were requested only once by a single participant, e.g., the development of the gasoline price, the development of the stock price, the expected energy costs, the expected recovery time after being ill, and the potential transfers of football players.
Participant 17 (m, 31 y.) noted one occasion in which he needed more than one type of information at the same time: the weather forecast, the delay of public transport, and a traffic forecast to decide on his means of transportation (bike, train, car) for going to work (see Table 4.3). In this specific case, the participant combined multiple predictive simulations to make a more complex decision that could not be made with a single predictive simulation.
Future Point in Time. In 114 entries, participants specified the future point in time for which the prediction was made. Most of these entries corresponded to short-term predictions; more specifically, on 73% of the occasions, participants needed information about the same day, sometimes only minutes in advance. Only three entries referred to long-term forecasts for a period beyond the next week, e.g., Participant 24 (m, 32 y.) reported a situation where he needed an estimate of the energy costs for the next winter to calculate housekeeping costs. For some entries, the future point in time could not be categorized because the time itself was forecast, e.g., the day of delivery of a package or the recovery time needed when ill.
Table 4.3: Diary entry from participant 17 (m, 31 y.)
Situation | In the morning when getting up.
Information | What’s the weather like today? Do the trains run normally? Are there any traffic jams on the way to work?
Reason | Decision what means of transportation is the best to get to the workplace.
Access Method | Weather application on my smartphone and own inspection (sun, clouds, ...), public transportation application, Google Maps, radio traffic service.
Table 4.4: Diary entry from participant 24 (m, 32 y.)
Situation | Lots of work on a Wednesday
Information | How much work do I have to expect during the next weeks?
Reason | To plan leisure time
Access Method | Asking colleagues for their experiences in the last years
Reason. We found five main reasons why participants used predictive simulations:
decision-making (e.g., which clothes to wear, which route to take), planning of
activities (e.g., schedule appointments, check availabilities), knowledge gain (e.g.,
when someone returns, potential energy costs), avoidance of unwanted situations
(e.g., getting wet, long waiting times), and saving money and time. Other reasons
such as curiosity, boredom, or interest only appeared in one or two entries.
Device. Participants mainly used their computer or their mobile phone (62 situations)
to access the forecast information. In 27 situations, participants asked
other persons or relied on their own experience to make a forecast. For example,
Participant 24 (m, 32 y.) reported a situation in which he needed information
about the workload during the next weeks and asked colleagues about their experiences
of the last years (see Table 4.4). In these cases, computer models were
not available. Occasionally, participants used written documents, TVs, or radios.
In nine situations, participants used more than one device, either because they
needed several pieces of information or because they did not trust a single source. Other
participants also reported using multiple applications on the same device if they
did not have enough trust in one source of information.
Interviews
We interviewed 15 out of the 25 participants to discuss their diaries, further fore-
casts they had already used, and other forecasts they could imagine using in the
future. For other forecasts that they had used, most participants mentioned ones
reported by other participants in their diaries. Participant 3 (f, 30 y.) additionally
mentioned forecasts of the current political development and upcoming elections.
Participants had distinct ideas of what forecasts they would like to use in the
future, covering topics such as health, work, and finances. Participants
also suggested combining different forecasts, for example weather forecasts,
plans of friends, potential costs of activities, and upcoming events, to support
planning a weekend.
Discussion
In our diary study and the follow-up interviews, we identified two types of users.
Half of the participants used forecasts very often, while the other half only used
them in extraordinary situations. Based on the reported usage, we further
identified three usage patterns: regular usage (e.g., daily, weekly), intensive
usage during a short period, and specific usage that was highly dependent on the
individual, the context, and the activities. Interestingly, participants mostly used
short-term forecasts. This could indicate that long-term forecasts either do not
yet exist or are rarely used, which could be a starting point for offering
new services. Moreover, participants did not necessarily trust a single source
or had to combine different forecasts to reach a conclusion.
The combination of multiple forecasts could therefore be promising. However, as
participants had very distinct ideas for new services, addressing and supporting a
larger group of people with one new predictive service could be difficult. Further
exploration of potential use cases is necessary.
4.3.2 Developing Definitions and Potential Use Cases
In order to get a better understanding of what people know about simulations
and what type of predictive simulations they would like to use in the future, we
conducted focus groups.
Method
In total, we conducted three focus groups with up to six participants. Participants
were invited by sharing our request on social media and with an e-mail list of
prospective participants. The sessions took between 1 and 1.5 hours. After a short
introduction, we collected and discussed participants’ usage and definitions of
predictive simulations. In a phase of idea creation, participants had to imagine
predictive simulation services that they would like to use in the future and discuss
the pros and cons of their ideas with a partner.
Participants
Each of the three focus groups was conducted with a different target audience.
The first group was held with four German simulation experts (3 male, 1 female)
who regularly use simulations at work; the second was held with six students
(5 male, 1 female), and the third with four employees (3 male, 1 female),
namely a postman, a saleswoman, and two electrical engineers. We selected
these three target groups to get opinions of people with different education levels.
All simulation experts had or were pursuing a PhD, the students were completing
a college degree, and the employees had some vocational training as their highest
education level.
Results
We transcribed the focus groups and collected all material produced by the
participants to analyze the data. To extract insights, the data was categorized by
two independent researchers who in a second step compared and combined their
categorizations. In total, we had 14 participants creating around 60 ideas for new
predictive simulation services. In the following, we present their definitions of
the term simulation and the four most popular ideas for future applications in the
form of use cases.
Definitions. Most of the participants had their own distinct definitions of the term
simulation, and the simulation experts in particular did not agree with each other’s
definitions. Two experts focused on the fact that mathematical functions are used to
generate concrete results, while one expert mainly focused on the aspect that a
simulation is computer-supported and needs computing power to be conducted.
Only one expert used a very formal definition of the term.
“A representation of reality/real phenomena in an abstract and
reduced manner to analyze properties, structures and mechanisms of
the phenomenon and to make statements about the future behavior
of the phenomenon.” (female, simulation expert)
Participants of the student group focused on one of three aspects: a simulation
means executing a model, is a virtual representation of reality, or makes it possible
to observe a process without manipulating the real world.
“Test something without consequences in the real world (forecasting)”
(male, student)
Two of the employees mainly focused their definition on visualization, that is,
representing something virtually or in 3D, while the other two focused their
definition on the aspect that a simulation reproduces the functions of a real system.
The employees also debated whether a simulation can also take place purely in the
brain, when thinking about the outcome of a situation. This reflects a very abstract
definition and understanding of the term.
Use Cases. The following four use cases were discussed in at least two of our
three focus groups, which highlights their importance.
Use Case 1 - State of Health: Predictive simulations could be used to simulate
users’ personal health, taking into account their genetic endowment and their
current situation (e.g., nutrition, medication). The application would
support users in deciding whether to see a doctor or what to do to avoid
getting ill.
Use Case 2 - Personal Fitness: Predictive simulations could support sports
activities, e.g., by calculating the efficiency of the training beforehand, helping
with a healthy execution, or selecting a suitable sport matching personal
goals and prerequisites.
Use Case 3 - Personal Finances: Predictive simulations could support financial
decisions and financial management, e.g., how long a person’s money might last
if they quit a job or move to another country, or whether the follow-up costs
of buying a car, such as insurance and repairs, will be manageable.
Use Case 4 - Education Path: Predictive simulations could also help to choose
an educational or career path, for example by showing how a specific course of
studies might influence later working life.
Discussion
The results of the focus groups revealed that people in general have a broad
and very heterogeneous understanding of the term simulation, independent of their
background. One key argument in the discussion was whether simulations have
to be computer-supported or can also be performed by thinking. Apart from a few
participants with very detailed or mathematical definitions, most participants focused
on a single aspect to define the term simulation, for example virtuality, support
by computers, model execution, or 3D visualization. We conclude that the broad
term simulation should be avoided, as it could be misleading. Furthermore, users
of simulation services might have problems understanding the benefits of a
service and transferring concepts between different services. The term
forecast, for example, seems to be a better choice for predictive simulations. If the general term
simulation is needed, it should be carefully explained to avoid misunderstandings.
As use cases for future development, participants mentioned short-term as well as
long-term predictive simulations. The most promising application areas are
health, finances, and education.
4.3.3 Future Usage of Simulations in Everyday Life
Based on the use cases developed in the focus groups (see Subsection 4.3.2),
we conducted design workshops to extract a set of critical features and design
guidelines for predictive simulation services. We selected the health-related use
cases as these were the most prominent ones in the focus groups.
Method
We ran three design workshops with 6 to 12 participants. Each workshop took
between 2 and 3 hours. Participants were invited through e-mail and social media
channels of the university, addressing those interested in predictive health-related
services.
All design workshops followed the same pattern. After an introduction of the par-
ticipants and the topic, participants completed individual brainstorming sessions
with one of two design cases that we derived from the focus group results:
1. Find your ideal sport: (a) What is the best sport to lose weight? (b) Should
I go swimming, go to the gym, or take a ride with my bike? (c) How often
do I have to train to be successful? (d) Does the training suit my body and
my individual abilities?
2. Avoid illness: (a) What can I do to prevent illness? (b) How can I boost my
health?
Figure 4.2: Design workshops about predictive health services for everyday
life usage: (a) feature discussion, (b) sketching, (c) preparing for presentation.
Guided by different prompts, each participant individually thought about their
concerns, relevant items and information, and possible features of an application
supporting the design case. Based on their choices, participants were then put
into pairs to develop wireframe sketches. At least one participant of each pair had
previous sketching experience. At the end of the workshop, each pair presented
their wireframe sketches and discussed them with the other participants (see
Figure 4.2).
Participants
In total, 26 participants (15 male, 11 female) attended the design workshops.
All of them were students, but their courses of studies (e.g., Media Informatics,
Software Engineering, Mathematics, Psychology, Education, Industrial Engineering)
and education levels differed. Twenty-two had prior design experience. The
average age of the participants was 22.9 (SD = 3.3) years. We compensated them
with food, and roughly half of the students received class credit.
Results
In total, we collected 13 sets of interface sketches from our participants. Two
independent researchers analyzed all sketches by coding the features with a visual
and textual coding approach, afterwards comparing and combining the results. In
total, we identified eight categories of features included in the designs:
Data Baseline. Each set of sketches included a data baseline feature. We identi-
fied four types of input data that were requested for the data baseline in different
sets of sketches: data about the user’s health (e.g. weight, height, heart rate,
etc.), user’s preferences (e.g. experience of sports, team or individual sports,
etc.), user’s emotions, and general constraints (e.g. physical abilities, amount
of available money and time, etc.). For entering the data, the interface sketches
offered three possibilities: manually by an expert (e.g. doctor), manually by the
user, or automatically by connected devices. The discussion of the interfaces
revealed that most participants did not want to enter health data manually as it
was perceived as cumbersome, but rather would prefer an application to collect
the data automatically (e.g. by performing a body scan).
Goal Setting. Seven sets of sketches included a goal setting feature, which
mainly consisted of a questionnaire item. Users could either pick a very general
goal from a list of goals (e.g. lose weight) or specify a detailed goal (e.g. lose x
kilograms).
Tracking. Each set of sketches included a tracking functionality. Mainly
activity, food intake, and environmental data were tracked. Most interfaces relied on
tracking the data automatically, but three required manual input. In the discussion,
other participants stated that manual input would be too cumbersome for them.
Prediction. Each set of sketches included a prediction feature. All of the sets
of sketches included short-term predictions in the form of suggestions (e.g., a
type of activity, healthy food, etc.) as depicted in Figure 4.3a. They mostly
focused on the very near future or the next few hours. Six interfaces additionally
included long-term predictions ranging from one week up to one year into the future.
Participants envisioned distinct visualizations of the long-term predictions such
as charts and diagrams (see Figure 4.3b), a future look of the user’s body (see
Figure 4.3c), a holistic view of past, current and future self (see Figure 4.3d), or
an interactive video. In discussions, participants voiced a preference for future
body visualizations instead of charts as they assumed these would boost their
motivation.
Execution Plan. Each set of sketches included an execution plan designed to
achieve a future goal. Participants mentioned that the execution plan needed to
have realistic and believable steps in order to make the predictions trustworthy.
The execution plan ranged from a full schedule of activities to purely activity-based
visualizations (e.g. how much did I already cycle this week and how much
more do I need to cycle to complete my goal?).
Location. Eight sets of sketches included a feature that made the application
location-aware. Participants voiced the concern that without taking the location
into account, predictions could be wrong or unrealistic (e.g. a suggestion to go
swimming when there is no swimming pool nearby).
Figure 4.3: Participants’ sketches containing predictive features of the interfaces:
(a) suggestion for sport activities based on health assessment and goals,
(b) prediction of future health data, (c) visualization of future self in a mirror,
(d) 3D model of past, current, and future self.
Social. Five sets of sketches included a social feature such as a functionality to
meet other people, a social media connection, or a chat.
Game. Only one set of sketches included a game, which allowed the user to level
up an avatar.
Discussion
The findings from our design workshops underline that most potential users wish
to have automatic data collection when it comes to parameter input for predictive
simulations. Nevertheless, some data, such as emotions, personal preferences, or
personal constraints, still require manual input, which users accept. The
predictive aspect of the application has to support short-term and long-term
suggestions at the same time and visualize the simulation results in a suitable
manner. One of the main challenges of predictive services is to gain the trust of
users. The more predictions deviate from the real world and seem unreliable, the
more users will lose trust in them. To foster credibility, calculations and
results have to be made transparent for users to make sure that they understand
the reasoning behind the prediction.
4.3.4 Implications
Based on the results of our diary study, focus groups, and design workshops,
we developed implications for the usage and development of future predictive
simulation services.
Wording and Understanding
The wording plays an important role for future services. It has to be taken into
account that the term simulation is highly ambiguous and can be understood
in very different ways. The general public is not necessarily aware of what a
simulation is and how it works and might think that it is a 3D visualization. We
assume that using more specific words such as forecast instead of predictive simu-
lation could help non-experts to better understand the premises and dependencies
of new services. If the general term is needed as an umbrella term, a very detailed
explanation of the term should be given.
Figure 4.4: Predictive simulation service product flow, highlighting the criti-
cal user experience features that need to be offered by a predictive simulation
service. The diagram shows the process of interaction with the application.
Grey lines show input of either an expert, a user, or a device.
Functional Requirements
Based on the results of the design workshops, we created a set of critical features
and a product flow depicted in Figure 4.4. This flow mainly applies to health-related
applications, but could possibly be adapted to other application areas. In
the following, we outline the four most important aspects.
Sources of Parameter Input. Parameters can be introduced by different sources,
either manually or automatically. Most participants in our focus groups stated a
clear preference for automatic input, as manual input is perceived to be cumbersome.
Nevertheless, information from multiple sources that cannot be
tracked automatically still has to be entered manually by the user or a specific expert.
The amount of information that has to be entered manually should, however,
be kept to a minimum or be optional.
Illustrative Visualizations. One critical feature is illustrative visualizations.
Users prefer visualizations to plain numbers as this makes the results of a simula-
tion more real for them. In the health context, visualizations might also be more
motivating for users.
Short-term and Long-term. Although participants so far mainly used short-term
predictive services, they were interested in using long-term predictive services
as well. Complex services may offer both, complementing long-term predictions
with short-term aims or goals that lead towards the long-term prediction.
Personal and Contextual Intelligence. One main feature we identified is the in-
tegration of personal and contextual information in predictive services. Especially
for short-term predictions, it can be crucial to know the location or environment
of a user. Personal and contextual information builds legitimacy for the predictive
simulation from the user’s perspective.
Non-Functional Requirements
We mainly identified two prerequisites that have to be taken into account when
developing predictive simulation services: flexibility and transparency.
Flexibility. First of all, the results from the diary study and the focus groups
showed that people already use a wide and diverse set of predictive simulations in
everyday life and additionally have very individual and specialized needs. A new
service has to take into account these individual needs and be able to adapt to
different situations and uses. Additionally, it is very important to understand the
usage pattern that will be created by a service. Will a service be used regularly as
part of a routine, intensively during only a short period of time, or just in very
extraordinary situations?
Transparency. Participants in our design workshops stated that a simulation has
to be reliable to make them use a service more often. Reliability and increased
trust in a service can be achieved by transparency. Making the assumptions of
a service and the input data of a simulation transparent for users allows them to
better understand how predictions are made and whether they are trustworthy.
Uncertainty in the generated results should also not be hidden from users.
Promising Application Areas and Contexts
We identified health, finances, and education as promising application areas.
Participants had plenty of distinct ideas in these areas and seemed to be eager to
use predictive simulations. We additionally identified two promising contexts.
Forecast Comparison. One promising context could be to allow users to
compare different predictive simulations that use different models. As participants
are aware of the uncertainty of predictions, they may not trust a single source and,
like in our diary study, use multiple sources. An easy service to compare different
predictions may increase their confidence and trust.
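To illustrate how a comparison service could summarize several sources, the following Python sketch reports the mean, the standard deviation, and the spread of rain probabilities from different sources. It is a minimal sketch under our own assumptions. The source names, the probabilities, and the disagreement threshold of 0.2 are hypothetical. The spread (maximum minus minimum) is only one of several possible measures of disagreement.

```python
from statistics import mean, pstdev

# Hypothetical threshold above which the sources are considered to disagree.
DISAGREEMENT_THRESHOLD = 0.2

def compare_forecasts(forecasts: dict[str, float]) -> dict[str, object]:
    """Summarize rain probabilities (0..1) reported by several sources."""
    if not forecasts:
        raise ValueError("at least one forecast is required")
    values = list(forecasts.values())
    spread = max(values) - min(values)  # range between the most extreme sources
    return {
        "mean": mean(values),
        "stdev": pstdev(values),  # population standard deviation
        "spread": spread,
        "sources_disagree": spread > DISAGREEMENT_THRESHOLD,
    }

# Three hypothetical sources for the same afternoon.
summary = compare_forecasts({"app_a": 0.6, "app_b": 0.3, "radio": 0.45})
print(summary)
```

With these values, the spread of 0.3 exceeds the threshold. A service might therefore show the individual forecasts side by side rather than only a merged number.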
Forecast Combination. The second promising context could be to combine
different services such as a weather forecast, traffic jam forecast, and prediction
of public transportation arrival to support more complex decisions that cannot be
solved by a single prediction. Complex services could offer different sources and
different predictions so that users can select their preferred combinations.
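The morning decision reported by participant 17 (Table 4.3) can serve as an example of such a combination. The sketch below assumes that each forecast is available as the probability of a disruption (rain, a train delay, a traffic jam) and that a disruption adds a fixed delay. All travel times and penalties are hypothetical. Expected travel time is only one possible decision criterion; it ignores, for example, the discomfort of getting wet.

```python
# Hypothetical baseline travel times and disruption penalties in minutes.
BASE_MINUTES = {"bike": 30.0, "train": 25.0, "car": 20.0}
PENALTY_MINUTES = {"bike": 15.0, "train": 20.0, "car": 25.0}

def choose_mode(p_rain: float, p_train_delay: float, p_traffic_jam: float):
    """Pick the means of transportation with the lowest expected travel time.

    Each argument is the forecast probability of the disruption affecting one
    mode; the expected time is base + probability * penalty.
    """
    probabilities = {"bike": p_rain, "train": p_train_delay, "car": p_traffic_jam}
    expected = {
        mode: BASE_MINUTES[mode] + probabilities[mode] * PENALTY_MINUTES[mode]
        for mode in BASE_MINUTES
    }
    best = min(expected, key=expected.get)
    return best, expected

best, expected = choose_mode(p_rain=0.7, p_train_delay=0.1, p_traffic_jam=0.5)
print(best, expected)  # train, with about 40.5 (bike), 27.0 (train), 32.5 (car) minutes
```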
4.4 Insights for Developing an End-User Simulation Tool
In this chapter, we analyzed the current usage of simulations by simulation experts
and the general public. We conducted several smaller research probes to gain
insights into key requirements and promising application areas.
We found that analyzing expert usage provided a deeper understanding of how
simulations are currently used and of some common pitfalls of simulation tools.
To make simulations understandable for the general public, the mathematics of the
models has to be hidden. We envision a web-based simulation tool that supports
different use cases and minimizes the need for programming knowledge by introducing
different levels of abstraction.
Non-experts had plenty of ideas on how they could use simulations in their every-
day life. We identified the input of parameters and illustrative visualizations as
two key aspects that need to be carefully explored. At the same time, participants
expected a tool to be flexible and transparent. To achieve these two non-functional
requirements in combination with the functional requirements, novel input and
output methods are needed. The input and output methods have to be flexible and
embrace uncertainty. When uncertainty is taken into account on the input and
output level, transparency can be reached much more easily. We therefore identify
uncertainty as one of the main challenges in developing an end-user simulation
tool.
In the next part of this thesis, we systematically explore the occurrence and the
handling of uncertainty in interactive systems. We aim to identify the sources of
uncertainty in interactive systems and design and evaluate novel input and output
methods that foster transparency on all levels of a system. These methods can be
used as a starting point to develop an end-user simulation tool.
III
UNCERTAINTY IN
INTERACTIVE SYSTEMS

Chapter 5
Sources of Uncertainty
As related work has shown, uncertainty is an important aspect in different re-
search areas. It has also recently been gaining attention in HCI as more and
more interactive systems deal with uncertainty. Psychological research shows
that displaying the uncertainty of results has a positive effect on humans, as it
improves decision-making. Nevertheless, interactive systems can only handle
uncertainty if HCI researchers are aware of its sources.
Ideally, the uncertainty introduced in the different steps is quantified
and taken into account when processing data to achieve a reliable outcome. To identify
the sources of uncertainty in interactive systems, we build upon the General
Interaction Framework introduced by Dix [2009]. In the following, we introduce
the General Interaction Framework and, based on the components and translations
of the framework, identify potential sources of uncertainty in interactive systems.
We further describe more concrete aspects which introduce uncertainty by giving
examples and short outlooks on how these uncertainties can become quantifiable.
This chapter is partly based on the following publication:
• M. Greis, H. Schuff, M. Kleiner, N. Henze, and A. Schmidt. Input
Controls for Entering Uncertain Data. In Proceedings of the ACM on
Human-Computer Interaction, 1(1):1–17, June 2017.
Figure 5.1: The General Interaction Framework by Dix [2009] describes an
interactive system by specifying its four major components (system, user, in-
put, output) and the four translations between them (articulation, performance,
presentation, observation).
5.1 The General Interaction Framework
Following the definition of the General Interaction Framework by Dix [2009], an
interactive system consists of four major components: system, user, input, and
output. The input and the output form the interface which separates user and
system. The components are connected by four translations:
1. The user has to articulate his/her goals through the input,
2. The system has to interpret the input values and perform an action,
3. The system has to transform the result of the action to the output, and
4. The user observes the output.
5.2 Enhancing the General Interaction Framework
All components and translations in the General Interaction Framework can be
affected by uncertainty. Based on related work, we identified how uncertainty
affects these components and translations, focusing on the concrete
sources of uncertainty that need to be quantified or dealt with when developing
interactive systems. Finally, we enhanced the General Interaction Framework by
adding these sources of uncertainty. This helps to develop interactive systems that
are aware of and cope with the introduced uncertainty in a way that better suits the
user and supports decision-making.
5.2.1 User
One potential source of uncertainty is the user. There are three main user charac-
teristics that contribute to uncertainty: the educational background, the character
traits, and the culture of the user.
The educational background might impact how the user interacts with a system.
Missing domain knowledge, or limited mathematical or statistical knowledge,
could complicate a user’s understanding of the concept of uncertainty and of
probabilities as a whole. Sacha et al. [2016] outline what knowledge could be missing
and the influence of the experience level of a user. For example, a user might
not understand or be aware that an offered service is based on machine learning
or sensor measurements that introduce uncertainty, and might interact with the
system as if it did not contain any uncertainty. Information about the educational
background of a user could be inferred from usage patterns, metadata, and social
media accounts. However, quantifying its influence might be difficult.
The character traits of a user may also introduce uncertainty. Risk tolerance, for
example, influences usage behavior. Users who are less risk-tolerant might not use
a system that does not seem trustworthy to them. They might also try different
inputs or systems to judge reliability. In general, they probably use the system
more carefully or less often. Character traits of a user could also be inferred from
usage patterns, metadata, and social media preferences. Systems could then be
adapted to match the character traits of the user.
Besides character traits, the culture of a user introduces uncertainty [Bonneau
et al., 2014] as different cultures have different preferences and values.
5.2.2 Articulation
During the articulation, sources of uncertainty might be a lack of knowledge,
imprecise measurements, or limited understanding which complicate the input of
data.
The user might not know the correct input, in which case a lack of knowledge introduces
uncertainty. An example is a library search interface where the user wants to
search for an author. A user who does not know how to spell the name of the
author might consequently misspell it. Another example would be a user who
wants to track calorie intake and guesses the weight of the food instead of measuring
it with a scale. The user may or may not be aware of this lack of
knowledge. If users are aware of it, they could communicate
this lack of knowledge to the system.
The user might also rely on imprecise measurements to articulate the input. An
example would be a user calculating calorie expenditure based on the number
of steps shown by a step tracker. Step trackers, however, vary in their reliability
and might be far off the true value, especially if mobile phone applications are
used [Guo et al., 2013]. To quantify the uncertainty, the user could communicate
details of the data sources to the system. The system might then be able to
determine the reliability of the data source. Boukhelifa and Duke [2009] also
refer to imprecision as a source of uncertainty, for example when using a global
navigation satellite system (GNSS) such as the Global Positioning System (GPS).
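If users tell the system which device produced a measurement, the system could attach a source-specific error to the value and propagate it to derived quantities. The following sketch assumes hypothetical relative errors per source and a hypothetical linear factor of 0.04 kcal per step. Neither value is taken from Guo et al. [2013] or from any real device.

```python
# Hypothetical relative errors of step counts by data source (illustrative only).
RELATIVE_ERROR = {"wrist_tracker": 0.05, "phone_app": 0.20}
UNKNOWN_SOURCE_ERROR = 0.30  # conservative assumption for sources the system does not know

def steps_range(steps: int, source: str) -> tuple[float, float]:
    """Range of plausible true step counts, given the reported data source."""
    err = RELATIVE_ERROR.get(source, UNKNOWN_SOURCE_ERROR)
    return steps * (1 - err), steps * (1 + err)

def calories_range(steps: int, source: str, kcal_per_step: float = 0.04) -> tuple[float, float]:
    """Propagate the step-count range through a linear calorie model (hypothetical factor)."""
    low, high = steps_range(steps, source)
    return low * kcal_per_step, high * kcal_per_step

print(calories_range(10_000, "phone_app"))      # about 320 to 480 kcal
print(calories_range(10_000, "wrist_tracker"))  # about 380 to 420 kcal
```

Because the calorie model in this sketch is linear, the relative error of the step count carries over unchanged to the calorie estimate.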
Another problem could be a limited understanding of the input methods. The
user might not correctly understand the required input or the consequences of
this input. For example, an interface might ask the user to enter a distance in
meters, but the user enters the distance in kilometers. The user might also enter a
completely different value or make a spelling mistake, which leads to incorrectly
entered data [Boukhelifa and Duke, 2009]. Validating the input could help to
detect such obvious mistakes, and the system could use different strategies to ask
the user to specify the input more precisely if it is unsure whether the user
understood what was requested.
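One simple validation strategy is a plausibility check. It asks the user for clarification when a value would make more sense in a different unit. The sketch assumes a hypothetical plausible range of 100 m to 50 km for the requested distance. The function name and messages are illustrative only.

```python
from typing import Optional

def check_distance_m(value: float, low: float = 100.0, high: float = 50_000.0) -> Optional[str]:
    """Return a clarification question if a distance in meters looks implausible.

    The plausible range [low, high] is a hypothetical assumption; None means
    the input is accepted as entered.
    """
    if low <= value <= high:
        return None
    if low <= value * 1000 <= high:
        # The value would be plausible in kilometers: the unit was likely confused.
        return f"You entered {value} m. Did you mean {value} km?"
    return f"{value} m seems unusual for this field. Please check your input."

print(check_distance_m(5000))  # None
print(check_distance_m(5))     # You entered 5 m. Did you mean 5 km?
```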
5.2.3 Input
For the input, uncertainty is mainly introduced due to two factors: accuracy and
limited degrees of freedom (DoF).
The input could be uncertain due to limited accuracy. Sliders and touch screens,
for example, only have a fixed number of pixels and therefore a limited resolution.
This could make it difficult to express an accurate input. Schwarz et al. [2010]
created a framework for robust and flexible handling of inputs with uncertainty
introduced through natural interaction techniques.
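The resolution limit of a slider can be made explicit. If each pixel position maps to one value, a slider of n pixels covering the range from v_min to v_max can only express steps of (v_max - v_min)/(n - 1). Any intended value may therefore be off by up to half a step. The sketch assumes a linear mapping without sub-pixel input, and the 201-pixel slider is a hypothetical example.

```python
def slider_step(width_px: int, vmin: float, vmax: float) -> float:
    """Smallest value change a slider of width_px pixel positions can express."""
    if width_px < 2:
        raise ValueError("a slider needs at least two pixel positions")
    return (vmax - vmin) / (width_px - 1)

def nearest_reachable(target: float, width_px: int, vmin: float, vmax: float) -> float:
    """Value the user actually enters when aiming for target."""
    step = slider_step(width_px, vmin, vmax)
    return vmin + round((target - vmin) / step) * step

# A hypothetical 201-pixel slider for values from 0 to 1000:
step = slider_step(201, 0, 1000)               # 5.0, so the maximum error is 2.5
value = nearest_reachable(333, 201, 0, 1000)   # 335.0 instead of the intended 333
print(step, value)
```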
The input could also have limited DoF [Pang et al., 1997]. A text field, for
example, only allows one to enter text while a number field only allows one to
enter numbers: a single field cannot accept both text and numbers. Input fields that
accept several values, for example both the value itself and its uncertainty,
could help to increase the DoF so that a system could quantify the
uncertainty in the input.
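As an illustration of an input with an additional DoF, the following sketch stores a value together with a standard deviation and derives a central interval from it. It assumes, as a simplification, that the user’s uncertainty follows a normal distribution. The class name is hypothetical, and other representations, such as a lower and an upper bound, would work as well. The example reuses the unweighed food from the calorie-tracking example above.

```python
from dataclasses import dataclass
from statistics import NormalDist

@dataclass(frozen=True)
class UncertainInput:
    """A value entered together with its uncertainty (hypothetical representation).

    The uncertainty is a standard deviation, assuming a normal distribution.
    """
    value: float
    sd: float = 0.0

    def __post_init__(self):
        if self.sd < 0:
            raise ValueError("standard deviation must not be negative")

    def interval(self, level: float = 0.95) -> tuple[float, float]:
        """Central interval that contains the true value with the given probability."""
        if self.sd == 0:
            return (self.value, self.value)  # a certain input
        dist = NormalDist(self.value, self.sd)
        tail = (1 - level) / 2
        return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# "About 250 g, give or take 30 g" for a portion of food that was not weighed.
portion = UncertainInput(250, 30)
low, high = portion.interval()  # approximately (191.2, 308.8)
print(low, high)
```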
5.2.4 Performance
For the performance, there are mainly two sources of uncertainty: transformations
and imprecise recognition.
The translation from the input to the system also includes several sources of
uncertainty, especially transformations [Pang et al., 1997]. An example is a
system that expects input in one currency and then transforms it into another
currency. The conversion rate could be outdated if a previously stored rate is
used, or the converted value could be rounded.
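Both effects can be quantified if the system keeps the exact intermediate value. The sketch below uses Python’s decimal module to round a converted amount to cents and to report the rounding error. It also computes the difference caused by an outdated rate. Both exchange rates are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def convert(amount: Decimal, rate: Decimal) -> tuple[Decimal, Decimal]:
    """Convert and round to cents; also return the error introduced by rounding."""
    exact = amount * rate
    rounded = exact.quantize(CENT, rounding=ROUND_HALF_UP)
    return rounded, rounded - exact

# Hypothetical exchange rates: a cached (outdated) one and a current one.
stale_rate, current_rate = Decimal("1.10"), Decimal("1.17")
amount = Decimal("19.99")

rounded, rounding_error = convert(amount, current_rate)   # 23.39 and 0.0017
staleness_error = rounded - convert(amount, stale_rate)[0]  # 23.39 - 21.99 = 1.40
print(rounded, rounding_error, staleness_error)
```

In this example, the outdated rate causes a much larger error than rounding does.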
Another source of uncertainty is imprecise recognition. When a user is performing
gesture or voice input, the system might recognize the wrong input command
due to imprecise recognition. Schwarz et al. [2010] propose a framework for
handling such uncertainties. If the user input is ambiguous, the system could,
for example, give feedback and proactively ask the user to specify the input
further.
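A simple way to decide when to ask the user is to look at how confident a recognizer is and how clearly the best candidate beats the runner-up. The sketch below is a generic threshold rule, not the framework by Schwarz et al. The scores, command names and the two thresholds are hypothetical and would need tuning for a real recognizer.

```python
from typing import Mapping, Optional


def resolve_command(
    scores: Mapping[str, float],
    min_confidence: float = 0.5,
    min_margin: float = 0.2,
) -> Optional[str]:
    """Return the recognized command, or None if the system should ask the user.

    `scores` are hypothetical non-negative recognizer scores per command;
    they are normalized to sum to one before the thresholds are applied.
    """
    total = sum(scores.values())
    if total <= 0:
        return None  # nothing recognized: ask the user
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_cmd, best_score = ranked[0]
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    best_p = best_score / total
    margin = (best_score - second_score) / total
    if best_p >= min_confidence and margin >= min_margin:
        return best_cmd
    return None  # ambiguous: give feedback and ask the user to specify


if __name__ == "__main__":
    print(resolve_command({"next page": 0.9, "next tab": 0.1}))
    print(resolve_command({"call Anna": 0.55, "call Hannah": 0.45}))
```

The margin check catches the case where one candidate is above the confidence threshold but barely ahead of a close alternative, as in the second call.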
5.2.5 System
The system itself is also a source of uncertainty. The two main factors are model
uncertainty and algorithmic uncertainty, which are always inherently present.
Quantifying these uncertainties is a large research topic that is gaining
attention in different research areas.
The system can include model uncertainty [Pang et al., 1997; Wallentin and Car,
2013]. The source of this uncertainty is the real-world model behind the system.
Models of the world deliberately focus on some aspects and leave out others, as
the whole world cannot be modeled.
Additionally, algorithmic uncertainty [Pang et al., 1997] might be added through
the concrete implementation of the system. The choice of algorithm can influence
the calculation (e.g., by using interpolation or extrapolation), and the limited
precision of computers can produce overflow or rounding errors.
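The limited precision of computers is easy to demonstrate with standard binary floating-point numbers. The short Python example below shows a rounding error when summing 0.1 repeatedly, a compensated summation (`math.fsum`) that avoids it, and an overflow that silently yields infinity instead of raising an error.

```python
import math

values = [0.1] * 10

# 0.1 has no exact binary representation, so the naive sum drifts.
naive = sum(values)
# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
compensated = math.fsum(values)

print(naive)          # 0.9999999999999999
print(compensated)    # 1.0
print(naive == 1.0)   # False

# Overflow: exceeding the largest double gives infinity, not an exception.
print(1e308 * 10)     # inf
```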
5.2.6 Presentation
For the presentation, the main source of uncertainty is transformations, which
face the same problems as transformations of the input: the transformations
could be outdated, or values could be rounded.
5.2.7 Output
The output has the same potential sources of uncertainty as the input, as similar
problems arise around the two factors accuracy and limited DoF.
The output could be uncertain due to limited accuracy of the output methods, such
as when the output medium can only display the output at a certain resolution
[Gershon, 1998]. A bar chart displayed on an LED display with 120 LEDs, for
example, will be less fine-grained than on a large 4K monitor.
Additionally, the output could have limited DoF, providing fewer DoF than the
system calculates or the user wishes for. For example, the output might only allow
a standard bar chart without error bars. The system, however, could easily
calculate the data needed for error bars internally, and the user would like to
see these to get a better impression of the output.
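As an illustration of how cheaply a system could compute the data for error bars, the sketch below derives a mean and the half-width of an approximate 95% confidence interval from raw samples. It assumes the normal approximation (mean ± 1.96 standard errors); for small samples a t-quantile would be the more appropriate multiplier. The sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def error_bar(samples: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, half-width) of an approximate 95 % confidence interval.

    Normal approximation: mean +/- z * standard error of the mean.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * standard_error


if __name__ == "__main__":
    mean, half = error_bar([4, 6, 5, 7, 3])  # hypothetical measurements
    print(f"{mean} +/- {half:.2f}")          # 5 +/- 1.39
```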
5.2.8 Observation
In the observation phase, uncertainty may arise from the user’s understanding of
the output methods or even from a misjudgement of the presented information.
The understanding of the output methods might be a source of uncertainty if the
user does not understand the representation or the presented data. Gershon [1998]
names a poor choice of colors or information overload as two examples. Another
example is a user looking at a weather forecast who interprets a 50% chance of
rain to mean that it will rain for half of the day, instead of meaning that it
will rain on 50% of the days that are similar to the forecasted day [Gigerenzer
et al., 2005]. Such misunderstandings could be related to the educational
background of the user. Careful evaluation of output methods and additional
explanations of what the output means could help to reduce misunderstandings.
Although the user might understand the presentation, a misjudgement of the data
is possible as well. The user might misinterpret how high the presented
uncertainty is and come to the wrong conclusion. Additionally, users who do not
know about the uncertainty potentially follow their own judgement based on prior
experience, even if the statistically best answer is presented to them as decision
advice [Joslyn and LeClerc, 2012]. If, for example, the weather forecast warns
that a huge thunderstorm is coming, a listener might not take this too seriously
if the storm is several kilometers away, or the listener might use prior
experience of earlier warnings that did not end in a huge thunderstorm to
discount this one instead of following its advice.
5.3 Implications for HCI Research
Based on the General Interaction Framework, we identified a considerable number
of sources of uncertainty in every component of an interactive system. Interactive
systems should take these sources into account to better quantify and communicate
uncertainty. Figure 5.2 shows the enhanced version of the General Interaction
Framework including all sources identified in this chapter. For the field of HCI,
we see a need to focus on the articulation and input as well as on the output
and observation of uncertainty. This includes developing new techniques for
quantifying the uncertainty in user input to increase the degrees of freedom, and
developing new techniques for the communication of uncertain data to improve
understanding and reduce misjudgments. In both steps, user evaluations are
important to verify that new methods are understandable and easy to use. In the
following chapters, we therefore present explorations for
1. the articulation and input of uncertain data (see Chapter 6)
2. the communication and output of uncertain data (see Chapter 7), and
3. the interpretation of uncertain data (see Chapter 8).
The explorations add to the body of knowledge on how to design interactive
systems that are aware of uncertainty and able to quantify and communicate
uncertain data in an easy and usable fashion.
Figure 5.2: The enhanced version of the General Interaction Framework by Dix [2009] including potential sources of uncertainty in interactive systems.
Chapter 6
Input Methods
This chapter describes our exploration of the design space of input methods
for uncertain data by conducting four individual research probes. Three probes
explore how to actively support users in entering uncertain data. We examine how
to enhance common input fields, how to create specialized input controls based
on sliders, and whether tangible interfaces are suitable for uncertain input. We
additionally present a research probe that examines whether physiological sensing
can be used to implicitly measure how uncertain users are. All four research
probes include an evaluation in the form of a user study.
The main goal of the research presented in this chapter is to understand how
users can be supported in entering uncertainty and whether this could help to
make uncertainty in the user input quantifiable. We used different sets of input
methods and input modalities to gain an understanding of the feasibility of these
approaches. Additionally, we wanted to learn about users’ opinions of such input
methods.
Parts of this chapter are based on the following publication:
• M. Greis, H. Schuff, M. Kleiner, N. Henze, and A. Schmidt. Input
Controls for Entering Uncertain Data. In Proceedings of the ACM on
Human-Computer Interaction. 1(1):1–17, jun 2017.
Parts of this chapter are also planned to be published as follows:
• M. Greis, H. Schuff, R. Kettner, P. Franczak, and A. Schmidt. Explicit
Input of Uncertainty: Enhancing Standard Input Controls.
• M. Greis, J. Karolus, H. Schuff, P. Woźniak, and N. Henze. Detecting
Uncertain Input Using Physiological and Behavioral Measurements.
• M. Greis, H. Kim, A. Schmidt, and C. Coutrix. SplitSlider: Exploring
Tangible Interfaces for Communicating Input Uncertainty.
6.1 Enhancing Input Controls
Current forms support standard input fields such as text fields and radio buttons.
These input controls allow a fixed input of a known answer. However, users may
not always be sure of their input, and they often have no way to communicate this
uncertainty to the computer. For example, an interface that tracks users’ eating
habits provides a form to enter the weight and composition of the food they have
eaten. Most users will probably guess the weight of their food, but as current
interfaces only offer number fields for this input, users cannot communicate the
amount of uncertainty included in their input.
The main goal of the work presented in this section is to understand how tradi-
tional input fields can be enhanced to offer users the possibility to enter uncertain
data. We mainly aimed to understand which designs potential users prefer.
Therefore, we first identified and classified common input fields. Based on the
resulting taxonomy, we developed designs and prototypes that we evaluated with
potential users in a pre-study and a lab evaluation.
In the following, we present our taxonomy of the input fields; our first sketches
and the results of the pre-study; the prototype implementation and lab evaluation;
and the results of the in-the-wild evaluation.
6.1.1 Common Input Controls
To enhance existing input controls, we first looked at different developer guides
to identify common input controls. We then developed a taxonomy for these input
controls based on the input they accept. For each cluster of the taxonomy, we
developed prototypes for entering uncertain data.
Identification of Input Controls
To identify common input fields, we looked at the documentation and developer
guides of Windows¹, HTML², Android³, and iOS⁴. Table 6.1 shows an overview
of the input controls that are commonly used. All of the developer guides
contain, for example, buttons such as action buttons, radio buttons, and
checkboxes. Less common are specific controls such as list boxes or ratings. In
the following, we briefly describe the input controls.
Textfield. A general textfield allows users to input any kind of string. However,
specific versions of these textfields exist. Password fields, for example, do not
show the input but instead show placeholder characters, and e-mail fields only
allow the user to enter valid e-mail addresses, i.e., strings with a specific
pattern. Textfields can span either one line or multiple lines.
Button. An action button activates a specific function when clicked (e.g.,
saving an entry or undoing an action). A toggle/switch button allows toggling
between two states (e.g., on and off). Radio buttons, in contrast, allow users to
choose between several states. They are arranged in groups, and only one of them
can be selected per group. They are normally depicted as circles. Checkboxes are
quite similar to radio buttons, but they do not need to be grouped, and even if
they are grouped, multiple checkboxes can be selected. A single checkbox could
work as a toggle button. Checkboxes are normally depicted as squares.
1 Windows developer guidelines:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn742399(v=vs.85).aspx
2 HTML form elements: https://www.w3schools.com/html/html_form_elements.asp, HTML input
elements: https://www.w3schools.com/html/html_form_input_types.asp
3 Android developer API guides: https://developer.android.com/guide/topics/ui/controls.html,
https://developer.android.com/reference/android/widget/package-summary.html
4 iOS human interface guidelines:
https://developer.apple.com/ios/human-interface-guidelines/ui-controls/buttons/
Table 6.1: Overview of input controls explained in developer guidelines.
Windows HTML Android iOS
Textfield General
Password
Email
Search
Telephone Number
URL
Button Action Button
Toggle/Switch Button
Radio Button
Check Box
Number Picker Stepper/Spin Control
Slider -
Dropdown -
Picker Date
Time
Color
File
List Box -
Rating -
Number Picker. A number picker, also called stepper or spin control, only allows
users to enter numeric values. Sometimes these values are restricted to a certain
range.
Slider. A slider allows a user to pick a point on a continuous scale. This could
either be a numeric value or any other continuous scale (e.g. light to dark).
Dropdown. A dropdown is a folded list. The user can open the list and select
one of the options, which is then shown in the text field belonging to the dropdown.
Picker. Different UIs support pickers, such as date pickers, color pickers, or
file pickers. In general, these can also be seen as dropdowns with more options
or as nicely formatted lists to select from.
List Box. The Microsoft guidelines also contained a list box. The list box allows
a user to pick one or multiple items of a list. In contrast to a dropdown, the list is
not folded but always visible.
Rating. The Android developer guide contained a rating as an input control. This
could, for example, be a star rating with five stars.
To enhance these input elements so that they support users in entering
uncertainty, we decided to first build a taxonomy of these input controls based on
the input they allow. Controls accepting the same input could then be combined
with the same methods for entering uncertainty.
Taxonomy of Input Controls
Depending on the possible input options that the input controls allow, we divided
the input controls into four clusters. To build the clusters, three researchers
sorted the input controls according to their possible input in a joint session.
The sorting continued until all researchers agreed. The clusters are not
necessarily mutually exclusive; however, we assigned each input control to the
cluster it fits best.
Cluster 1: Selection. The user can select m pre-defined elements of the finite set
$M = \{E_1, E_2, \ldots, E_n\}$ with $n \geq 1$, i.e., the user selects a subset
$M_s \subseteq M$. We divided this cluster into two subclusters: Subcluster a)
only contains input controls that allow the user to select exactly one element
($|M_s| = 1$); Subcluster b) contains input controls that allow the user to select
at least one and up to $|M| = n$ elements ($1 \leq |M_s| \leq n$).
a) Selection of one element: $|M_s| = 1$
Examples: Radio buttons, Dropdown, Listbox (single selection), Color Picker,
File Picker
b) Selection of one or multiple elements: $1 \leq |M_s| \leq n$
Examples: Check boxes, Listbox (multi selection)
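The two selection subclusters differ only in the constraint on $|M_s|$. The following Python sketch, with hypothetical function and option names, checks a selection against the definitions above: the selection must be a subset of the option set, and its size must be exactly one (subcluster a) or between one and $n$ (subcluster b).

```python
from typing import AbstractSet


def is_valid_selection(
    selected: AbstractSet, options: AbstractSet, multiple: bool
) -> bool:
    """Check a selection M_s against the finite option set M (Cluster 1).

    multiple=False -> subcluster a): exactly one element, |M_s| = 1
    multiple=True  -> subcluster b): 1 <= |M_s| <= n with n = |M|
    """
    if not selected <= options:  # M_s must be a subset of M
        return False
    if multiple:
        # The upper bound holds automatically for a subset; kept for clarity.
        return 1 <= len(selected) <= len(options)
    return len(selected) == 1


if __name__ == "__main__":
    M = {"E1", "E2", "E3"}
    print(is_valid_selection({"E1"}, M, multiple=False))        # True
    print(is_valid_selection({"E1", "E2"}, M, multiple=False))  # False
    print(is_valid_selection({"E1", "E2"}, M, multiple=True))   # True
```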
Cluster 2: Interval Input. The user can pick a value from the interval $[a, b]$.
We divided this cluster into two subclusters: Subcluster a) only contains input
controls that support selecting a value from a finite interval $[a, b]$ with
$|a| < \infty$, $|b| < \infty$; Subcluster b) contains input controls that
theoretically allow entering a value from an infinite interval $[a, b]$ with
$|a| \leq \infty$, $|b| \leq \infty$. In practice, this interval is not infinite
due to defined lower and upper bounds or technical constraints.
a) Finite interval: $[a, b]$, $|a| < \infty$, $|b| < \infty$
Examples: Slider, Rating, Time Picker
b) Infinite interval: $[a, b]$, $|a| \leq \infty$, $|b| \leq \infty$
Examples: Date Picker, Number Picker
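The difference between a conceptually infinite interval and its practical bounds can be sketched in Python. In the example below, infinite bounds are modeled with `math.inf`. The "practical" bounds are assumptions chosen for illustration: the largest finite double for a number picker, and the range of Python's `datetime.date` type for a date picker.

```python
import datetime
import math
import sys


def in_interval(x, a, b) -> bool:
    """Cluster 2 check: does x lie in the closed interval [a, b]?"""
    return a <= x <= b


# Subcluster a): finite interval, e.g. a five-star rating.
RATING = (1, 5)
# Subcluster b): conceptually unbounded, e.g. a number picker ...
NUMBER_CONCEPTUAL = (-math.inf, math.inf)
# ... but in practice bounded by technical constraints, here the largest
# finite double, or by the date range a date type can represent.
NUMBER_PRACTICAL = (-sys.float_info.max, sys.float_info.max)
DATE_PRACTICAL = (datetime.date.min, datetime.date.max)

if __name__ == "__main__":
    print(in_interval(3, *RATING))                    # True
    print(in_interval(math.inf, *NUMBER_CONCEPTUAL))  # True
    print(in_interval(math.inf, *NUMBER_PRACTICAL))   # False
    print(in_interval(datetime.date(2017, 12, 21), *DATE_PRACTICAL))  # True
```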
Cluster 3: Character Input. The user enters a sequence of characters. We
divide this cluster into two subclusters: Subcluster a) contains input controls that
in general allow entering an arbitrary sequence of characters (.* or .+); Subcluster
b) contains input controls that restrict the sequence of characters by applying
regular expressions, such as [a-z]{5,14}.
a) No restrictions: .* or .+
Examples: General Textfield, Search Textfield
b) Pattern restrictions: e.g. [a-z]{5,14}
Examples: Password Textfield, Email Textfield, Telephone Number Textfield,
URL Textfield
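The two character-input subclusters can be expressed directly with Python's `re` module. The sketch below uses the example patterns from the text. `fullmatch` is used because a field's pattern must cover the whole input, not just a prefix. `re.DOTALL` is an assumption added so that unrestricted fields also accept multi-line text.

```python
import re

# Subcluster a): no restrictions. DOTALL lets "." also match newlines,
# so multi-line text fields are covered too.
ANY_TEXT = re.compile(r".*", re.DOTALL)
NON_EMPTY_TEXT = re.compile(r".+", re.DOTALL)
# Subcluster b): pattern restriction, using the example pattern from the text.
LOWERCASE_5_TO_14 = re.compile(r"[a-z]{5,14}")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """True if the whole input matches the control's pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(accepts(LOWERCASE_5_TO_14, "stuttgart"))  # True
    print(accepts(LOWERCASE_5_TO_14, "Stuttgart"))  # False (uppercase S)
    print(accepts(NON_EMPTY_TEXT, ""))              # False
```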
Cluster 4: Action Trigger. The user triggers an action, which can be a primary
or a secondary action. We therefore divide this cluster into two subclusters:
Subcluster a) contains all input controls that trigger a primary action,
e.g., “Submit”; Subcluster b) contains all input controls that trigger a secondary
action, e.g., toggling a state between “ON” and “OFF”.
a) Trigger a primary action: e.g. “Submit”
Example: Action Button
b) Trigger a secondary action: e.g. Toggle state between “ON” and “OFF”
Example: Toggle/Switch Button
As cluster 4 does not correspond to data input but rather to actions, we
decided to focus exclusively on the first three clusters. In the following, we
therefore refer to methods that apply to clusters 1, 2, and 3.
6.1.2 Methods for Entering Uncertainty
We focused on three different methods for entering input uncertainty: entering
a textual description that explains the uncertainty, entering additional numeric
values, or giving multiple answers. We developed these methods based on
informal conversations with potential users and brainstorming sessions. We
decided to focus only on the explicit input of uncertainty, as functionality such
as validation or autofill cannot substitute for explicit methods.
Text
The uncertainty can be specified by allowing the user to enter a textual description
of the uncertainty.
1) Textual Description. In addition to the input, the user can enter an optional
text on why and how the input is uncertain. An example could be an interface
for fitness tracking where the user can enter a statement about the uncertainty of
the input data, e.g., the user might not have used a scale to weigh the food and
therefore guessed the number of grams.
Numerical Values
The user can specify the uncertainty of the input by adding one or more
numerical values to it. The additional single value can, for example, be a
probability percentage. Multiple numerical values could be used to enter a range
instead of a single value.
2) Single Value. In addition to the input, the user specifies a single numerical
value which represents the uncertainty. This could be, for example, the probability
percentage of the uncertainty, in which 0% represents a completely certain
answer and 100% a very uncertain answer.
3) Range. Instead of a single value, the user could also enter a range or interval
for the input; the bigger the interval, the bigger the uncertainty. The interval
bounds would correspond to the minimum and maximum values that a user
expects for the input; the maximum interval should correspond to the values that
the input can accept.
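Methods 2) and 3) can be captured as small data types that a form could hand to a system. The following Python sketch is one possible representation, with hypothetical class and field names. It validates that the percentage lies between 0% and 100% and that a range lies within the values the input can accept. Defaulting the percentage to 0% (a certain answer) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentageInput:
    """Method 2): a value plus a single uncertainty percentage.

    0 % means completely certain, 100 % very uncertain; the default of
    0 % is an assumption so that certain users can ignore the extra control.
    """
    value: float
    uncertainty_pct: float = 0.0

    def __post_init__(self) -> None:
        if not 0.0 <= self.uncertainty_pct <= 100.0:
            raise ValueError("uncertainty must be between 0 % and 100 %")


@dataclass(frozen=True)
class RangeInput:
    """Method 3): an interval [low, high] instead of a single value.

    The widest allowed interval [min_allowed, max_allowed] expresses
    maximal uncertainty.
    """
    low: float
    high: float
    min_allowed: float
    max_allowed: float

    def __post_init__(self) -> None:
        if not self.min_allowed <= self.low <= self.high <= self.max_allowed:
            raise ValueError("expected min_allowed <= low <= high <= max_allowed")

    @property
    def width(self) -> float:
        """Absolute width of the interval: the bigger, the more uncertain."""
        return self.high - self.low


if __name__ == "__main__":
    print(PercentageInput(250.0, 30.0))
    print(RangeInput(200, 300, 0, 1000).width)  # 100
```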
Allow Multiple Answers
The uncertainty of the input can be specified by allowing users to give not only a
single answer, but multiple possible answers.
4) List. Instead of a single input, the user could enter multiple values. For
example, if the user is unsure whether a name is spelled with an “e” or “i”, the
user might just enter both names instead of deciding which one to enter.
5) Ranking. Instead of a single input or a list of possible values, the user could
rank the possible inputs by probability. In the former example, the user might be
more certain that the name contains an “e”.
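A list and a ranking can share one representation: candidate answers with normalized weights. In the Python sketch below, the function names, the two name spellings and the 3:1 weighting are hypothetical. A plain list gives every candidate equal weight, while a ranking orders the candidates by the weight the user assigns.

```python
from typing import Dict, Iterable, List, Tuple


def ranking(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Method 5): order candidates by user-assigned weight, normalized to sum 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one candidate needs a positive weight")
    normalized = ((answer, w / total) for answer, w in weights.items())
    return sorted(normalized, key=lambda item: item[1], reverse=True)


def as_list(answers: Iterable[str]) -> List[Tuple[str, float]]:
    """Method 4): a plain list, i.e. every answer is equally plausible."""
    return ranking({answer: 1.0 for answer in answers})


if __name__ == "__main__":
    print(as_list(["Kristen", "Kristin"]))          # [('Kristen', 0.5), ('Kristin', 0.5)]
    print(ranking({"Kristen": 3, "Kristin": 1}))    # [('Kristen', 0.75), ('Kristin', 0.25)]
```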
Table 6.2: Suitability of uncertainty input methods 1) to 5) for the identified
clusters of input methods: 1) textual description, 2) single value, 3) range, 4)
list, 5) ranking.
1) 2) 3) 4) 5)
Cluster 1: Selection a)
b)
Cluster 2: Interval Input a)
b)
Cluster 3: Character Input a)
b)
6.1.3 Design of Non-Functional Prototypes
Not every input control can be used in combination with every method for
entering uncertain data. As depicted in Table 6.2, a textual description and
a single value could be used with each of the input controls. Range selection,
however, only makes sense for interval data. Lists and rankings mostly correspond
to selection tasks and textual input. For these respective combinations, we
developed non-functional digital prototypes. However, we decided to further
limit the scope by developing prototypes only for the selection and interval input
clusters. Character input is a good candidate for validation, autofill, or regular
expressions, which relate to the list method of uncertainty input; however, the
methods we developed are not particularly suitable for character input.
We developed a set of non-functional prototypes consisting of 32 sketches. Figure
6.1 shows a subset of the developed sketches using radio buttons as the main
input control. Figure 6.1a shows a sketch where a user could only state that the
answer is uncertain, but not give details about the uncertainty. We used this as a
baseline. Figure 6.1b and Figure 6.1c show interfaces that allow the user to
enter a percentage value with a number field or a slider. Figure 6.1d includes a
text field where a user could leave an optional comment about the uncertainty.
We designed similar paper prototypes for the other clusters for the following six
input controls: radio buttons, drop-down list, check boxes, list box, slider, and
number picker.
(a) Uncertainty cannot be specified in detail,
but the uncertain nature of the answer can be
communicated.
(b) Uncertainty percentage can be entered in a
number field.
(c) Uncertainty percentage can be entered with
a slider.
(d) Information about uncertainty can be in-
cluded in a text field.
Figure 6.1: Examples of the non-functional prototypes for the discussions
with potential users. The prototypes all use radio buttons and an additional
uncertainty input method.
6.1.4 Selection of Promising Designs
To identify the most promising designs among our prototypes for a lab study,
we first conducted a pre-study in which we discussed all sketches with potential
users.
Method
We first asked participants for demographic information such as age, gender, and
background. We then focused on a few introductory questions before giving
participants an overview of the clusters and the idea of uncertain input. For all
groups of paper prototypes (with the same method of uncertainty input), we
asked the participants to describe their first impression, their reasons for liking or
disliking the prototype, and whether they would use it. Additionally, participants
indicated on two five-point Likert items how much they liked the presented
interfaces and how easy they thought they would be to use.
Participants
In total, we interviewed eight participants (6 male, 2 female). On average, their
age was 21.6 (SD= 2.0) years. All were students of the University of Stuttgart
and had a good general knowledge about user interfaces and input controls.
Results & Discussion
In the discussions, participants expressed that they mostly liked user interfaces
developed for cluster 1a) and cluster 2a) (see Table 6.3 for the results). Espe-
cially well perceived were the probability percentage input and the range input.
Participants did not like the simple check box to indicate uncertainty. They
also criticized more complex interfaces developed for the clusters 1b) and 2b).
We therefore decided to focus on the interfaces for clusters 1a) and 2a), which
were the most favored by the participants. More theoretical groundwork may be
necessary to find suitable methods for entering uncertainty for the other clusters.
6.1.5 Evaluation in the Lab
As a second evaluation, we conducted a user study in the lab with the most
promising paper prototypes from the interviews. We implemented functional
versions of these paper prototypes in a web-based study interface, which the
participants then used to answer questions while we collected quantitative and
qualitative data about their usage.
Table 6.3: Results from the pre-study on how much participants liked a
specific combination of main input method for clusters (top) and uncertainty
input method (left). Likert items were converted to numbers from 1 to 5; 1
corresponds to “not at all” and 5 to “very much”.
Uncertainty Input Method   Control                 1a)    1b)    2a)    2b)
Baseline                   Check box               2.25   2.13   2.38   2.38
Textual Description        Text field              3.13   2.88   2.63   2.63
Ranking                    Ranking list            2.63   2.63   -      -
                           Percentage sliders      3.25   3.25   -      -
Probability percentage     Number field            3.88   3.00   -      -
                           Slider                  3.88   3.75   -      -
Range                      Pair of number fields   -      -      3.5    3.5
                           Two-thumb slider        -      -      4.38   4.38
Method
We conducted a user study with a total of 60 questions from the fields of sports
and nutrition. For each question, participants had to enter the answer and their
uncertainty about the correctness of their answer. The study consisted of two
different parts.
In the first part, participants had to answer 30 multiple-choice questions. Half
used radio buttons; the other half used a drop down menu. They had to select
the correct answer out of four possibilities. For all questions, participants had
to specify their uncertainty as a percentage value between 0% and 100%: 0% for
a completely certain answer and 100% for a completely uncertain answer. For
reporting their uncertainty, participants used two different user interfaces for 15
questions each: a numeric slider, and a number field. Figure A.1a shows the
numeric slider and Figure A.1b shows the number field.
In the second part of the study, participants had to answer 30 questions with
numeric answers. Half used a numerical slider; the other half used a number
field. For all questions, they had to specify their uncertainty as a range of
numerical values in addition to the point estimate. For the question “How much
magnesium (in mg) does a 100 g mango contain?”, they could provide a point
estimate between 0 mg and 100 mg (e.g., 20 mg) and then specify an additional
range (e.g., 5–30 mg) to describe their uncertainty. For reporting their
uncertainty, participants used two different user interfaces for 15 questions
each: a numeric slider with two thumbs, and a pair of number fields. Figure
A.2a shows the numeric slider with two thumbs and Figure A.2b shows the two
number fields.
We used a Latin square pattern to equally distribute the assignment of the main
input methods and the ordering of the additional uncertainty input methods to
participants. The order of the questions was randomized per part.
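The text does not state which Latin square variant was used. As one common option in HCI studies, the Python sketch below builds rows of a balanced Latin square (Williams design), in which every condition appears once per position and, for an even number of conditions, every ordered pair of conditions is adjacent exactly once. The condition labels are hypothetical placeholders. With only two conditions, the construction reduces to the two orders AB and BA.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def balanced_latin_square_row(conditions: Sequence[T], participant: int) -> List[T]:
    """Condition order for one participant (one row of a balanced Latin square).

    Williams construction: the first row is 0, 1, n-1, 2, n-2, ..., and each
    further row shifts it by the participant index. For an odd number of
    conditions, odd-numbered participants get the reversed row, so that
    2n participants form a balanced design.
    """
    n = len(conditions)
    order = []
    low, high = 0, 0  # counters for the ascending and descending halves
    for i in range(n):
        if i < 2 or i % 2 == 1:
            val = low
            low += 1
        else:
            val = n - high - 1
            high += 1
        order.append(conditions[(val + participant) % n])
    if n % 2 == 1 and participant % 2 == 1:
        order.reverse()
    return order


if __name__ == "__main__":
    for p in range(4):
        print(p, balanced_latin_square_row(["A", "B", "C", "D"], p))
```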
Besides demographic data, we collected specific data per question such as the
used interface components, the answer, the reported uncertainty, and the time
needed to answer the question. Additionally, participants completed a Usability
Metric for User Experience (UMUX) questionnaire [Finstad, 2010] after each 15
questions, which corresponded to using one type of additional input control for
reporting uncertainty.
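For reference, the UMUX consists of four items answered on a seven-point scale. Following Finstad [2010], the positively worded items 1 and 3 contribute the response minus one, and the negatively worded items 2 and 4 contribute seven minus the response. The sum, ranging from 0 to 24, is rescaled to 0 to 100. The Python sketch below implements this scoring. The function name is hypothetical.

```python
from typing import Sequence


def umux_score(responses: Sequence[int]) -> float:
    """UMUX score (0-100) from the four item responses on a 1-7 scale.

    Items 1 and 3 (positively worded) contribute response - 1;
    items 2 and 4 (negatively worded) contribute 7 - response.
    The sum (0-24) is rescaled to 0-100.
    """
    if len(responses) != 4 or any(not 1 <= r <= 7 for r in responses):
        raise ValueError("expected four responses between 1 and 7")
    contributions = [
        r - 1 if index % 2 == 0 else 7 - r  # index 0, 2 -> items 1, 3
        for index, r in enumerate(responses)
    ]
    return sum(contributions) / 24 * 100


if __name__ == "__main__":
    print(umux_score([7, 1, 7, 1]))  # 100.0
    print(umux_score([4, 4, 4, 4]))  # 50.0
```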
Participants
In total, we conducted the user study with 16 participants (6 female, 10 male)
with an average age of 22.9 years (SD = 2.8). Participants were recruited in a university
setting, so the majority of them were undergraduate students.
Results
We used R and lme4 [Bates et al., 2015] to perform a linear mixed effects analysis.
In all models, we added subject ID and question ID as random effects and the
respective variables as fixed effects. P-values were obtained by likelihood ratio
tests of the full model with the effect in question against the model without the
effect in question. For the UMUX score, we performed linear mixed-effects
model analyses on the aligned-rank transformed data [Wobbrock et al., 2011].
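The likelihood ratio test mentioned above compares two nested models by their log-likelihoods. The Python sketch below shows only this final step, assuming the two models differ by one parameter (e.g., a two-level fixed effect such as the uncertainty input method). In that case the statistic is compared against a chi-square distribution with one degree of freedom, whose survival function is $\mathrm{erfc}(\sqrt{x/2})$. Fitting the mixed models themselves is out of scope. For fixed-effect comparisons the models should be fitted with maximum likelihood rather than REML; lme4's `anova()` refits REML models accordingly by default. The log-likelihood values in the usage example are hypothetical.

```python
import math
from typing import Tuple


def lrt_one_parameter(loglik_full: float, loglik_reduced: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models that differ by one parameter.

    The statistic 2 * (logL_full - logL_reduced) is compared against a
    chi-square distribution with 1 degree of freedom, whose survival
    function is erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model should not fit worse than the reduced one")
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods of two ML-fitted models.
    stat, p = lrt_one_parameter(-1520.3, -1523.1)
    print(f"chi2(1) = {stat:.2f}, p = {p:.3f}")
```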
Multiple-Choice Questions. For the multiple-choice questions, we did not
find any significant effect of the main input method or the uncertainty input
method on the uncertainty percentage. People entered quite similar values with the
percentage slider (M = 52.6%, SD = 31.7%) and the number field (M = 53.6%,
SD = 31.1%). We also did not find a significant effect of the main input
method or the uncertainty input method on the time that people needed to enter
their answer. With the slider, people needed 22.2 s on average (SD = 20.0 s), and
with the number field 20.9 s (SD = 13.4 s).
(a) Provided uncertainty percentage grouped
by whether participants’ answers were right
or wrong.
(b) UMUX scores grouped by uncertainty input method and primary
input method (blue - drop down menu, green - radio buttons)
Figure 6.2: Results for the first part of the experiment where participants had
to select one value and used a slider or a number field to enter a percentage
value.
Participants answered the multiple-choice questions correctly in 172 trials and
incorrectly in 308 trials. The mean uncertainty percentage differed significantly
between correct and incorrect answers according to a Welch’s two-sample t-test,
t(336.78) = 3.115, p < .01 (see Figure 6.2a). For correct answers, the entered
uncertainty percentage was significantly smaller (M = 47.1%, SD = 32.3%) than
for incorrect answers (M = 56.5%, SD = 30.4%).
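Welch’s t-test does not assume equal variances, and its degrees of freedom follow the Welch–Satterthwaite approximation, which is why they are not an integer. The Python sketch below computes both from summary statistics. Fed with the rounded means, SDs and trial counts reported above, it yields roughly t ≈ 3.12 and df ≈ 336, close to the reported t(336.78) = 3.115. The small deviations are expected because the inputs are rounded. The sign of t only reflects which group is subtracted from which.

```python
import math
from typing import Tuple


def welch_from_summary(
    m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int
) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Correct (n = 172) vs. incorrect (n = 308) multiple-choice answers.
    t, df = welch_from_summary(47.1, 32.3, 172, 56.5, 30.4, 308)
    print(f"t({df:.2f}) = {t:.3f}")
```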
The UMUX analysis revealed no significant main effects or interaction effects
for the UMUX score (see Figure 6.2b). The percentage slider reached a UMUX
score of 79.7 (SD = 13.1) in combination with the drop down menu and the same
UMUX score in combination with the radio buttons (M = 79.7, SD = 18.7). The
number field received a lower score in combination with the drop down menu
(M = 69.8, SD = 17.8) than with the radio buttons (M = 83.3, SD = 17.8).
Numerical Questions. For the numerical questions, we found a significant
effect of the uncertainty input method on the size of the range the participants
entered, p < .05. We normalized the range span to a value between 0 and 1,
where 0 corresponds to the lower bound and 1 to the upper bound of the possible
answers (e.g., 0 mg to 100 mg in the mango question). People entered a
significantly larger range with the two-thumb slider (M = 0.4, SD = 0.3) than
with the pair of number fields (M = 0.3, SD = 0.3). We did not find a significant
effect of the main input method or the uncertainty input method on the time that
people needed to enter their answer. For the two-thumb slider, people on average
needed 27.2 s (SD = 21.1 s) and for the number fields 28.7 s (SD = 32.0 s).
Nevertheless, if the correct answer was not within the entered range, the
difference between the bounds of the range and the correct answer was similar
for both controls (M = 0.1, SD = 0.2).
(a) Provided range span grouped by whether
participants’ answers were right or wrong.
(b) UMUX scores grouped by uncertainty input method and primary
input method (blue - numerical slider, green - number field)
Figure 6.3: Results for the second part of the experiment where participants
had to enter a numerical value and used number fields and a two-thumb slider
to enter a range for their uncertainty.
Participants answered the numerical questions correctly in 30 trials and
incorrectly in 450 trials. In 197 of the incorrect trials, the correct answer was
within the selected range. The range span differed significantly between correct
and incorrect answers according to a Welch’s two-sample t-test,
t(38.51) = 4.337, p < .001 (see Figure 6.3a). For correct answers (M = 0.2,
SD = 0.2), the entered range span was significantly smaller than for incorrect
answers (M = 0.3, SD = 0.3).
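The normalized measures used in this analysis can be sketched as follows. This is an assumption about the exact definitions: the span is divided by the width of the answer scale, and for a correct answer outside the entered range, the distance to the nearest range bound is normalized in the same way. The Python sketch uses the mango question’s 0 mg to 100 mg scale and a hypothetical correct value of 40 mg.

```python
from typing import Tuple


def normalized_span(low: float, high: float, scale: Tuple[float, float]) -> float:
    """Width of the entered range relative to the answer scale (0 to 1)."""
    scale_min, scale_max = scale
    return (high - low) / (scale_max - scale_min)


def normalized_miss(
    truth: float, low: float, high: float, scale: Tuple[float, float]
) -> float:
    """0 if the correct answer lies in [low, high]; otherwise the distance to
    the nearest bound, relative to the answer scale (assumed definition)."""
    if low <= truth <= high:
        return 0.0
    scale_min, scale_max = scale
    nearest = low if truth < low else high
    return abs(truth - nearest) / (scale_max - scale_min)


if __name__ == "__main__":
    MANGO_SCALE = (0.0, 100.0)  # possible answers in mg
    print(normalized_span(5, 30, MANGO_SCALE))      # 0.25
    print(normalized_miss(40, 5, 30, MANGO_SCALE))  # 0.1 (hypothetical truth)
```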
The UMUX analysis revealed a significant main effect of the uncertainty input
method on the UMUX score, F1 = 5.71, p < .05 (see Figure 6.3b). Participants
rated the number fields significantly higher (M = 60.7, SD = 5.7) than the
two-thumb slider (M = 44.8, SD = 5.9). We did not find a significant interaction
with the primary input method.
Discussion
The results indicate that for the percentage input, both input methods work well.
They do not differ significantly in the entered percentage, input time, or UMUX.
Both methods received on average a good UMUX score (according to the System
Usability Scale (SUS) values by Bangor et al. [2008]). The primary input method
does not seem to be tightly connected to the uncertainty input; however, the
results indicate that there could be a difference (at least for the number field),
which should be examined in further studies.
For the second part, the results differ slightly from the first part. Again, there is
no significant difference in the input time of the two uncertainty input methods,
however, they differ in the selected range span. We assume that the slider is not
as precise as the number field and that people are biased toward entering even
values into the number field, while they care less about uneven numbers when
dragging the slider. Either control could therefore be helpful, depending on
whether precision or debiasing is the aim of an interface. However, the slider
overall received a poor UMUX rating, while the number fields still reached an
acceptable value. The slider might have been too complicated and too decoupled
from the task. To improve the interfaces, we suggest developing interfaces that
allow users to enter the deterministic value and the uncertainty value at the
same time. The acceptance of the controls might also change if users are not
forced to enter uncertainty but can decide on their own whether to enter it. When
entering uncertainty is mandatory, the additional controls make the user interface
more complex and more difficult to understand.
6.1.6 Implications
This part of the work showed that different options exist for enhancing common input fields of different categories to allow for uncertain input. However, people prefer quantitative methods for entering uncertainty, as qualitative feedback requires considerable effort. Additionally, people preferred input methods for uncertainty that they are already familiar with, such as sliders and number fields. More complex prototypes did not receive good feedback. We therefore concluded that it is easier to develop uncertainty input methods for selection tasks or numerical input.
The percentage input for uncertainty was well received, and both developed user interfaces received a good UMUX score in the study. Systems can also easily use this input to calculate results with uncertainty, so it has clear potential for future interfaces. With a default value of 0%, users would know that they do not need to interact with the additional control and could use the standard control alone.
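As a rough illustration of how a system might compute with such percentage input, the following Python sketch reads the percentage as a symmetric relative deviation around the entered value and propagates it with simple interval arithmetic. This interpretation of the percentage, as well as the function names `to_interval` and `add_intervals` and the example values, are assumptions made for illustration; they are not the implementation used in the study. A default of 0% yields a degenerate interval, i.e., an ordinary deterministic value.

```python
def to_interval(value, percent=0.0):
    """Convert a value and an uncertainty percentage into (low, high).

    Assumption: the percentage is a symmetric relative deviation, i.e.
    value +/- percent % of |value|. The default of 0 % treats the input
    as deterministic, so the interval collapses to a single point.
    """
    if percent < 0:
        raise ValueError("percent must be non-negative")
    delta = abs(value) * percent / 100.0
    return (value - delta, value + delta)


def add_intervals(*intervals):
    """Sum uncertain inputs using interval arithmetic (worst-case bounds)."""
    low = sum(lo for lo, _ in intervals)
    high = sum(hi for _, hi in intervals)
    return (low, high)


# Two hypothetical monthly costs: one entered with 10 % uncertainty,
# one left at the default of 0 %.
rent = to_interval(200.0, 10.0)
phone = to_interval(50.0)
total = add_intervals(rent, phone)
print(total)  # prints (230.0, 270.0)
```

Interval arithmetic is only one possible choice; a system that treats the percentage as a standard deviation would instead combine variances.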
Although our range input methods did not score very well in the UMUX, we argue that range input could be helpful in the future. We assume that uncertainty input methods allowing range input need to be more closely connected to the actual value input so that they do not require too much additional input. This could probably be achieved by designing new input controls specifically for uncertain input.
6.2 Probability Distribution Sliders
In a second step, we further explored input controls for uncertain data. In contrast to the controls presented in Section 6.1, we used a mathematical foundation and concentrated on feedback and on making transparent how a system would interpret the input. Instead of comparing completely different designs, we focused on one input control and compared different degrees of freedom.
Sliders are flexible input controls that are currently used for a variety of tasks. Several of their properties make them well suited for entering uncertain data transparently. Sliders are best known as visual analogue scales, used in clinical trials and research [Marsh-Richard et al., 2009]. Their usage has been studied widely, revealing, for example, that tick marks introduce a bias [Matejka et al., 2016]. But sliders are also known as selection controls for data exploration; for example, the alphaslider [Osada et al., 1993] allows a user to select words, phrases, or names from textual lists. Sliders can also show the distribution of the data to be filtered by displaying a density plot in the slider bar [Eick, 1994]. A similar approach is suggested by Willett et al. [2007]. Their scented widgets incorporate
visual elements that support the selection and exploration of data. Lasram et al.
[2012] also enhanced the slider bar with additional visualizations to show the
effect of the input on an image. These combinations of sliders with visualizations
are especially promising for the input of uncertain data.
We therefore decided to use sliders as the basis for our work and to develop slider controls that offer different degrees of freedom. The main goal of this part of the work was to understand how users interact with enhanced slider controls that allow them to manipulate the properties of a probability distribution function, and how they handle different degrees of freedom. To evaluate our designs, we conducted an online survey and a controlled user study in the lab.
6.2.1 Design Process
Common input controls for numerical input allow users to enter a single value, which could be the mode, the median, or the mean (expected value) of a probabilistic input. We based our designs on different degrees of freedom that can be derived from the properties of a probability distribution function: mode, standard deviation, skew, and kurtosis. Each of these properties adds flexibility and can be either not included, fixed, or adjustable by the user. In total, we derived five levels with increasing flexibility and complexity, listed in Table 6.4.

Table 6.4: Deriving levels with varying degrees of freedom for entering a probability distribution function.
          SD            Skew          Kurtosis
Level 0   not included  not included  not included
Level 1   fixed         fixed         fixed
Level 2   adjustable    fixed         fixed
Level 3   adjustable    adjustable    fixed
Level 4   adjustable    adjustable    adjustable
As a baseline for our designs, we used a standard web slider. Based on Levels 1 to 4, we developed four additional input controls (ICs). The number of each IC corresponds to its level in Table 6.4; the standard web slider corresponds to IC0.
IC1 (see Figure 6.4a) allows the user to move a fixed-size selection along the slider bar. The standard deviation, skew, and kurtosis of the probability distribution function are fixed.
IC2 (see Figure 6.4b) allows the user to drag the two ends of the selection to create a range of flexible size. This influences the standard deviation of the probability distribution function.
IC3 (see Figure 6.4c) allows the same interaction as IC2 but additionally lets the user specify the mode. By specifying the mode, users can influence the skew of the distribution and create asymmetric distributions.
IC4 (see Figure 6.4d) allows the same interaction as IC3 but additionally lets the user specify two more points of the distribution (at half the height of the mode). This influences the kurtosis of the probability distribution function.
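How handle positions turn into a probability distribution function is not spelled out in the descriptions of the four controls. One interpretation that is consistent with Table 6.4, assumed here purely for illustration and not necessarily the implementation of our controls, treats the handles as knots of a piecewise-linear density. A symmetric triangle over the range (IC2, or IC1 with a system-defined width) changes only the standard deviation; a movable mode (IC3) adds skew, while every triangular density keeps an excess kurtosis of −0.6; and the two half-height handles of IC4 additionally change the kurtosis. The Python sketch below builds these densities and checks their moments numerically. All function names and the fixed IC1 width of 4.0 are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear_pdf(knots):
    """Normalized density through (x, relative_height) knots.

    Knot positions must be strictly increasing; the density is linear
    between neighbouring knots and zero outside the outermost knots.
    """
    xs = [x for x, _ in knots]
    hs = [h for _, h in knots]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("knot positions must be strictly increasing")
    # Exact area under the polyline (sum of trapezoids) for normalization.
    area = sum((xs[i + 1] - xs[i]) * (hs[i] + hs[i + 1]) / 2
               for i in range(len(xs) - 1))

    def pdf(x):
        if x < xs[0] or x > xs[-1]:
            return 0.0
        i = min(bisect_right(xs, x) - 1, len(xs) - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (hs[i] + t * (hs[i + 1] - hs[i])) / area

    return pdf


def ic2(low, high):
    """Flexible range: symmetric triangle, only the width (SD) changes."""
    return piecewise_linear_pdf([(low, 0.0), ((low + high) / 2, 1.0), (high, 0.0)])


def ic1(center, width=4.0):
    """Fixed range: IC2 shape with a system-chosen width (hypothetical 4.0)."""
    return ic2(center - width / 2, center + width / 2)


def ic3(low, mode, high):
    """Range plus movable mode: a triangle whose mode controls the skew."""
    return piecewise_linear_pdf([(low, 0.0), (mode, 1.0), (high, 0.0)])


def ic4(low, left_half, mode, right_half, high):
    """IC3 plus two handles at half the peak height, changing the kurtosis."""
    return piecewise_linear_pdf([(low, 0.0), (left_half, 0.5), (mode, 1.0),
                                 (right_half, 0.5), (high, 0.0)])


def moments(pdf, low, high, n=20000):
    """Mean, SD, skewness, and excess kurtosis by midpoint integration."""
    dx = (high - low) / n
    grid = [low + (k + 0.5) * dx for k in range(n)]
    weights = [pdf(x) * dx for x in grid]
    mean = sum(x * w for x, w in zip(grid, weights))

    def central(p):
        return sum((x - mean) ** p * w for x, w in zip(grid, weights))

    var = central(2)
    sd = var ** 0.5
    return mean, sd, central(3) / sd ** 3, central(4) / var ** 2 - 3


for name, pdf in [("IC1", ic1(5.0)), ("IC2", ic2(0, 10)),
                  ("IC3", ic3(0, 2, 10)), ("IC4", ic4(0, 4, 5, 6, 10))]:
    print(name, [round(v, 3) for v in moments(pdf, 0, 10)])
```

Because the kurtosis of a triangle does not depend on where its mode lies, this reading reproduces the fixed kurtosis of Level 3, which makes the triangular shape a convenient, though not the only possible, assumption.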
(a) Fixed Range Slider (b) Flexible Range Slider
(c) Flexible Range Best Estimate Slider (d) Advanced Flexible Range Best Estimate Slider
Figure 6.4: Four slider controls, each enabling users to enter a probability distribution function. The sliders have ascending degrees of freedom (from a to d). The interaction cues shown in the pictures are displayed when hovering over an interactive element (drag handle) of the slider control.
To make transparent how the system will interpret the input, we added the same three supportive visual elements to every IC: (1) a gradient plot, (2) a gradient height legend for the gradient plot, and (3) a plot of the probability distribution function. In addition to these visual elements, we added interaction cues and tooltips. Before a user interacts with a control, tooltips are displayed to help the user understand it. During the interaction, interaction cues are displayed when hovering over the interactive elements of the IC.
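One way to think about the gradient plot and its height legend is as a resampling of the density along the slider bar. The following Python sketch assumes, as an illustration rather than a description of the original implementation, that each pixel column of the bar is shaded with an opacity proportional to the density at that position, and that the legend maps full opacity to the peak density. The triangular density, the bar width, and the names `triangular_pdf` and `gradient_opacities` are hypothetical.

```python
def triangular_pdf(low, mode, high):
    """Density of a triangular distribution on [low, high] with the given mode."""
    def pdf(x):
        if x < low or x > high:
            return 0.0
        if x < mode:
            return 2 * (x - low) / ((high - low) * (mode - low))
        if x > mode:
            return 2 * (high - x) / ((high - low) * (high - mode))
        return 2 / (high - low)  # peak of the density at the mode
    return pdf


def gradient_opacities(pdf, low, high, width_px):
    """Sample the density at the centre of each pixel column of the slider bar.

    Returns the opacity of every column (density divided by the peak
    density, so values lie in [0, 1]) and the peak density, which a
    gradient height legend could show as the value of full opacity.
    """
    step = (high - low) / width_px
    densities = [pdf(low + (i + 0.5) * step) for i in range(width_px)]
    peak = max(densities)
    if peak == 0:
        return [0.0] * width_px, 0.0
    return [d / peak for d in densities], peak


# Hypothetical slider from 0 to 10, rendered 200 px wide, best estimate at 3.
opacities, peak = gradient_opacities(triangular_pdf(0, 3, 10), 0, 10, 200)
print(len(opacities), round(peak, 3), max(opacities))
```

Normalizing by the peak rather than by a fixed constant keeps narrow and wide distributions equally visible; the legend then carries the absolute density scale.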
6.2.2 Online Evaluation
As a first evaluation of the slider controls, we conducted an online survey in which we collected subjective feedback on the perceived effectiveness, efficiency, ease of use, satisfaction, and learnability of the controls.
Method
We first asked participants of the online survey to provide demographic information and to assess their knowledge of stochastics, statistics, probability theory, and probability distributions. We then presented the sliders (IC0 to IC4) in randomized order to reduce sequence and learning effects. Each page contained a short description of the input control and an example task. Based on a table that showed how often “Sam Sample” used his car over 36 months, we asked the following question: “How many times each month does Sam Sample use his car to go to work?”. Participants were encouraged to try out the control and to indicate their level of agreement with the following five-point Likert-type items:
Effectiveness: “I am confident that I am able to correctly enter data with this
input method.”
Efficiency: “I was able to quickly enter data using this input method.”
Ease of use: “It was simple to use this input method.”
Satisfaction: “I liked using this input method.”
Learnability: a scale ranging from “I could use this input method intuitively (without reading the description)” to “I doubt I will ever be able to confidently use this input method (even after training).”
We additionally offered a text field for positive and negative remarks, reasons for their judgments, further suggestions, questions, or comments. At the end, participants completed two rankings: one according to how much they liked the ICs and one according to how useful they perceived them to be.
Participants
We recruited prospective participants via social media and a list of volunteers maintained by the department. In total, 75 participants (34 female, 40 male, 1 preferred not to say) completed the online survey, but two male participants had to be excluded from the analysis. We analyzed the data from 73 participants with an average age of 26.0 years (SD = 6.0). More than 90% had a high school or higher degree. On average, they reported having some knowledge about stochastics and statistics.
(a) User ratings for effectiveness (b) User ratings for efficiency (c) User ratings for ease of use (d) User ratings for satisfaction (e) User ratings for learnability [each panel: number of participants (0 to 60) per response, from totally disagree to totally agree, for IC0 to IC4]
Figure 6.5: Results of the online survey showing the agreement of 73 participants, on a five-point Likert scale, with statements about the input controls (IC0 to IC4). The exact formulation of the statements is provided in Section 6.2.2.
Results
All answers of the participants are depicted in Figure 6.5. For the analysis, all Likert scale ratings were converted to numbers from 1 (totally disagree) to 5 (totally agree). The learnability item was converted to the same scale. For each statement, we conducted a Friedman test with a significance level of α = 0.05. As post hoc analysis, we conducted Wilcoxon signed-rank tests with a Bonferroni correction for the ten pairwise comparisons between the five input controls, resulting in a significance level of p < .005 (0.05 / 10) for each statement. We only report significant results. We additionally report qualitative feedback.
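To make the analysis procedure concrete, the Python sketch below computes the Friedman statistic from a participants × conditions matrix of Likert ratings, using average ranks for tied ratings and the usual tie correction, together with the Bonferroni-corrected threshold for all pairwise post hoc tests. It is a standard-library illustration with hypothetical function names and hypothetical ratings, not the analysis script used for this study; in practice, a statistics package (e.g., SciPy's `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon`) would be used. The p-value helper relies on a closed form that only holds for an even number of degrees of freedom, such as df = 4 for five input controls.

```python
from math import comb, exp, factorial


def average_ranks(row):
    """Rank the values of one participant (1 = lowest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over all positions that hold the same value as position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_chi2(data):
    """Tie-corrected Friedman statistic.

    data[i][j] is the rating of participant i for condition j.
    Returns (chi2, degrees_of_freedom).
    """
    n, k = len(data), len(data[0])
    rank_rows = [average_ranks(row) for row in data]
    rank_sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    chi2 = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n * (k^3 - k)),
    # where t is the size of each group of tied ratings within a participant.
    ties = sum(row.count(v) ** 3 - row.count(v) for row in data for v in set(row))
    correction = 1 - ties / (n * (k ** 3 - k))
    if correction == 0:
        raise ValueError("all ratings are tied within every participant")
    return chi2 / correction, k - 1


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df)."""
    if df % 2:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def bonferroni_alpha(n_conditions, alpha=0.05):
    """Per-test threshold when all pairs of conditions are compared."""
    return alpha / comb(n_conditions, 2)


# Hypothetical ratings: 4 participants x 5 input controls (IC0..IC4).
ratings = [[5, 4, 4, 3, 2], [4, 4, 5, 3, 1], [5, 3, 4, 2, 2], [4, 4, 4, 3, 2]]
chi2, df = friedman_chi2(ratings)
print(round(chi2, 2), df, round(chi2_sf_even_df(chi2, df), 4), bonferroni_alpha(5))
```

As a plausibility check, `chi2_sf_even_df(16.48, 4)` evaluates to about 0.0024, which agrees with the p = .002 reported for satisfaction.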
Effectiveness. We found a significant difference in perceived confidence in being able to enter data correctly, χ²(4) = 23.94, p < .001. Participants rated IC1 and IC4 significantly worse than IC0 and IC2.
Efficiency. We found a significant difference in the perceived ability to enter data quickly, χ²(4) = 61.56, p < .001. IC4 was rated significantly worse than all other input controls, and IC3 was rated significantly worse than IC0, IC1, and IC2.
Ease of Use. We found a significant difference in perceived ease of use, χ²(4) = 63.85, p < .001. Participants rated IC3 and IC4 significantly worse than IC0, IC1, and IC2.
Satisfaction. We found a significant difference in perceived satisfaction, χ²(4) = 16.48, p = .002. IC4 was rated significantly worse than IC2 and IC3.
Learnability. We found a significant difference in perceived learnability, χ²(4) = 73.74, p < .001. Participants thought that IC0 would be significantly easier to learn than all other input controls. IC3 and IC4 were also rated worse than IC1 and IC2.
Ranking - Likeability. We found a significant difference in the likeability ranking of the input controls, χ²(4) = 31.46, p < .001. Participants ranked IC0, IC2, and IC3 significantly better than IC4. Additionally, IC2 was ranked significantly better than IC1.
Ranking - Usefulness. We also found a significant difference in the usefulness ranking of the input controls, χ²(4) = 62.18, p < .001. Participants found IC2, IC3, and IC4 significantly more useful than IC0 and IC1.
Qualitative Feedback. We collected qualitative feedback for all input controls and for the rankings. The feedback on the rankings gave the most insight into why participants preferred some controls over others. One participant summarized the ranking in his comments, stating that he “[...] liked [IC3] because you can adjust some features, but it’s still relatively quick. Method [IC2] and [IC1] give less possibilities. But in [IC4] it’s too much you have to enter.” Most participants preferred IC3 as a good compromise: “The more power the methods hold, the more complex and annoying the interaction got. Number [IC3] was a good compromise. Easy sliding and the possibility to change the width.”
Discussion
According to the ranking results, participants preferred IC2 and IC3 over the other methods. On the individual rating scales, they rated IC0 higher than IC2 on all items except satisfaction. We assume that IC2 outperformed IC0 on this scale because participants realized that IC0 was not suitable for answering the example task. Interestingly, participants also rated IC2 better than IC1 on all statements, although IC1 was simpler and provided fewer interactive elements. One reason for this preference was that participants did not like the fixed slider bar, as they did not understand how the range was chosen. This resulted in IC1 receiving the second-worst satisfaction rating, after IC4. To increase satisfaction with IC1, it could help to explain in detail why the chosen range is appropriate for a given task. Of all input methods, participants rated IC4 worst on all scales. They perceived the control as cumbersome and difficult to handle. Although participants did not rate IC3 highly on the individual scales, it ranked very high, and participants stated in their comments that it was the best compromise between complexity and possibilities.
6.2.3 Evaluation in the Lab
As a second evaluation, we conducted a user study in the lab. In this study, we concentrated on input time, the use of additional support for understanding the controls, and how well participants provided the expected input when using the controls.
Method
We used a within-subjects design. Each participant had to solve three tasks and
complete a SUS questionnaire for each input control. To minimize sequence and
learning effects, we randomized the order of the input controls. Each participant
completed the following three tasks per input control:
1. The first task was an adapted version of the example question from the online evaluation, slightly modified to “What is the most likely value. . . ?” We also prepared five different tables and randomly assigned them to the input controls for each participant.
2. Participants had to specify how much money they spend on grocery shopping. This was deliberately chosen as a free task with no correct answer, so that participants would get a feeling for how they would interact with the input control in a real everyday task.
3. Participants had to specify the possible outcomes of dice rolls.
The study interface was web-based and contained the description of the task, the question, the input control, a help button, and five buttons with which participants judged their confidence in the correctness of their answer: “My input is correct”, “My input is nearly correct”, “I’m not sure whether my input is correct”, “I doubt that my input is correct”, and “I don’t understand it”. We recorded all clicks on the input controls and buttons, the time the help dialog was open, the total input time, and the input itself.

Table 6.5: Results of the user study showing the means (M) and standard deviations (SD) of all 30 participants for each metric.
                           IC0            IC1            IC2            IC3            IC4
                           M      SD      M      SD      M      SD      M      SD      M      SD
Clicks per Element         1.86   1.63    2.89   2.81    1.43   0.98    1.84   1.72    2.53   1.56
Total Input Time (s)       33.2   32.4    35.5   30.1    31.4   21.8    45.6   33.7    63.3   32.5
Help Time (s)              8.90   26.67   5.71   17.24   5.71   17.24   7.18   27.49   10.99  27.81
Perceived Correctness      3.44   1.06    3.49   1.11    3.82   1.03    3.91   0.83    3.76   0.87
Deviation of Answers T1    2.37   1.90    2.63   1.97    2.59   2.08    2.64   2.14    2.39   1.82
Deviation of Answers T3    53.17  85.00   42.17  36.31   29.44  54.47   40.56  62.91   30.78  46.9
SUS score                  65.25  20.05   65.67  16.00   70.92  17.43   72.5   16.19   47.83  21.52
Participants
In total, we conducted the lab study with 30 participants (11 female, 19 male) with an average age of 26.0 years (SD = 7.2). We recruited participants in a university setting, so the majority were undergraduate students from different subjects. As in the online evaluation, the majority had a high school degree as their highest educational degree and thus a similar level of knowledge about stochastics and statistics.
Results
For the analysis, we did not distinguish between participants with low and high statistical knowledge, as calculating Pearson’s r revealed no strong positive or negative correlations between the statistical knowledge of participants and their performance. We only found four moderately positive correlations (0.40 < r < 0.45) concerning how often and how long participants consulted the help for tasks 2 and 3 with IC2 and IC4. All other correlations were not significant (−0.35 < r < 0.35). As for the online evaluation, we conducted Friedman tests with a significance level of α = 0.05 and, for post hoc analysis, Wilcoxon signed-rank tests with a Bonferroni correction (p < .005). Table 6.5 shows the means and standard deviations for all metrics.
Efficiency. We recorded all clicks on the input controls to understand how often participants changed their input. To obtain a comparable value, we divided the total number of clicks for an input control by its number of interactive elements (drag handles). The Friedman test showed a significant difference in how often participants clicked on the input controls, χ²(4) = 43.54, p < .001. Most clicks per element were recorded for IC1 (2.89), which was significantly higher than for IC0, IC2, and IC3. IC1 was followed by IC4, which was also clicked significantly more often than IC2 and IC3. We additionally analyzed the total input time. The Friedman test showed a significant difference in how long participants interacted with the input controls, χ²(4) = 98.05, p < .001. Participants were comparatively fast with IC0, IC1, and IC2. With IC3, it took them significantly more time to enter data than with IC0 and IC2. Entering data with IC4 took significantly longer than with all other input controls.
Confidence. We recorded how often participants clicked the help button and how long the help information was open. We did not find any significant differences in help button usage or help time. At the end of each task, participants additionally rated their confidence in the correctness of their input. We converted the statements to numbers from 1 (“I don’t understand it”) to 5 (“My input is correct”). The Friedman test showed a significant difference in confidence, χ²(4) = 16.85, p = .002. Participants were significantly more convinced of the correctness of their input when using IC3 than when using IC0 and IC1.
Effectiveness. For tasks 1 and 3, we calculated the absolute deviation of the answers as the mean deviation over all interactive elements (task 2 had no correct answer). The Friedman test showed no significant differences.
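The arithmetic behind the two per-control metrics can be sketched as follows: raw clicks normalized by the number of drag handles, and the mean absolute deviation of the handle values from reference values. The handle counts are our assumption, derived from the control descriptions (one handle for IC0 and IC1, two range ends for IC2, plus the mode for IC3, plus two half-height handles for IC4); the function names and the example values are hypothetical.

```python
# Assumed number of drag handles per input control.
HANDLES = {"IC0": 1, "IC1": 1, "IC2": 2, "IC3": 3, "IC4": 5}


def clicks_per_element(total_clicks, control):
    """Normalize the raw click count by the number of drag handles."""
    return total_clicks / HANDLES[control]


def mean_absolute_deviation(entered, expected):
    """Mean absolute deviation between entered and reference handle values."""
    if len(entered) != len(expected):
        raise ValueError("one reference value per handle is required")
    return sum(abs(e - r) for e, r in zip(entered, expected)) / len(entered)


# Hypothetical IC3 answer: range 4..12 with best estimate 7,
# compared with a hypothetical reference of 5..11 with mode 8.
print(clicks_per_element(6, "IC3"))  # prints 2.0
print(mean_absolute_deviation([4, 7, 12], [5, 8, 11]))  # prints 1.0
```

Normalizing by handle count keeps controls with more handles from appearing less efficient merely because they offer more elements to click.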
Usability. Each participant completed a SUS questionnaire for each input control, adapted by replacing the term “system” with “input control”. We found a significant difference in the reported usability, χ²(4) = 33.64, p < .001. IC4 was rated significantly worse than all other input controls, with a SUS score of 47.83.
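The SUS scores in Table 6.5 lie on a 0 to 100 scale. Assuming the standard SUS scoring scheme (replacing “system” with “input control” does not change the scoring), a single completed questionnaire can be scored as in the Python sketch below; a per-control value such as 47.83 for IC4 would then be the mean over participants. The function name is hypothetical.

```python
def sus_score(responses):
    """Standard SUS score (0-100) from ten item responses on a 1-5 scale.

    Odd-numbered items are positively worded and contribute (response - 1);
    even-numbered items are negatively worded and contribute (5 - response).
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses between 1 and 5")
    total = sum(r - 1 if i % 2 == 0 else 5 - r  # i = 0 is item 1
                for i, r in enumerate(responses))
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # prints 75.0
```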
Discussion
The results indicate that IC2 was more intuitive to use than IC0 and IC1, as the fewest clicks per interactive element were recorded for IC2. Additionally, participants were faster with IC2 than with IC0 and IC1, although IC2 has one more interactive element than the other two ICs.
We also observed that the help information was used surprisingly often for IC0, the standard slider. We assume that participants were unsure which value to enter, as the control did not allow them to specify uncertainty. The help time for IC0 was also longer than for IC1, IC2, and IC3.
Regarding confidence, participants were most confident of the correctness of their answers with IC3 and IC2, although the effectiveness analysis showed that IC3 had the highest deviation from the exact value in task 1 (descriptively; the Friedman test was not significant). The highest standard deviation could be a sign that participants with lower statistical knowledge had problems actually understanding the IC. Nevertheless, IC3 received the highest SUS score, followed by IC2.
6.2.4 Implications
Based on the results of the survey and the lab study, we derived implications and
recommendations for the usage of the five input controls.
Basic Slider (IC0): The basic slider can be used if the other sliders are not applicable or if fast input is important. However, an ordinary number field might lead to better results than a slider. If users are uncertain about the expected input, they might be confused by the restriction of a single-value interface. To avoid confusion, it is important to state whether the mean, mode, or median is the expected input value.
Fixed Range Slider (IC1): The results for the fixed range slider were promising neither in the online survey nor in the lab study. Participants were unsatisfied with the interaction and unconfident about the fixed size of the range, which did not necessarily match their expectations. We suggest using the flexible range slider instead, except in cases in which the fixed range can easily be explained to users.
Flexible Range Slider (IC2): The flexible range slider is applicable in a wide range of scenarios, and even people without substantial statistical knowledge felt confident using it to enter data correctly. Specifying the minimum and maximum value seemed to be an intuitive task. The studies showed that participants were confident about their input, and usability ratings were high.
Flexible Range Best Estimate Slider (IC3): The flexible range best estimate slider provides a good compromise between functionality and understandability. Participants stated that they preferred using it, which is also supported by the high likeability and usefulness rankings in the online survey and the good usability score in the user study. However, participants did not necessarily enter correct data, which suggests that the slider is already difficult to use for people with less statistical knowledge. Basic statistical knowledge is a prerequisite for using it correctly.
Advanced Flexible Range Best Estimate Slider (IC4): The advanced flexible range best estimate slider was mainly judged negatively by participants. This is also reflected in the poor SUS score that the control received in the user study. We therefore do not recommend using this input control for the general public. It may, however, be useful for people with high statistical knowledge: participants with high statistical knowledge appreciated the degrees of freedom that the control offers.
Overall, we found that each of the input controls can be suitable for particular contexts and types of users. However, IC2 is the most promising candidate to fit most contexts well.
6.3 Tangible Shape-Changing Input Controls
Tangible and shape-changing interfaces offer promising new ways of interaction. A tangible interface, also known as a graspable interface, allows for physical manipulation [Fitzmaurice, 1997]. Shape-changing interfaces additionally allow deformation from one shape to another [Rasmussen et al., 2012]. Roudaut et al. [2013] provide a taxonomy of the different types of shape change (area, granularity, porosity, curvature, amplitude, zero-crossing, closure, stretchability, strength, and speed), which we used as the basis for brainstorming prototypes. Shape-changing interfaces seem promising for the input of uncertain data, as shape change could be used to transition between certain and uncertain input.
The main goal of this part of the work is to understand whether tangible shape-
changing interfaces are suitable for the input of uncertain data. In the following,
we present six possible designs for shape-changing interfaces that could be used
for the input of uncertain data. With the help of focus groups, we selected the
most promising design and built a fully functional prototype. The prototype was
then evaluated in a lab study.
(a) Split Slider (b) Spring Slider
Figure 6.6: Concepts of the split slider and the spring slider.
6.3.1 Prototype Designs and Qualitative Evaluation
We developed six possible designs for shape-changing tangible input controls that allow the input of uncertainty. We based the designs on the taxonomy by Card et al. [1991], distinguishing between slider-based and dial-based designs, and used the taxonomy by Roudaut et al. [2013] to brainstorm different ideas. In the following, we briefly describe all designs.
Slider-Based Designs
Sliders are common input controls in graphical and tangible user interfaces. As
described in Section 6.2, they are suitable for the input of uncertainty due to many
advantages that they offer in comparison to other input controls.
Split Slider. The split slider (see Figure 6.6a) represents a standard slider with a
splittable knob. In one-knob mode, the slider allows users to enter a deterministic
value. In multi-knob mode, the slider allows users to enter a range and a best esti-
mate corresponding to a probability distribution function. The design resembles
the flexible range best estimate slider from Section 6.2.
Spring Slider. The spring slider (see Figure 6.6b) also works like an ordinary slider. However, the knob can additionally be pinched together. A completely pinched knob represents a very certain input, while applying less force represents an uncertain input.
Speed Slider. The speed slider uses the speed at which the user moves the knob of the slider as an additional input. The faster the user selects an input, the more certain the user is.
Dial-Based Designs
Similar to sliders, dials are commonly known input controls, especially in the tangible form used for volume control. Dials are promising for uncertainty input as they have several degrees of freedom that can be used in addition to rotation to allow a user to enter uncertainty. For all dial-based designs, the input value is selected by rotation, while additional parameters allow a user to input uncertainty.
(a) Expandable Dial (b) Pressure Dial (c) Pinch Dial
Figure 6.7: Concepts of all three dial-based designs.
(a) One-knob mode (b) Splitting interaction (c) Three-knob mode
Figure 6.8: First non-functional prototype for the split slider.
Expandable Dial. The expandable dial (see Figure 6.7a) can be increased or reduced in size in addition to its rotary interaction. A more extended dial represents a more uncertain input.
Pressure Dial. The pressure dial (see Figure 6.7b) can be pressed downwards to enter an additional value for uncertainty. The harder or further the dial is pressed down, the more certain the input is.
Pinch Dial. The pinch dial (see Figure 6.7c) has an open space that can be pinched together. The more the dial is pinched, the more certain the input is.
Qualitative Evaluation
We used existing low-fidelity prototypes for the dial-based designs and addition-
ally built low-fidelity prototypes for the split slider (see Figure 6.8) and the spring
slider. The low-fidelity prototypes were built out of paper or plastic and cut with
the help of a laser cutter. All low-fidelity prototypes were non-functional.
(a) Inside view of the slider (b) Close up of the one-knob mode (c) Three-knob mode
Figure 6.9: Functional prototype of the split slider.
We conducted two focus groups with a total of 12 participants (10 male, 2 female) with an average age of 24.9 years (SD = 3.7). Participants first developed scenarios in which they would like to use uncertain input and ranked them according to their preferences. In pairs, they then picked one scenario and received a prototype, which they evaluated in terms of its suitability for their scenario. Participants preferred the split slider and stated that it was intuitive and flexible. They appreciated its simple design and visual feedback. In general, they disliked the dial-based designs because the uncertainty input was disconnected from the actual input. We therefore selected the split slider as the most promising candidate design to build a fully functional prototype and conduct a lab study.
6.3.2 Evaluation in the Lab
We conducted an exploratory user study in the lab to understand whether our slider design would be suitable for the input of uncertain data. For the study, we implemented a functional prototype of the split slider (see Figure 6.9) using three slider potentiometers and an Arduino. By enlarging the individual knobs, we created the illusion that the prototype was one slider with three knobs. The knob was designed to allow for one-finger control as well as easy splitting. Magnets gave users haptic feedback when splitting the knob.
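To illustrate how three potentiometer readings could be interpreted as a single split-slider input, the Python sketch below classifies the knob mode and computes the distance between the outer knobs (the quantity later reported as expressed variance). The 10-bit value range (0 to 1023, as returned by Arduino’s `analogRead()` on typical boards), the joining tolerance of 10, and the function name are our assumptions, not details of the actual prototype firmware.

```python
def interpret_split_slider(left, middle, right, join_tolerance=10):
    """Interpret three potentiometer readings as one split-slider input.

    Assumptions: readings are 10-bit values (0-1023), the middle knob holds
    the best estimate, and the knob counts as joined (one-knob mode) when
    both outer knobs lie within join_tolerance of the middle knob.
    """
    for reading in (left, middle, right):
        if not 0 <= reading <= 1023:
            raise ValueError("expected 10-bit readings between 0 and 1023")
    joined = (abs(left - middle) <= join_tolerance
              and abs(right - middle) <= join_tolerance)
    return {
        "mode": "one-knob" if joined else "three-knob",
        "estimate": middle,
        # Distance between the outer knobs, i.e. the expressed spread.
        "spread": abs(right - left),
    }


print(interpret_split_slider(500, 502, 498))
print(interpret_split_slider(300, 520, 800))
```

A tolerance is needed because three physically separate potentiometers rarely report identical values even when the knobs are pushed together.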
Method
Each participant had to answer twelve questions with the help of the slider. We based the questions on a public survey conducted on trains and used continuous answer scales. For example, for the question “How often do you use the train?”, participants answered on a scale from “Never” to “Daily”. We used a 12 × 12 Latin square design to vary the question order across participants. The study consisted of two phases. In phase one, participants answered the first half of the questions without any explanation of the slider. Before answering the second half of the questions in phase two, they received detailed instructions on how the slider works. After each phase, participants filled in a questionnaire consisting of the UMUX questionnaire [Finstad, 2010], one question about the suitability of the prototype for uncertain input, and one question about their understanding of the prototype. All questions were seven-point Likert items. We additionally measured the input time and logged knob positions and movements.
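The study design mentions a 12 × 12 Latin square but not how it was constructed. As one common option, the Python sketch below builds a balanced (Williams) Latin square for an even number of conditions: every question appears once per position, and every ordered pair of questions occurs exactly once as direct neighbours, which also balances first-order carry-over effects. Whether the study used this particular construction is an assumption made for illustration, and the function name is hypothetical.

```python
def balanced_latin_square(n):
    """Williams design for an even number n of conditions (0 .. n-1).

    Every condition appears once per row and once per column, and every
    ordered pair of conditions occurs exactly once as immediate neighbours.
    """
    if n % 2:
        raise ValueError("this construction requires an even n")
    first = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        # Alternate between the next lowest and the next highest condition.
        if position % 2:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Each further row shifts the first row by one condition (mod n).
    return [[(c + row) % n for c in first] for row in range(n)]


square = balanced_latin_square(12)
for participant, order in enumerate(square, start=1):
    print(participant, [q + 1 for q in order])  # questions numbered 1-12
```

A plain cyclic square, in which each row simply shifts the question list by one, would also balance positions, but question 1 would then always be followed by question 2.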
Participants
In total, 18 participants (13 male, 5 female) with an average age of 34.4 years (SD = 14.9) took part in the study. Six of them had previous knowledge of the prototype, while the other twelve had never seen or heard of it before. In the following, we only analyze the data from the participants who had never seen the prototype before.
Results
We analyzed the UMUX, the additional questions, and the usage behavior of the participants. To this end, we converted all Likert items to numbers from 1 to 7, where 1 corresponds to “strongly disagree” (a negative answer) and 7 to “strongly agree” (a positive answer).
Usability. Overall, our prototype received a UMUX score of 82.4 (SD = 17.00), which corresponds to an excellent score on the SUS scale according to Bangor et al. [2008]. A Friedman test did not show a significant effect of knowledge (phase 1 vs. phase 2) on usability, χ²(1) = 0.818, p = .366.
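The UMUX [Finstad, 2010] consists of four seven-point items, and its score is designed to lie on the same 0 to 100 range as the SUS. Assuming its standard scoring, a single questionnaire can be scored as in the Python sketch below; the function name is hypothetical.

```python
def umux_score(responses):
    """UMUX score (0-100) from four item responses on a 1-7 scale.

    Items 1 and 3 are positively worded (response - 1), items 2 and 4
    negatively worded (7 - response); the sum is scaled by 100 / 24.
    """
    if len(responses) != 4 or not all(1 <= r <= 7 for r in responses):
        raise ValueError("expected four responses between 1 and 7")
    total = sum(r - 1 if i % 2 == 0 else 7 - r  # i = 0 is item 1
                for i, r in enumerate(responses))
    return total / 24 * 100


print(round(umux_score([6, 2, 6, 3]), 1))  # prints 79.2
```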
Prototype Suitability and Understanding. Without explanation, the suitability of the prototype for uncertain input was rated neutral on average (M = 4.0, SD = 2.7). A Friedman test showed that after the explanation, this rating increased significantly to an average of 6.8 (SD = 0.4), χ²(1) = 7.0, p < .01. The perceived understanding of the prototype also increased slightly but significantly, from 6.2 (SD = 1.5) to 6.9 (SD = 0.3), χ²(1) = 5.0, p < .05.
Splitting Behavior. Figure 6.10a shows how often participants used the different
knob modes. A chi-square test of independence showed that participants used the
three-knob mode significantly more often in the second phase, after receiving the
explanation of the prototype, χ²(1) = 49.237, p < .001. Similarly, the expressed
variance (the distance between the outer knobs, see Figure 6.10b) increased drastically
from phase 1 (M = 47.4, SD = 90.3) to phase 2 (M = 303.4, SD = 249.2). An
independent two-group t-test revealed that this difference is statistically significant,
t(89.352) = 8.1942, p < .001.
(a) Knob mode usage per phase (b) Expressed variance per phase
Figure 6.10: Influence of the explanation of the prototype on the usage of
different knob modes and the expressed variance for participants without
previous knowledge of the prototype.
Input Time. Participants needed on average only about half the time to give a
deterministic answer (M = 12.0 s, SD = 7.3 s) compared with giving an answer with
uncertainty (M = 21.2 s, SD = 11.6 s). We found a weak correlation between the
expressed variance and the input time, r(214) = 0.345, p < .001. However, the input
time might drop with further usage. We also found a weak correlation between
task order and input time (learning effect) for phase 1 (r(106) = 0.293, p < .01)
and phase 2 (r(106) = 0.295, p < .01).
6.3.3 Discussion & Implications
Participants used the prototype very differently before and after the explanation,
which led us to the conclusion that the prototype was not self-explanatory. This
problem might only be temporary: if uncertain input becomes more common in the
future, users will know how to use such devices, but for now an extra explanation
is necessary for people to understand the possibilities of the input device.
However, the UMUX ratings show that our prototype is highly usable in both
modes, one-knob and three-knob mode. This indicates that people who do not know
about the extra functionality can still use the split slider in the standard way
without being confused. New input devices allowing uncertain input should follow
this principle.
Additionally, most participants found the split slider suitable and understandable
before they got the extra explanations. These values increased even more after
the explanation. This also shows that participants were able to use the prototype
without being confused by new possibilities. Participants tended to stick with
their previous knowledge about deterministic input and mostly did not experiment
with the prototype although they were invited to do so. After the explanation,
participants made extensive use of the new functionality, appreciating the split
slider as an input control to enter uncertainty. This indicates that tangible
shape-changing interfaces, and specifically the split slider, are suitable devices
for uncertain input and can help users to express their uncertainty.
The results on the input time show that users needed more time to enter values
with higher variance, as they potentially had to move more knobs but also had to
think longer about the answer. This finding supports the design decision to allow
standard input in one-knob mode and more complex input in three-knob mode. If
time constraints apply, the standard method can be used instead of the complex
method. Nevertheless, further training and familiarization might reduce the time
users need to enter uncertainty; however, this would need to be the subject of a
future study.
6.4 Physiological Sensing
Instead of offering explicit uncertainty input to a user, a system could automat-
ically detect how uncertain a user is when providing an input. Physiological
sensing can be used as an input modality to substitute or accompany manual
uncertainty input. Heart rate tracking is already used to understand the training
performance of athletes [Tholander and Nylander, 2015], but it can also be used
in interactive systems to detect the level of cognitive stress of a user. Based on the
heart rate variability, interfaces can adapt their complexity accordingly to reduce
stress [McDuff et al., 2016]. Besides heart rate, gaze patterns and other related
gaze properties already serve as input for interactive systems when rating pictures
in photo albums [Walber et al., 2014], enhancing tutoring systems [D’Mello et al.,
2012], or assisting with translations [Hyrskykari et al., 2003]. Studies on gaze
characteristics with the help of quiz questions [Copeland and Gedeon, 2013],
problem solving tasks [Madsen et al., 2012], and language reading tasks [Karolus
et al., 2017] revealed that these characteristics change when users are confronted
with unfamiliar content or difficult tasks.
The main goal of this strand of research is to understand whether physiological
sensing can help to detect users’ uncertainty when entering data. In the following,
we first describe how we generated a set of quiz questions for our user study. We
then present the method and results of our user study with 24 participants who
had to answer the selected quiz questions. During the study, we collected heart
rate, eye tracking, and key logging data.
6.4.1 Question Selection Process
To identify suitable questions for the user study, we first built a pool of questions
in a three-step selection process. In total, we transcribed 1770 German quiz
questions from four books containing questions about general knowledge [Bauer
and Kneip, 2016; Hotz, 2013, 2014; Pfersdorff and Glahn, 2015]. As these
questions were multiple-choice, we deleted all questions that were not solvable
without the answer options and removed the answer options from all remaining
questions. On the remaining
1164 questions, we applied three filter criteria to improve comparability:
• Maximum of four words per answer: We eliminated all questions with
answers consisting of more than four words to avoid full-sentence answers.
This minimizes the confounding uncertainty resulting from the need to
spell long or complex phrases.
• Maximum of 15 words per question: We eliminated all questions consisting
of more than 15 words, as this is the upper limit for the recommended
sentence length in German for a sentence to be easily comprehensible
[Seibicke, 1969]. Thus, we removed questions that could be difficult to
read and comprehend.
• Flesch-Reading-Ease of questions between 60 and 80: The Flesch-Reading-Ease
(FRE) [Flesch, 1948] is a readability metric that measures how difficult it is
to understand a text based on the average sentence length (ASL) and the average
number of syllables per word (ASW); higher values indicate easier text. We use
the version for the German language proposed by Amstad [1978],
$\mathit{FRE}_{\text{German}} = 180 - \mathit{ASL} - 58.5 \cdot \mathit{ASW}$,
to eliminate questions that are either very easy or very hard to read in
comparison to the other questions in the question set. This also minimizes the
confounding uncertainty introduced by questions with different degrees of
reading difficulty. We chose the FRE interval between 60 and 80, which
corresponds to the categories of medium easy and medium text.
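The three filter criteria can be sketched as a single predicate. This is an assumption-laden illustration: it treats each question as one sentence (so ASL equals the word count) and counts syllables with a crude vowel-group heuristic that only approximates German syllabification. The helper names are hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-zÄÖÜäöüß]+")
# Runs of vowels (including umlauts) approximate syllable nuclei, so
# diphthongs such as "ei" or "au" count as a single syllable.
VOWEL_RUN = re.compile(r"[aeiouyäöü]+", re.IGNORECASE)


def words(text: str) -> list[str]:
    return WORD.findall(text)


def count_syllables(word: str) -> int:
    return max(1, len(VOWEL_RUN.findall(word)))


def amstad_fre(asl: float, asw: float) -> float:
    """Amstad's German Flesch-Reading-Ease: 180 - ASL - 58.5 * ASW."""
    return 180 - asl - 58.5 * asw


def fre_german(question: str) -> float:
    ws = words(question)
    if not ws:
        raise ValueError("question contains no words")
    asl = len(ws)  # one question is treated as one sentence
    asw = sum(count_syllables(w) for w in ws) / len(ws)
    return amstad_fre(asl, asw)


def passes_filters(question: str, answer: str) -> bool:
    """Apply the answer-length, question-length, and FRE criteria."""
    return (
        len(words(answer)) <= 4
        and len(words(question)) <= 15
        and 60 <= fre_german(question) <= 80
    )


q = "Wie heißt die Hauptstadt von Frankreich?"
print(round(fre_german(q), 1), passes_filters(q, "Paris"))  # 96.0 False
```

Under this heuristic the example question scores about 96 and would be discarded as too easy to read. A dictionary-based hyphenator would give more accurate syllable counts than the vowel-run rule.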
Applying all these filter criteria, we reduced the question set to 251 questions.
To categorize these questions further, we conducted an online survey. Each page
of the survey contained one question and a text field to answer the question.
Additionally, participants had to indicate how certain they
were about their answer on a five-point Likert scale from “totally disagree” to
“totally agree.” We randomized the order of questions per participant. In total,
59 participants provided 7,939 answers (M = 134.6 questions per participant,
SD= 102.7). Based on the results of the survey, we assigned each question to
one of five uncertainty classes corresponding to the item on the Likert scale that
the majority of participants selected for this question. We then calculated the
ratio of how often the question was rated in its respective class to the total number
of answers for the question, prioritizing questions with higher ratios. From the
easiest and the most difficult class, we picked the 40 questions with the highest
ratio; from all other classes we picked the 20 questions with the highest ratio.
This resulted in a question set of 140 questions in total.
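The class assignment and ranking step can be sketched as follows. The sketch assumes that ratings are stored per question as integers from 1 to 5 and that 1 and 5 are the two extreme classes, which receive 40 questions each. The source does not state how ties between equally frequent ratings were broken; here `Counter.most_common` keeps the rating encountered first, which is an assumption. Function names are hypothetical.

```python
from collections import Counter, defaultdict

# Quota per uncertainty class: 40 for the two extremes, 20 otherwise (140 total).
QUOTA = {1: 40, 2: 20, 3: 20, 4: 20, 5: 40}


def classify(ratings: dict[str, list[int]]) -> dict[str, tuple[int, float]]:
    """Map each question to (majority class, share of ratings in that class)."""
    result = {}
    for qid, rs in ratings.items():
        cls, count = Counter(rs).most_common(1)[0]
        result[qid] = (cls, count / len(rs))
    return result


def select(ratings: dict[str, list[int]], quota: dict[int, int]) -> list[str]:
    """Pick the questions with the highest agreement ratio within each class."""
    by_class = defaultdict(list)
    for qid, (cls, ratio) in classify(ratings).items():
        by_class[cls].append((ratio, qid))
    chosen = []
    for cls, k in quota.items():
        # Highest ratio first; the question id breaks remaining ties.
        ranked = sorted(by_class[cls], key=lambda item: (-item[0], item[1]))
        chosen.extend(qid for _, qid in ranked[:k])
    return chosen


toy = {"q1": [5, 5, 4], "q2": [5, 5, 5], "q3": [1, 2, 1, 1], "q4": [3, 3, 2]}
print(select(toy, {5: 1, 1: 1, 3: 1}))  # ['q2', 'q3', 'q4']
```

With the full `QUOTA`, classes that contain fewer questions than their quota simply contribute all of their questions.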
6.4.2 Evaluation in the Lab
We used the question set in a lab study to examine the relationships between
users’ uncertainty and their physiological signals.
Method
Participants had to answer 140 questions of different difficulty levels (see Subsec-
tion 6.4.1), then report their perceived uncertainty for each given answer.
After arriving, participants filled in a consent form and a demographic
questionnaire. They were then seated in front of a 22 in. LCD display, which showed a
website with the questions running in a browser. Before starting, we attached
the three ECG electrodes of a NEXUS 4 to record ECG signals and calibrated the
stationary eye tracker (SMI RED 250), which was attached to the bottom of the
display. The whole setup was surrounded by three plain white partition walls so
that participants were not distracted by the study instructor or the study room.
After attaching the electrodes and calibrating the eye tracker, we gave
participants a detailed explanation of the task, asked them to always provide a
reasonable answer (e.g. stating a city name if the question asked for a city),
and told them that they could take as much time as they wanted to answer a
question. After a training question to get used to the study interface, they
answered all 140 questions. The questions were presented one by one in a
randomized order. After answering all questions, participants rated their
uncertainty for each question on a five-point Likert scale from “Strongly agree”
to “Strongly disagree” in response to the statement: “I am sure that my answer
is correct.” We again randomized the order of the questions; however,
participants saw the question and their given answer when making the
assessment. We did not ask participants for their uncertainty in between
questions, as this would probably have revealed too much about the aim of the
study.
Throughout the study, we collected data from the eye tracker, the NEXUS, and the
browser. Eye movements were recorded at 250 Hz, and the ECG signal at 256 Hz.
From the browser, we collected all key events, click events, mouse movements,
field focus events, completion times, and the participants’ answers. From the
collected data, we derived several metrics. From the ECG signal, we calculated
heart rate and heart rate variability. From the browser data, we extracted features
concerning time (such as completion time, time before typing starts, and time
between first and last key stroke), typing behavior (such as typing speed, key
down time, and number of deletions), and mouse events (such as number of clicks
and length of the mouse path). From the recorded eye movements, we extracted
features related to the count, time, and velocity of fixations, saccades, and blinks.
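As an illustration of the timing and typing features listed above, the following sketch derives them from a list of key events for one answer. The event structure, the use of the browser key names "Backspace" and "Delete" to count deletions, and the definition of typing speed as keys per second between the first and last key stroke are assumptions for this example, not the study's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeyEvent:
    t_ms: float  # timestamp in milliseconds (hypothetical logging format)
    key: str     # key name as reported by the browser, e.g. "a" or "Backspace"


def typing_features(events: list[KeyEvent], onset_ms: float) -> Optional[dict]:
    """Timing and typing features for one answer; onset_ms is when the question appeared."""
    if not events:
        return None
    times = sorted(e.t_ms for e in events)
    first, last = times[0], times[-1]
    span_s = (last - first) / 1000
    return {
        "time_before_typing_s": (first - onset_ms) / 1000,
        "first_to_last_key_s": span_s,
        "deletions": sum(e.key in ("Backspace", "Delete") for e in events),
        "keys_per_second": len(events) / span_s if span_s > 0 else None,
    }


events = [KeyEvent(2000, "P"), KeyEvent(2500, "a"),
          KeyEvent(3000, "Backspace"), KeyEvent(4000, "r")]
print(typing_features(events, onset_ms=0))
```

Keeping the onset as an explicit parameter makes it easy to measure the delay either from question display or from focusing the answer field, depending on which event is logged.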
Participants
We recruited 24 participants (15 male, 9 female) with an average age of 23.2
years (SD= 3.4). All of them were native German speakers. Due to technical
difficulties, we had to exclude data from four participants whose physiological
measurements could not be tracked reliably. For the analysis of eye movements
and heart rate, we additionally used only a subset of the data, due to unreliable
tracking caused by make-up and loosened electrodes.
Results
We analyzed all collected data, but only present statistical results on a subset of
the measures. We focus on the aspects that we identified as highly promising for
detecting uncertain user input or as interesting lessons learned.
Key Logging Data. First, we investigated the time that elapsed before participants
started typing an answer (see Figure 6.11a). On average, participants
started typing after 9.19 s (SD = 9.45 s). For the lowest self-perceived uncertainty,
participants took the least time before they started to type (M = 5.15 s, SD = 4.56 s).
In contrast, they took the longest time before starting to type when they perceived
their answer as uncertain (M = 13.14 s, SD = 13.07 s). A one-way ANOVA revealed a
statistically significant difference (F(4, 2717) = 84.01, p < .001) in the time
before typing depending on the perceived uncertainty of the answer.
[Figure 6.11a: boxplot; x-axis: question uncertainty level (1–5); y-axis: time until first typing in seconds (0–30)]
(a) Boxplot for time elapsed until participants started to type grouped by the perceived uncertainty of the answers (1: very uncertain, 5: very certain).
(b) Violin plot of the refixation ratio grouped by the perceived uncertainty of the answers (1: very uncertain, 5: very certain).
Figure 6.11: Graphical results for one key logging and one eye tracking
metric.
Second, we investigated the time that elapsed between the first and the last key
stroke of participants. On average, participants needed 4.43 s (SD = 7.13 s) from
their first key stroke to their last key stroke when entering an answer. Participants
were faster (M = 3.73 s, SD = 5.60 s) when their perceived uncertainty was low,
while the longest typing periods (M = 5.51 s, SD = 9.91 s) occurred when they were
uncertain. A one-way ANOVA revealed a statistically significant difference
(F(4, 2717) = 8.34, p < .001) in the time between first and last key stroke depending
on the perceived uncertainty of the answer.
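For readers who want to see the form of these tests, a one-way ANOVA compares between-group and within-group variance across the five uncertainty levels. The reported F(4, 2717) is consistent with k = 5 levels and N = 2,722 analyzed answers, since the within-group degrees of freedom are N − k. The sketch below computes only the F statistic on hypothetical toy data; a p-value additionally needs the F distribution, for example via `scipy.stats.f_oneway`.

```python
def one_way_anova(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    groups = [list(g) for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Toy data: three groups whose means differ clearly.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

In the study, each group would hold one measure (e.g. time before typing) for all answers rated at one uncertainty level.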
Eye Tracking Data. First, we investigated how long participants spent looking at
the answer field, normalized by the total time looking at the screen. On average,
the ratio was 0.30 (SD = 0.19). When the perceived uncertainty was high, the
ratio was higher (M = 0.34, SD = 0.21) than when the perceived uncertainty
was low (M = 0.26, SD = 0.18). A one-way ANOVA revealed a statistically
significant difference (F(4, 1907) = 13.53, p < .001) in the ratio of looking at the
answer field depending on the perceived uncertainty of the answer.
Second, we investigated the number of refixations normalized by the number of
fixations for the respective question (see Figure 6.11b). On average, the ratio was
0.36 (SD = 0.17). When the perceived uncertainty was low, the refixation ratio
was lower (M = 0.30, SD = 0.17) than when the perceived uncertainty was high
(M = 0.41, SD = 0.18). A one-way ANOVA revealed a statistically significant
difference (F(4, 1907) = 23.03, p < .001) in the refixation ratio
depending on the perceived uncertainty of the answer.
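The two gaze ratios can be sketched from a sequence of labeled fixations. Several details here are assumptions, since the section does not specify how these measures were computed: the area-of-interest labels, approximating total screen time by the summed fixation durations, and defining a refixation as a fixation that returns to an area visited earlier after the gaze has left it.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    aoi: str            # hypothetical area-of-interest label, e.g. "question" or "answer"
    duration_ms: float


def dwell_ratio(fixations: list[Fixation], aoi: str = "answer") -> float:
    """Share of fixation time spent on one AOI (summed fixations approximate screen time)."""
    total = sum(f.duration_ms for f in fixations)
    on_aoi = sum(f.duration_ms for f in fixations if f.aoi == aoi)
    return on_aoi / total if total else 0.0


def refixation_ratio(fixations: list[Fixation]) -> float:
    """Refixations divided by all fixations for one question."""
    visited: set[str] = set()
    previous = None
    refixations = 0
    for f in fixations:
        # Count a return to an AOI that was seen before and then left.
        if f.aoi != previous and f.aoi in visited:
            refixations += 1
        visited.add(f.aoi)
        previous = f.aoi
    return refixations / len(fixations) if fixations else 0.0


trace = [Fixation("question", 200), Fixation("question", 100),
         Fixation("answer", 100), Fixation("question", 300)]
print(round(dwell_ratio(trace), 3), refixation_ratio(trace))  # 0.143 0.25
```

Consecutive fixations within the same area are not counted as refixations here; a finer-grained definition could work at the word level instead of the AOI level.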
ECG Data. We investigated median heart rates (HR) and heart rate variability
(HRV) during the study. Because a reaction in the heart rate is likely to be
delayed, we applied multiple aggregation strategies to determine whether users’
uncertainty about their answers influenced the heart rate. We used combinations
of four different lag values (0 ms, 1000 ms, 5000 ms, 9000 ms) and three different
window sizes (1000 ms, 5000 ms, 10000 ms), resulting in twelve different combinations.
One-way ANOVAs did not reveal any significant effects of the users’ uncertainty
on median HR and HRV (p > .05).
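The lag-and-window aggregation can be sketched as follows. The sketch makes three assumptions that the source does not specify: heart rate is available as timestamped samples, each window is anchored at a hypothetical per-question event time, and HRV is expressed as RMSSD (one common time-domain measure) over RR intervals.

```python
from itertools import product
from math import sqrt
from statistics import median
from typing import Optional

LAGS_MS = (0, 1000, 5000, 9000)
WINDOWS_MS = (1000, 5000, 10000)


def windowed_median_hr(samples: list[tuple[int, float]], event_ms: int,
                       lag_ms: int, window_ms: int) -> Optional[float]:
    """Median heart rate in [event + lag, event + lag + window)."""
    start = event_ms + lag_ms
    end = start + window_ms
    values = [hr for t, hr in samples if start <= t < end]
    return median(values) if values else None


def rmssd(rr_ms: list[float]) -> float:
    """Root mean square of successive RR-interval differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return sqrt(sum(d * d for d in diffs) / len(diffs))


def all_aggregations(samples, event_ms):
    """One median HR per (lag, window) combination: 4 x 3 = 12 values."""
    return {(lag, win): windowed_median_hr(samples, event_ms, lag, win)
            for lag, win in product(LAGS_MS, WINDOWS_MS)}


# Synthetic heart rate rising by 1 bpm per second, sampled every 500 ms.
samples = [(t, 60 + t // 1000) for t in range(0, 20000, 500)]
aggs = all_aggregations(samples, event_ms=0)
print(len(aggs), aggs[(0, 1000)], aggs[(5000, 1000)])  # 12 60.0 65.0
```

Each of the twelve aggregated values would then enter its own one-way ANOVA across the uncertainty levels.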
6.4.3 Discussion & Implications
Different sensors can help to detect users’ uncertainty. In our study,
a combination of key logging and eye tracking proved to be the most reliable
indicator of how uncertain users are when providing their answers.
However, this could differ from context to context. Other contexts not related to
quiz questions should be evaluated to better understand whether the results are
comparable.
The results of our study also indicate that uncertainty does not have a significant
effect on heart rate signals. This was surprising as other work indicated that heart
rate is correlated with cognitive stress. We believe that the results can be explained
by two reasons. First of all, wrong answers did not have any implications for
participants during our study. They probably felt safe, and therefore uncertain
answers did not produce a physiological reaction. Second, heart rate reactions
might have been too slow and taken longer than the time needed for answering
one of the provided questions.
We also experienced that current technology for measuring physiological signals
is still prone to technical issues. Data from multiple participants was either lost
or could not be tracked due to individual differences such as glasses, make-up,
skin thickness, etc. Future sensors might be much better in terms of their sensing
capabilities and not prone to these errors. This might lead to better and richer
results.
Table 6.6: Developed input controls for uncertain data categorized by method,
input modality, and uncertainty value.
Method | Input Modality | Uncertainty Value | Concrete Suggestion
Explicit | User Interface | Probability Percentage | Additional number field or slider
Explicit | User Interface | Range | Flexible Range Slider
Explicit | Tangibles | Range | Split Slider
Implicit | Physiological Sensing | Probability Percentage | Eye Tracking & Key Logging
6.5 Insights for Quantifying Uncertainty in User Input
In this chapter, we developed and evaluated multiple input controls and methods
for quantifying uncertainty in user input. Table 6.6 provides an overview of
the most suitable approaches. The input methods can be categorized by whether
they quantify uncertainty in user input explicitly or implicitly.
Explicit methods allow the user to enter a value for the uncertainty included in
an input. In our work, we explored paper prototypes for qualitative methods;
however, qualitative methods have two main disadvantages in comparison to quantitative methods.
First, qualitative methods are very difficult to interpret. The interpretation can
vary by user or even by system if natural language needs to be processed. Second,
participants in our interviews raised the concern that qualitative input methods
are very cumbersome to use. For these two reasons, we focused on quantitative
methods. For digital user interfaces, we recommend using a number field or slider
for the additional input of a probability percentage (see Section 6.1) or the Flexible
Range Slider for range input (see Section 6.2). As a tangible input control, we
recommend the Split Slider (see Section 6.3). We additionally recommend that future
input controls offer two ways of interaction: the standard interaction
and an additional interaction to specify uncertainty. Both interactions should be
aligned with each other, but the additional interaction to enter uncertainty should
neither confuse the user nor complicate the standard input. Thus, users would not
necessarily need additional knowledge or time to make a more complex input that
includes uncertainty.
Implicit methods are methods that automatically track and quantify users’ uncer-
tainty. In Section 6.4, we explored using physiological sensing and behavioral
measurements to quantify uncertainty. We suggest a combination of eye tracking
data and key logging data to implicitly detect uncertainty.
In the next chapter, we focus on another important aspect of understanding how
to deal with uncertainty in interactive systems: output methods. We analyze
the current communication of uncertainty in mobile applications and look into
uncertainty visualization for step tracking and decision-making under uncertainty.
Chapter 7
Output Methods
We explored the design space of output methods for uncertain data by conducting
one analysis of existing mobile applications and two individual research probes.
For the analysis of existing mobile applications, we looked at which methods are
used by applications that present uncertain data and whether they communicate the
uncertainty. We additionally conducted an online survey to understand what users
in general think about such applications. For the two individual research probes,
we first developed an Android application that visualized uncertainty for activity
tracking data, and second developed a Facebook game to explore how players
make decisions based on different amounts of uncertainty information presented
to them.
The main goal of this chapter is to understand the current state of output methods
for uncertain data and how uncertainty visualizations can be used to support
decision-making. We used different sets of existing output methods in our studies
to enable a meaningful comparison.
Parts of this chapter are based on the following publications:
• M. Greis, T. Ohler, N. Henze, and A. Schmidt. Investigating representa-
tion alternatives for communicating uncertainty to non-experts. In Lecture
Notes in Computer Science (including subseries Lecture Notes in Arti-
ficial Intelligence and Lecture Notes in Bioinformatics), volume 9299,
pages 256–263, 2015.
• M. Greis, P. E.Agroudy, H. Schuff, T. Machulla, and A. Schmidt.
Decision-Making under Uncertainty: How the Amount of Presented
Uncertainty Influences User Behavior. In Proceedings of the 9th Nordic
Conference on Human-Computer Interaction - NordiCHI ’16, pages 2–5,
2016.
7.1 Communication of Uncertainty in Current Mobile Applications
Many mobile applications show uncertain data such as weather forecasts, location
data, or measurements of physical activity. However, it is not clear whether or
how they present uncertainty to their users and what methods they use to generate
trust in their measurements and predictions. Additionally, users might prefer a
different communication that has not so far been used in mobile applications.
The main goal of our work is to understand the methods current mobile applica-
tions use to communicate uncertain data and whether users are satisfied with this
communication. We analyzed 30 mobile applications in the three areas of weather
forecasting, navigation/fuel prices, and healthcare. We picked the most installed
applications available for free in the Google Play Store and the Apple App Store.
We were mainly interested in which methods these applications use to show data
and whether they indicate the uncertainty of the data. We additionally conducted
an online survey to assess users’ trust in the reliability of such applications and
their preferences for the communication of uncertainty.
In the following, we present the analysis of 10 weather applications, 10 navigation/fuel
price applications, and 10 healthcare applications. We mainly focus on
how the information in these applications is depicted. We additionally present the
results of our online survey.
Table 7.1: Methods used to display uncertain data in the following weather
applications: W1 - AccuWeather, W2 - BayWa Agri-Check, W3 - Blitzortung
Gewitter-Monitor, W4 - Regenradar, W5 - wetter.com, W6 - wetter.info, W7 -
Wetter 14 Tage, W8 - Wetter Deutschland XL PRO, W9 - Wetter Online, and
W10 - Windfinder.
Methods W1 W2 W3 W4 W5 W6 W7 W8 W9 W10
Diagrams
Text
Symbols
Percentages
Values
Maps
7.1.1 Analysis of Mobile Applications
We downloaded ten free weather applications, ten free navigation/fuel price
applications, and ten free healthcare applications from the Google Play Store and
the Apple App Store. We focused on applications available in German. We then
took screenshots of all screens of each application to categorize how the
information was displayed.
Weather Applications
Table 7.1 shows an overview of the analyzed weather applications and how they
displayed weather information. The applications used six different methods:
Four used diagrams (e.g. line graphs, pie charts), five used verbal expressions
(e.g. “low” or “high”), eight used symbols (e.g. grey clouds or a yellow sun),
eight used percentage values (e.g. the probability of rain), nine used concrete
values (e.g. “16 °C”), and all used colored maps (e.g. for showing temperatures,
rain intensity). Besides showing the probability of rain, none of the applications
indicated the uncertainty of the information. All weather applications showed the
data as deterministic values.
Table 7.2: Methods used to display uncertain data in the following
navigation/fuel price applications: N1 - Blitzer.de, N2 - Blitzer!, N3 - Clever Tanken,
N4 - DB Navigator, N5 - Google Maps, N6 - iOS Karten, N7 - MyTaxi - Die
Taxi App, N8 - Offline Maps Navigation, N9 - StauMobil, and N10 - Tanken
App.
Methods N1 N2 N3 N4 N5 N6 N7 N8 N9 N10
Text
Symbols
Values
Maps
Navigation/Fuel Price Applications
Table 7.2 shows an overview of the analyzed applications in the field of navigation
and fuel prices. The applications used four different methods for
displaying information. Seven applications used symbols (e.g. a pin for the
location), nine used text (e.g. a textual description of a traffic jam prediction),
nine used values (e.g. “Distance: 9.4 km”), and all used maps (e.g. to display
the location and routes). One application (MyTaxi - Die Taxi App) used verbal
expressions to convey uncertainty, by stating that the cab arrives in “approximately”
a certain amount of time and that the price is an “approximate value”.
Two more applications used the terms “prediction” or “forecast” in their headline,
but gave no detailed information about how uncertain their information was or
how it was calculated.
Healthcare Applications
Table 7.3 shows an overview of the analyzed applications in the field of healthcare.
The applications used five different methods for displaying information. Two
applications used maps (e.g. to show walking routes), six used diagrams (e.g.
bar charts, pie charts), six used symbols (e.g. a green human, a red heart),
seven used text (e.g. “in good shape”), and all used values (e.g. “103 bpm”).
One application used the word “approximately” to indicate that the values were
uncertain. Additionally, the application “Clue - Menstruationskalender” showed
“(+/− x days)” next to its predictions, indicating how uncertain the predicted
day was. None of the other applications showed how uncertain the
presented information was.
Table 7.3: Methods used to display uncertain data in the following healthcare
applications: H1 - Schrittzaehlen - Accupedo, H2 - Cardiio, H3 - Clue
- Menstruationskalender, H4 - Idealgewicht, H5 - Kardio - Herzfrequenz
Monitor, H6 - Komoot - Fahrrad und Wander GPS, H7 - Nichtraucher Coach,
H8 - Promillerechner Live, H9 - Runtastic GPS, and H10 - Sehtest.
Methods H1 H2 H3 H4 H5 H6 H7 H8 H9 H10
Text
Symbols
Values
Diagrams
Maps
7.1.2 Online Survey
We conducted an online survey to assess users’ general perception of applications
in the analyzed areas. We were mainly interested in their opinions about the
reliability of measurements and predictions in these applications. Additionally,
we wanted to know whether they would like to get more information about the
underlying uncertainty of the displayed data.
Method
We first asked participants of the online survey to provide demographic informa-
tion such as gender and age. We then asked whether they use weather, navigation,
or healthcare applications. Depending on what type of applications participants
used, they answered the following questions for each type of application they
used: (1) How reliable do you think the measurements are? (2) How reliable do
you think the predictions are? (3) Would you like to see more clearly whether the
data is uncertain? Participants answered all questions on a five-point Likert scale
from “not at all” to “very much”.
Participants
115 participants (69 male, 45 female, 1 preferred not to say) aged between 13
and 51 took part in our online survey. Around 80% used weather and navigation
applications, and more than 50% used healthcare applications.
Results
We converted participants’ answers from the Likert items to a scale from 1 (“not
at all”) to 5 (“very much”). In general, participants perceived healthcare
measurements as the least reliable (M = 3.0, SD = 1.0), followed
by weather measurements (M = 3.9, SD = 1.0) and measurements in navigation
applications (M = 4.2, SD = 0.9). Predictions were considered slightly
less reliable than measurements for weather applications (M = 3.7,
SD = 0.9) and navigation applications (M = 4.1, SD = 0.9). Although
participants in general did not rate reliability negatively, they wished
to have information about the uncertainty in all applications (weather: M = 4.2,
SD = 0.9, healthcare: M = 4.0, SD = 1.1, navigation: M = 3.6, SD = 1.2).
7.1.3 Discussion & Implications
The analysis of existing mobile applications showed that applications seldom
communicate uncertainty information. The few applications that do communicate
this information mostly use verbal expressions such as “approximately”.
This leaves considerable room for future improvement and the implementation of
uncertainty visualizations.
The results of our survey indicate that users in general seem to find predictions
more unreliable than measurements. How reliable they perceive an application to
be depends on the area of the application and potentially the application design
itself.
We also found that users actually wish for uncertainty to be displayed in mobile
applications. Displaying the uncertainty of data could potentially even
increase the perceived reliability of an application.
7.2 Uncertainty Visualization for Activity Tracking
Multiple studies have already shown that the measuring accuracy of activity
trackers depends on the brand and kind of device [Case et al., 2015; Guo et al.,
2013], the body part it is worn on [Sasaki et al., 2015], and the walking speed
[Crouter et al., 2003]. The measurement errors for step counts range from 1% up
to nearly 30% [Guo et al., 2013]. We therefore wanted to understand how this
uncertainty in the measurements could be communicated to the user.
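To make the 1% to 30% error range more tangible, the sketch below turns a displayed step count and an assumed relative error into a range that could be shown next to the number. Treating the error as symmetric and relative to the displayed value is a simplifying assumption for illustration only; it does not reproduce the visualizations developed in this section, and the function names are hypothetical.

```python
def step_interval(displayed_steps: int, rel_error: float) -> tuple[int, int]:
    """Symmetric range around the displayed count, assuming the error is relative to it."""
    if not 0 <= rel_error < 1:
        raise ValueError("rel_error must be in [0, 1)")
    delta = displayed_steps * rel_error
    return round(displayed_steps - delta), round(displayed_steps + delta)


def format_steps(displayed_steps: int, rel_error: float) -> str:
    low, high = step_interval(displayed_steps, rel_error)
    return f"{displayed_steps} steps (about {low} to {high})"


print(format_steps(9000, 0.3))  # 9000 steps (about 6300 to 11700)
```

At the upper end of the reported error range, the plausible interval spans several thousand steps, which illustrates why a single deterministic number can be misleading.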
The main goal of the work presented in this section is, first, to understand how
users perceive the uncertainty of the measurements and how much they trust their
devices and, second, to develop graphical overviews for activity data that take
uncertainty into account.
7.2.1 Online Survey
We conducted an online survey to assess how people use activity trackers and how
much they know about the uncertainty of their measurements. We additionally
used the online survey to determine people’s beliefs on how uncertain activity
trackers are and to collect ideas on how the uncertainty of activity trackers could
be visualized as tracking results.
Method
We first asked participants to provide demographic data such as gender and age.
We then asked some general questions about sports and activity tracking. Partic-
ipants were then separated into four different groups: those that currently used
activity trackers, those that had used an activity tracker in the past, those that used
other devices for activity tracking (such as a treadmill), and participants that had
never used any activity tracking device. We were mainly interested in participants
who had already used an activity tracker or currently used one. Depending on
their experience with activity trackers, participants answered different questions
over the course of the study. We asked those who had used activity trackers
before to answer questions about the measuring reliability, graphical overviews
used by their activity trackers, and errors that they had spotted. We structured
all questions as five-point Likert items. After these basic questions, participants
had to answer concrete questions about how much the displayed results of an
activity tracker (e.g., 2000 steps, 5000 steps, and 9000 steps) deviate from the
true value. At the end of the survey, we asked participants for their opinion about
visualizing uncertainty for activity tracking data and their ideas on how this could
be accomplished.
Participants
In total, 364 participants (153 female, 206 male, 5 preferred not to say) with an
average age of 28.4 (SD = 8.8) completed the online survey. Participants did all
kinds of sports, e.g. swimming, soccer, bicycling, running, fitness training, or more
specialized sports such as archery or diving. 118 of our participants stated that
they had never used an activity tracker before; 46 only used other devices that
track activity data (e.g. a treadmill), and 29 indicated that they had stopped using
their activity tracker. 171 participants (97 female, 72 male, 1 preferred not to say)
were current users of activity trackers at the time of the survey.
Table 7.4: Participants’ agreement on a five-point Likert scale with the
presented statements.
Statement | M | SD
I am satisfied with the functionality of my activity tracker. | 4.3 | 0.6
I trust the exactness of the measurements of my activity tracker. | 4.0 | 0.6
I am satisfied with the graphical overview in the application. | 4.1 | 0.8
I know that the measurements of activity trackers are not always correct. | 4.4 | 0.9
I always remember the exact value displayed on my activity tracker. | 2.5 | 1.1
In my thoughts, I round the values displayed on my activity tracker up/down. | 2.6 | 1.2
Results
In our analysis, we only used the data provided by the 171 current users, as we
did not have enough participants who had abandoned their activity tracker to draw
conclusions about how abandoning a tracker might affect the perceived reliability.
We converted all Likert items to numbers from 1 (“totally disagree”) to 5 (“totally
agree”). The answers for the basic questions on reliability are displayed in Table 7.4.
Although most of the participants stated that they knew the measurements of
activity trackers were not always correct, 17.5% stated that they had not been
aware of this before taking part in the survey.
For the reliability of activity tracker measurements, most participants estimated
the measurement error to be between 2.2% and 5.6% for steps (see Figure 7.1) and
between 1.4% and 10.5% for distances (see Figure 7.2). For the treadmill, participants
estimated a higher measurement error for calories, between 5% and 16.7% (see
Figure 7.3), but a lower error for distances, between 1.7% and 4% (see Figure 7.4).
The estimated relative error is in the same range for the different amounts and
distances. However, unlike the answers for the other three estimations, the answers
for the distance on the treadmill are not normally distributed.
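The percentage ranges above relate an absolute error (the answer bins in Figures 7.1 to 7.4) to the value the device displayed. The following sketch shows this conversion, assuming the relative error is simply the bin edge divided by the displayed value. Which bins were used to derive each reported range is not stated, so the example only illustrates the arithmetic: a “200-500 steps” answer for a display of 9000 steps corresponds to roughly 2.2%-5.6%.

```python
def relative_error_range(bin_low: float, bin_high: float, displayed: float) -> tuple[float, float]:
    """Express an absolute error bin as a percentage range of the displayed value.

    Assumption: relative error (%) = 100 * bin edge / displayed value.
    """
    if displayed <= 0:
        raise ValueError("displayed value must be positive")
    return 100.0 * bin_low / displayed, 100.0 * bin_high / displayed


# A "200-500 steps" answer for a tracker that displays 9000 steps.
low, high = relative_error_range(200, 500, 9000)
print(f"{low:.1f}%-{high:.1f}%")  # 2.2%-5.6%
```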
[Bar chart: number of participants (0-60) per estimated error of 0-10, 10-50, 50-100, 100-200, 200-500, 500-1000, 1000-2000, and more than 2000 steps; one series each for 2000, 5000, and 9000 steps.]
Figure 7.1: Participants’ answers on the question “If an activity tracker
measures ... steps on one day, how high do you estimate the error of the
measurement?” for the amount of 2000, 5000, and 9000 steps.
[Bar chart: number of participants (0-60) per estimated error of 0-100 m, 100-500 m, 500-1000 m, 1000-2000 m, 2000-3000 m, 3000-5000 m, 5000-7000 m, and more than 7000 m; one series each for 7 km, 13 km, and 19 km.]
Figure 7.2: Participants’ answers on the question “If an activity tracker
measures a covered distance of ... kilometers on one day, how high do you
estimate the error of the measurement?” for the amount of 7, 13, and 19 km.
[Bar chart: number of participants (0-60) per estimated error of 0-5 kcal, 5-10 kcal, 10-30 kcal, 30-50 kcal, 50-100 kcal, 100-200 kcal, and more than 200 kcal; one series each for 200, 400, and 600 kcal.]
Figure 7.3: Participants’ answers on the question “If a treadmill measures a
calorie expenditure of ... in 30 minutes, how high do you estimate the error
of the measurement?” for the amount of 200, 400, and 600 kcal.
[Bar chart: number of participants (0-60) per estimated error of 0-10 m, 10-50 m, 50-100 m, 100-200 m, 200-500 m, 500-1000 m, 1000-2000 m, and more than 2000 m; one series each for 3 km, 5 km, and 7 km.]
Figure 7.4: Participants’ answers on the question “If a treadmill measures a
covered distance of ... kilometers in 30 minutes, how high do you estimate
the error of the measurement?” for the amount of 3, 5, and 7 km.
Table 7.5: Participants’ agreement on a five-point Likert scale with the
presented statements.
Statement M SD
If my activity tracker shows 492 kcal after my training, I would continue my training to reach 500 kcal. 3.4 1.4
The error of measurement of activity trackers is so high that the results of step and calorie tracking are too unreliable. 2.7 0.9
I would prefer if manufacturers of activity trackers published more information about possible errors of measurement. 3.8 0.9
I would like an activity tracker to take into account potential errors of measurement in the graphical overview. 3.6 1.1
Although participants stated that they knew the measurements of activity trackers
might not be completely reliable, they still slightly agreed with the statement
that they would continue their training if their device showed 492 kcal (see Table
7.5). They did not consider their devices too unreliable, but would prefer to have
more information on possible errors of measurement and also stated a preference
for having these errors displayed in the graphical overview.
Participants had several ideas on how the measurement uncertainty could be
displayed in the graphical overview of their activity trackers. For example, they
suggested showing an error percentage, a minimum and maximum value (e.g., a grey
bar or a boxplot), a color indicating how uncertain the value is, or an overview that
only roughly marks the value instead of providing a single value.
Discussion
Although most users know that the data displayed by activity trackers is not
completely reliable, there are users who seem to trust the exact value shown
on the display. Additionally, although most users know that the activity data is
unreliable or could be wrong, they would continue training to reach a certain
threshold (e.g. 500 kcal). We assume that psychological aspects play an important
role in making users want to reach the threshold despite the knowledge that their
actual calorie expenditure might differ from the displayed amount.
Participants estimated the error of measurement for steps to be at most 5.6%. The real
error of measurement, however, can be much higher depending on the device used.
A group of participants also seemed to trust a treadmill more than an activity
tracker in terms of distance, which is reflected in the unusual, non-normal
distribution of the treadmill distance estimates (see Figure 7.4). Participants also
applied different estimates for different data types, for example steps and calories.
From these findings, we learn that how uncertain participants consider
measurements and predictions to be depends on the data type and on the type, or
maybe even the brand, of the device.

(a) Color palette (b) Range of values
Figure 7.5: To develop graphical overviews for activity tracking data, we cre-
ated a standard color palette to indicate whether the users reached their goals
and additionally replaced the normal deterministic value with a displayed
range.
We also found that participants in general would like to have more information
about errors of measurement. They also expressed slight agreement with display-
ing uncertainty in the graphical overview and had diverse suggestions on how the
graphical overview could be adapted to fulfil this requirement.
7.2.2 Evaluation in the Wild
Based on participants’ suggestions in the online survey, we developed two basic
components for displaying uncertainty in activity data. First, we created
a color palette (see Figure 7.5a), as many participants stated that colors would
help them to quickly decide whether they had reached their goal. We used a color
scheme from red to green, with green indicating a reached goal. Second,
multiple participants suggested showing a range of the acquired steps that includes
a certain measurement error instead of a single value. We therefore developed
a textual range representation (see Figure 7.5b).
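A textual range of this kind can be derived from the measured count and an assumed relative error. The sketch below is hypothetical: the function names and the symmetric ±5% error model are illustrative assumptions, not the error model used in the study.

```python
def step_range(measured_steps: int, error_fraction: float) -> tuple[int, int]:
    """Return (low, high) bounds around a measured step count.

    Hypothetical error model: symmetric relative error, e.g. 0.05 for +/-5 %.
    """
    if not 0.0 <= error_fraction < 1.0:
        raise ValueError("error_fraction must lie in [0, 1)")
    delta = measured_steps * error_fraction
    return max(0, round(measured_steps - delta)), round(measured_steps + delta)


def format_range(low: int, high: int, unit: str = "steps") -> str:
    """Textual range representation, e.g. '7,600 - 8,400 steps'."""
    return f"{low:,} - {high:,} {unit}"


print(format_range(*step_range(8000, 0.05)))  # 7,600 - 8,400 steps
```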
In addition to these basic components, we developed three graphical overviews
that can be used instead of or in combination with the textual range representation.
The first graphical overview is a bar chart with a grey part to indicate the potential
error of measurement (see Figure 7.6a). The second graphical overview shows
an error bar in addition to the bar chart (see Figure 7.6b). The third graphical
overview is a speedometer with two needles (see Figure 7.6c), as many graphical
overviews for activity data currently use speedometers or pie charts. All graphical
overviews use the color palette introduced in Figure 7.5a to indicate how far users
are from reaching their goal.
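A red-to-green palette like the one in Figure 7.5a can be approximated by interpolating between two endpoint colors according to the user’s progress towards the goal. This is a minimal sketch: the endpoint RGB values and the linear interpolation in RGB space are assumptions, not the palette used in the thesis.

```python
# Hypothetical endpoint colors; the thesis does not publish its palette values.
RED = (220, 50, 47)
GREEN = (40, 160, 70)


def progress_color(value: float, goal: float) -> str:
    """Map progress towards a goal to a hex color between RED (0 %) and GREEN (>= 100 %)."""
    if goal <= 0:
        raise ValueError("goal must be positive")
    t = min(max(value / goal, 0.0), 1.0)  # clamp progress to [0, 1]
    rgb = tuple(round(r + (g - r) * t) for r, g in zip(RED, GREEN))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(progress_color(0, 10000))      # #dc322f
print(progress_color(12000, 10000))  # #28a046
```

Which value of a range is compared with the goal is a design choice: using the lower bound, for example, would only turn the display green once the goal is reached even under the most pessimistic reading of the measurement. Interpolating in a perceptual color space such as HCL would avoid the dull intermediate tones that linear RGB interpolation between red and green produces.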
(a) Bar chart with range (b) Bar chart with error bars (c) Speedometer
Figure 7.6: Three different graphical overviews for activity tracking data
developed on the basis of the comments of participants in the online survey.
We conducted a user study with an Android application to explore whether users
would understand and accept the new graphical overviews.
Android Application
We developed an Android application called Pedometer that tracks users’ steps
and calculates the calorie expenditure and the walked distance based on the step
count. The application shows all values as ranges (see Figure 7.7a) instead of
single values. Users can enter personal details and their goal on the screen
displayed in Figure 7.7b, and they can adjust their step goal by simply changing a
number (see Figure 7.7c). The application offers the three graphical overviews
depicted in Figure 7.6, and users can choose which one they prefer as the default
view. When users tap a displayed range, the application shows their default
graphical overview, and they can swipe to see the alternatives.
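Because distance and calories are computed from the step count, a step range can be carried over to the derived values. The following sketch assumes that both conversions are linear in the step count; the stride length and the calories per step are hypothetical per-user parameters, not values taken from the Pedometer application.

```python
from dataclasses import dataclass


@dataclass
class Range:
    low: float
    high: float


def derived_ranges(step_low: int, step_high: int, stride_m: float, kcal_per_step: float):
    """Carry a step-count range over to distance (km) and calories (kcal).

    Both conversions are assumed to be linear in the step count, so each
    bound of the step range maps directly onto a bound of the derived range.
    stride_m and kcal_per_step are hypothetical per-user parameters.
    """
    if step_low > step_high:
        raise ValueError("step_low must not exceed step_high")
    distance_km = Range(step_low * stride_m / 1000, step_high * stride_m / 1000)
    calories = Range(step_low * kcal_per_step, step_high * kcal_per_step)
    return distance_km, calories


dist, kcal = derived_ranges(7600, 8400, stride_m=0.7, kcal_per_step=0.04)
print(f"{dist.low:.2f} - {dist.high:.2f} km, {kcal.low:.0f} - {kcal.high:.0f} kcal")
# 5.32 - 5.88 km, 304 - 336 kcal
```

A more complete model would also treat the stride length and the calories per step as uncertain, which would widen the derived ranges further.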
Method
We invited the prospective participants to the lab to give them a short introduction
to the study and the graphical overviews. Participants filled in a pre-study
questionnaire and installed the application on their Android device. We also
collected their contact details.
(a) Start screen of the Pedometer application (b) Preferences in the Pedometer application (c) Entering a goal in the Pedometer application
Figure 7.7: Screens of the Pedometer application.
We asked participants to use the Pedometer application for at least one week. Each
day, we contacted participants twice to remind them to use the application and to
give them instructions for the day. On the first day, participants had to download
any step tracking application of their choice from the Google Play Store and
compare it to the Pedometer application. On each of the following three days,
participants used a different graphical overview of the Pedometer application to
get to know all of them. On the fifth and sixth day, we asked participants to switch
between the graphical overviews and compare them. On the seventh and last day
of the study, they were free to choose their favorite graphical overview.
At the end of the week we again invited the participants to the lab, and asked
them to answer a questionnaire about their opinion of the application and the
graphical overviews.
Participants
We recruited 10 participants (6 male, 4 female) with an average age of 22 years,
who used the Pedometer application for one week. In the pre-study questionnaire, all
participants stated that they knew that measurement errors occur when using
activity trackers. They estimated the error of measurement to lie between 7% and 10%.
Results
In general, most participants liked using the Pedometer application. Seven out of
ten stated that they would like to continue using graphical overviews showing
a range instead of a single value. However, they wanted this to be an optional
feature that they could enable depending on the sport they were doing.
As they expected an activity tracker to track walking more accurately than
basketball, they would enable the setting while playing basketball, but not while
walking. Three participants also mentioned that the numbers of the range could be
rounded to the next ten or hundred to make them easier to read. Four participants
explicitly stated that they liked the bar below the values, which gave a rough and
quick overview of the activity.
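The suggestion to round the range to the next ten or hundred can be implemented by rounding the bounds outward, so that the rounded range always contains the original one. This is a sketch of one possible rounding rule, not a feature of the Pedometer application; the function name is hypothetical.

```python
import math


def round_range_outward(low: float, high: float, step: int = 100) -> tuple[int, int]:
    """Round a range outward to multiples of `step`.

    Flooring the lower and ceiling the upper bound guarantees that the
    rounded range still contains every value of the original range.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return math.floor(low / step) * step, math.ceil(high / step) * step


print(round_range_outward(7612, 8391))      # (7600, 8400)
print(round_range_outward(7612, 8391, 10))  # (7610, 8400)
```

Rounding each bound to the nearest multiple instead would give a tighter range, but it could exclude part of the original interval.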
The three participants who did not like the range display and the novel graphical
overviews gave two main reasons. Two participants stated that two numbers are
too complicated to read and do not allow for a quick overview. As they know about
errors of measurement, they perceive the single displayed value as an estimate
anyway. Another participant stated that he was too used to the graphical overview
he normally uses and kept trying to calculate the average.
At the end, we asked participants which of the graphical overviews they
preferred most. Six participants preferred the bar chart with the grey bar. They
stated that it was easy to understand and made it easy to determine how far they
still were from reaching their goal. Three participants preferred the speedometer.
They liked its compactness and found it motivating, but suggested displaying a
colored triangle instead of two needles. One participant preferred the bar chart
with the error bar, as it contained the exact value and the measurement error in
one visualization. However, no other participant shared this opinion.
Discussion
In general, participants liked having graphical overviews that include uncertainty,
but wanted the feature to be optional. A graphical overview for activity data
should therefore either allow users to switch between a single value and an
interval or show both at the same time. The bar chart with the grey bar could,
for example, additionally show a line for the measured value and use the grey bar
only to raise awareness of the uncertainty. Additionally, multiple graphical
overviews could be offered so that users can pick their favorite. The speedometer,
for example, was preferred by only three participants, but the others did not
dislike it, even though it was not their preferred option.
As participants complained that the range was not easy to grasp and remember,
it could be interesting to show only a color or a progress bar instead of concrete
numbers. This would make the information quickly graspable without a concrete
number that may suggest a reliability the measurement does not have. Users would
only be able to roughly estimate the current value. This may be too drastic, but it
would also preserve privacy, as only the user knows the personal goal and can
therefore interpret the color or progress bar correctly.
7.2.3 Implications
Graphical overviews for activity data should communicate the uncertainty of the
data. On the one hand, users prefer to get more information; on the other hand,
they want to be in control of the shown data. Therefore, graphical overviews should
either include a concrete value and an additional indicator for uncertainty or
make the display of the uncertain information optional. Showing the potential
measurement errors can also educate users who are not aware of them to recognize
the associated uncertainty and take it into account in decision-making. However,
manufacturers of activity trackers might not be eager to share information about
measurement errors with their customers, as devices could then be compared by
their reliability more easily. On the other hand, sharing such information could
create more trust in products that communicate uncertainty.
Another important aspect is that a graphical overview for activity data still has
to be easy to grasp. Adding information about the measurement error results in
more visual clutter, which may overload users cognitively. It is therefore very
important to preserve the simplicity of the graphical overview when adding extra
information.
7.3 Decision-making under Uncertainty
Previous work on uncertainty visualization focused on a small number of repre-
sentations, variations of one representation, or a very specific task (e.g., finding the
mean). Although it is known that showing uncertainty leads to better decisions,
the relation between the degree of uncertainty information included in a
visualization and decision-making is still unclear. In contrast to prior work, our
approach is to compare a large number of representations based on the amount
of uncertainty information they include instead of merely comparing different
visualization techniques.
The main goal of the work presented in this section is to understand how the
presented amount of uncertainty influences decision-making and whether other
confounding factors besides the amount of uncertainty information play a role for
users’ preference. In the following, we classify 12 representations and compare
them in an online survey. We further present the results of a follow-up experiment
in which we compared four of the representations in an online Facebook game to
understand their influence on decision-making.
7.3.1 Classification of Representations
Building upon prior research, we selected twelve representations (see Figure
7.8 and Figure 7.9) with different properties for communicating uncertainty
information. These representations are commonly used to communicate uncertain
data to scientists and the general public. All representations show the expected
rainfall for the next three days. Three representations present the information
textually, whilst the other nine are graphical: a line chart, a box-and-whisker plot,
bar charts, stacked bar charts, stacked area diagrams, shaded bars, and function
graphs. Table 7.6 clusters the representations based on the degree of uncertainty
information they include. Representations can contain no information about the
uncertainty, aggregated information, detailed aggregated information, or fully
detailed information. This classification based on the degree of included
uncertainty information is a novel way of classifying representations. We argue
that it simplifies the comparison of representations that include uncertainty
information.
7.3.2 Online Evaluation
We first aimed to evaluate all representations in order to reduce their number for
follow-up experiments. We therefore conducted an online survey to compare all
twelve representations according to their perceived value for decision support and
their ease of understanding.
(a) Expected value (b) Expected value and SD
(c) Quantiles (d) Line chart with confidence interval
(e) Box-and-whisker plot (f) Bar chart with error bars
Figure 7.8: First half of the representations used in the online survey showing
no or aggregated uncertainty information.
(a) Histograms as bar charts (b) Histograms as stacked bar chart
(c) Histograms as area chart (d) Shaded horizontal bars
(e) Probability distribution function plot (f) Cumulative probability distribution function plot
Figure 7.9: Second half of the representations used in the online survey
showing detailed aggregated or detailed uncertainty information.
Table 7.6: Degree of uncertainty information included in the representations.
Degree of Uncertainty Information | Textual Representation | Graphical Representation
No Uncertainty Information | REP1: Expected values | -
Aggregated Uncertainty Information | REP2: Expected values and standard deviation; REP3: Quantiles | REP4: Expected values and confidence interval; REP5: Quantiles; REP6: Expected values and standard deviation
Detailed Aggregated Uncertainty Information | - | REP7-9: Aggregated probability density function; REP10: Color-coded probability density function
Detailed Uncertainty Information | - | REP11: Probability density function; REP12: Cumulative probability density function
Method
We first asked participants for demographic information such as age, gender,
highest degree, and background. We then introduced the study with a scenario
description: participants should imagine that they are a farmer who wants to
grow crops. The plants need a certain amount of rain to survive and grow, and a
weather forecast helps them decide which crops to grow. Like any weather
forecast, it is uncertain. After reading the scenario description, participants
started the main part of the survey. Each of the following twelve pages of the
survey contained one of the representations. The order of representations was
randomized across participants to reduce sequence effects. For each
representation, participants had to indicate their level of agreement on a
five-point Likert scale from “totally disagree” to “totally agree” with the
following four statements:
• The representation supports me in making a decision.
• I am familiar with the representation.
• The representation is easy to understand.
• The representation is visually appealing.
Participants
In total, 90 participants (36 female, 54 male) fully completed the online survey.
Participants’ age ranged from 18 to 82 years (M = 31.0, SD= 12.6). 45% of our
participants had a university degree, 28% a high school diploma, and a further
20% had vocational training. The other participants either had a minor or no
degree at all. Participants had diverse backgrounds such as computer science,
economics, teaching, mechanics, and services.
Results
We converted the Likert items to numbers, with 1 corresponding to “totally disagree”
and 5 corresponding to “totally agree”. For each representation, we then calcu-
lated the mean for each of the four statements and an overall mean. For each
statement, we also conducted a Friedman test, followed by post-hoc Wilcoxon
signed-rank tests with Bonferroni correction. The Friedman test showed a
statistically significant difference between the twelve representations for each
statement. The Wilcoxon signed-rank tests revealed that representations 1, 4, 7, and
11 performed significantly better than the majority of the other representations for
at least one statement each. Representation 3 was rated significantly worse than the
majority of representations on three statements.
We additionally computed Spearman’s rank-order correlations to determine relation-
ships between the statements and the degree of uncertainty information that the
representations show (see Table 7.8). As expected, we found strong positive
correlations between all pairs of Likert items, all of which were statistically
significant (p < .001). However, the correlation coefficients did not reveal any
substantial positive or negative correlation between the statements and the degree
of uncertainty information of the representations (|ρ| < 0.1 for decision support,
familiarity, and easiness to understand). The only exception was a moderate positive
correlation with the Likert item for visual appeal. We assume that this correlation
occurred because the representations with low degrees of uncertainty information
were textual representations and therefore less appealing to participants.
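Spearman’s rank-order correlation is Pearson’s correlation computed on ranks, with tied values (frequent for Likert items) receiving their average rank. The sketch below encodes the degree of uncertainty information from Table 7.6 as an ordinal value from 0 (no uncertainty information) to 3 (detailed uncertainty information); this numeric coding, the function names, and the example responses are assumptions for illustration.

```python
from statistics import mean

# Degree of uncertainty information per representation (Table 7.6), coded
# ordinally: 0 = none, 1 = aggregated, 2 = detailed aggregated, 3 = detailed.
DEGREE = {1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1,
          7: 2, 8: 2, 9: 2, 10: 2, 11: 3, 12: 3}


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical (representation, visual-appeal rating) responses.
responses = [(1, 2), (2, 2), (4, 4), (7, 4), (10, 3), (11, 4), (12, 3), (3, 1)]
degrees = [DEGREE[rep] for rep, _ in responses]
ratings = [rating for _, rating in responses]
print(round(spearman_rho(degrees, ratings), 3))
```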
Discussion
The results of the online survey show that participants had different opinions on
how much the representations would help them to make a decision. Surprisingly,
the four representations with the highest overall mean (4, 7, 11, and 1) each show
a different degree of uncertainty information. We found that participants do not
rate the representations based on their degree of uncertainty. Instead, factors such
as familiarity, ease of understanding, and visual appeal strongly influence whether
participants find a representation suitable for decision-making. Our work indicates
that these confounding factors have to be considered when displaying uncertain data.

Table 7.7: Calculated mean values for the level of agreement on a five-point
Likert scale from totally disagree (1) to totally agree (5) with the statements
S1: “The representation supports me in making a decision.”, S2: “I am familiar
with the representation.”, S3: “The representation is easy to understand.”, S4:
“The representation is visually appealing.”, and O: the overall mean values.
Representations S1 S2 S3 S4 O
1: Expected values 3.6 4.3 4.3 2.3 3.6
2: Expected values and standard deviation 3.7 3.9 3.3 2.1 3.3
3: Quantiles 2.9 2.8 2.4 1.7 2.4
4: Line chart with confidence interval 4.1 3.7 4.1 4.0 4.0
5: Boxplot 3.2 2.7 2.5 2.5 2.7
6: Bar chart with error bars 3.5 3.1 3.1 3.0 3.2
7: Histograms as bar charts 3.9 4.2 3.7 3.7 3.9
8: Histograms as stacked bar charts 3.5 3.5 3.2 3.5 3.4
9: Histograms as area chart 3.2 2.9 2.8 3.4 3.0
10: Shaded horizontal bars 3.7 2.3 3.6 3.6 3.3
11: Probability distribution function 3.9 3.9 3.5 3.6 3.7
12: Cum. probability distribution function 3.4 3.5 2.9 3.5 3.3

Table 7.8: Spearman’s rho for a Spearman’s rank-order correlation between
Likert scale items of the online survey and the degree of uncertainty infor-
mation of the presented representations. Statistically significant values are
marked with asterisk(s): **p < .01, *p < .001.
 | Decision Support | Familiarity | Easiness to Understand | Visual Appeal
Degree of Uncertainty | 0.048 | 0.040 | 0.084** | 0.365*
Decision Support | - | 0.505* | 0.670* | 0.527*
Familiarity | - | - | 0.609* | 0.530*
Easiness to Understand | - | - | - | 0.530*
Based on our work, the most promising candidates for uncertainty visualization
with different degrees of uncertainty can be selected for future studies with the
general public.
7.3.3 Evaluation in the Wild
We developed a web-based game called “Farm Smart” that includes the
four representations that performed best in the online evaluation. We ran a pilot
study with 12 participants and improved the game based on their feedback before
publishing it on Facebook.
Method
In the turn-based Facebook game “Farm Smart” (see Figure 7.10), players can
buy, plant, and harvest crops to earn as much money as possible. Each crop needs
a certain amount of precipitation and has a ripening time between one and
three days. To grow successfully, a crop’s precipitation requirement has to be
fulfilled during its whole ripening time. To decide which crop to plant, players
can look at the forecasted precipitation for the next three days by clicking on a
button (see Figure 7.10b). After planting as many different crops as desired, a
player ends the day by clicking on the “Next day” button. The weather (displayed
in the upper left corner of the screen) and specific field icons indicate whether a
crop survived or withered (see Figure 7.10a). Ten days (i.e., turns) correspond
to one game. We used the four representations that performed best in the online
survey to display the weather forecast. The representation stayed the same within
a game and changed between games: each player could play four games in total,
one with each of the four representations, and was randomly assigned to one of
the 24 possible orders of the representations when first starting “Farm Smart”.
To recruit players, we shared the game link on Facebook and posted advertisements
in online gaming communities.
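The two game mechanics described above, the random assignment of a representation order and the survival check for a planted crop, can be sketched as follows. The function names are hypothetical, and the survival rule assumes that the precipitation requirement must be met on every day of the ripening time, starting on the planting day; the text could also be read as a cumulative requirement, in which case the daily check would be replaced by a sum.

```python
import itertools
import random

REPRESENTATIONS = ("text", "line chart", "bar chart", "probability distribution function")
# All 4! = 24 orders in which a player can encounter the four representations.
ORDERS = list(itertools.permutations(REPRESENTATIONS))


def assign_order(rng: random.Random) -> tuple[str, ...]:
    """Randomly assign one of the 24 representation orders at a player's first start."""
    return rng.choice(ORDERS)


def crop_survives(daily_rain, planted_day, ripening_days, required):
    """Hypothetical rule: rain on every day of the ripening period must reach `required`.

    daily_rain holds one precipitation value per game day (0-based index);
    the ripening period is assumed to start on the planting day.
    """
    window = daily_rain[planted_day:planted_day + ripening_days]
    if len(window) < ripening_days:
        raise ValueError("not enough weather data for the ripening period")
    return all(rain >= required for rain in window)


print(assign_order(random.Random(7)))
print(crop_survives([2.0, 3.5, 1.0], planted_day=0, ripening_days=2, required=2.0))  # True
```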
Requirements and Prices of Crops. The seed costs, sale prices, ripening times,
and weather requirements of crops were specified differently to ensure a wide
range of possible decisions in all weather situations. Players are motivated to
consider more demanding crops equally, as higher requirements or longer ripening
times are rewarded with a higher money gain when selling the crops. The rewards
for the overall game were chosen to be Pareto-optimal.

(a) General view of “Farm Smart” with the possibility to plant crops on the field displayed in the middle of the screen.
(b) Weather forecast as displayed in “Farm Smart”.
Figure 7.10: Screenshots of our Facebook game “Farm Smart” to compare
representations including a different degree of uncertainty information.
Weather Calculation. The weather forecast in the game is based on real weather
measurements of a meteorological station⁵ in Stuttgart. The values used in the
game cover a continuous period of 62 days between spring and summer 2001.
This period offers diverse weather conditions and is in general rich in rainfall.
For every game, the start day is selected uniformly at random from the first 49
days of the period to ensure that forecasts are available for the whole game. A
weather forecast in “Farm Smart” corresponds to a set of three modified Gaussian
distributions. For a day $d$, the forecast for the $n$-th day ($n \in \{1, 2, 3\}$) is
constructed as follows:
1. The expected value $\mu'$ of the Gaussian distribution is calculated by offsetting
the real value of day $d+n$ by $n \cdot a \cdot X$ with $a > 0$ and a sample
$X \sim \mathcal{N}(0, 1)$. This corresponds to the assumption that the magnitude of
the inaccuracy in weather forecasts is approximately normally distributed and
increases linearly with the temporal distance to the predicted event. As a negative
$\mu'$ cannot be reasonably interpreted as precipitation in our context, $\mu'$ is set
to 0 if $\mu' < 0$.
2. The standard deviation of the Gaussian distribution is chosen as $n \cdot s$ with
$s \in \mathbb{R}_0^+$. This corresponds to a horizontal stretching of the
distribution’s graph and represents the linearly increasing uncertainty of
temporally distant weather forecasts. The forecast thus follows the Gaussian
distribution $\mathcal{N}(\mu', n \cdot s)$ (second parameter: standard deviation),
or in more detail
$$\mathcal{N}\big(\max(0,\ \mathrm{RealValue}(d+n) + n \cdot a \cdot X),\ n \cdot s\big).$$
5 Source: German Weather Service, http://www.dwd.de
We tested and compared different values of the parameters $a$ and $s$ before and
during our pilot study. We then chose $a = 0.8$ and $s = 1.0$, because these values
conveyed a suitable amount of uncertainty in the prediction.
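The forecast construction in steps 1 and 2 translates into a few lines of code. This is a sketch using the chosen parameters a = 0.8 and s = 1.0; the function names, the 0-based day indexing, the example precipitation values, and the use of Python’s `random` module are illustrative assumptions rather than the game’s actual implementation.

```python
import random

A = 0.8  # offset scale a (value chosen in the study)
S = 1.0  # standard-deviation scale s (value chosen in the study)


def forecast(real_values, d, a=A, s=S, rng=random):
    """Return (mu', sigma) for the days d+1, d+2, d+3.

    real_values: measured precipitation per day of the period (0-based index).
    mu'   = max(0, real_values[d + n] + n * a * X) with X ~ N(0, 1)
    sigma = n * s
    """
    result = []
    for n in (1, 2, 3):
        x = rng.gauss(0.0, 1.0)
        mu = max(0.0, real_values[d + n] + n * a * x)
        result.append((mu, n * s))
    return result


def random_start_day(rng=random, candidates=49):
    """Pick a start day uniformly from the first `candidates` days (0-based)."""
    return rng.randrange(candidates)


real = ([0.0, 1.2, 4.5, 0.3] * 16)[:62]  # hypothetical 62-day period
rng = random.Random(2001)
day = random_start_day(rng)
print(day, forecast(real, day, rng=rng))
```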
Representations. We decided to use the four representations that performed best
in the online evaluation and implemented them using HighCharts⁶. We briefly
describe these visualizations:
Text: It shows the expected value, which gives no information at all about the
uncertainty (see Figure 7.11a).
Line chart: It also shows the expected value, but adds quantiles, which were
chosen as the 0.05 and the 0.95 quantile (see Figure 7.11b).
Bar chart: It visualizes the sum of probabilities within a certain range (see
Figure 7.11c). For the bar colors, we used the HCL color space so that the colors
are clearly distinguishable and perceived as equally intense.
Probability distribution function: It displays the underlying distributions in
full detail in a probability distribution function graph (see Figure 7.11d).
The representations show an increasing amount of information about the uncer-
tainty: the first representation shows no information, the second representation
shows aggregated uncertainty information, the third representation shows detailed
aggregated information, and the fourth representation shows all details.
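For a normal forecast distribution with mean µ′ and standard deviation n·s, the line chart’s 0.05 and 0.95 quantiles and the bar chart’s per-range probabilities follow from the distribution’s quantile and cumulative distribution functions, here via Python’s `statistics.NormalDist`. In this sketch, the bin edges are hypothetical, the quantiles are clipped at 0 because negative precipitation is meaningless, and any probability mass below the first edge is added to the first bin; how the game handled that mass is not described.

```python
from statistics import NormalDist


def quantile_band(mu, sigma, lower=0.05, upper=0.95):
    """0.05 and 0.95 quantiles for the line chart, clipped at 0."""
    dist = NormalDist(mu, sigma)
    return max(0.0, dist.inv_cdf(lower)), max(0.0, dist.inv_cdf(upper))


def bin_probabilities(mu, sigma, edges):
    """Probability of each bar-chart range [edges[i], edges[i+1]) plus an open top bin.

    Hypothetical choice: mass below edges[0] is added to the first bin.
    """
    if len(edges) < 2 or list(edges) != sorted(edges):
        raise ValueError("need at least two ascending edges")
    dist = NormalDist(mu, sigma)
    cdf = [dist.cdf(e) for e in edges]
    probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
    probs[0] += cdf[0]           # mass below the first edge
    probs.append(1.0 - cdf[-1])  # open-ended top bin
    return probs


# Second forecast day (n = 2, s = 1.0) with a hypothetical mu' of 3.0.
print(quantile_band(3.0, 2.0))
print([round(p, 3) for p in bin_probabilities(3.0, 2.0, [0, 2, 5, 10])])
```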
Data Collection. We collected two types of data. First, we collected survey data,
which included personal data and subjective feedback about the representations.
After finishing a game, we asked players to rate the used representation on a five-
point Likert scale with the statements used in the online survey (see Subsection
7.3.2). Second, we logged all game parameters such as accumulated money,
representation type, forecasted weather, and the actual weather. We additionally
counted how often participants clicked the button to open the weather forecast.
6 HighCharts, http://www.highcharts.com
(a) REP1 - Text (b) REP2 - Line chart
(c) REP3 - Bar chart (d) REP4 - Probability distribution function
Figure 7.11: The four representations each display weather forecasts with a
different amount of uncertainty information.
Results
We analyzed the data of 44 players who in total played 98 games (on average 2.2
games per player, SD = 1.3) comprising 991 turns. We only considered games
in which players actively opened the weather forecast. On average, participants
played 247.75 turns (SD = 34.1) per representation. As not all players played
four games, some of them experienced only a subset of the representations.
Survey Data. In total, 38 players completed the short survey (29 male, 9 female)
after completing 88 games. We grouped their answers into three categories: (1)
agree (includes agree and strongly agree), (2) neutral, and (3) disagree (includes
disagree and strongly disagree). As Table 7.9 shows, players preferred the line
chart and the bar chart.
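Grouping the answers into three categories is a simple recoding step before computing percentages such as those in Table 7.9. A minimal Python sketch, assuming the raw answers are stored as the label strings below (a hypothetical encoding):

```python
from collections import Counter

# Five-point answers collapsed into the three reporting categories.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "neutral": "neutral",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}


def category_percentages(responses):
    """Share of answers (in %) per collapsed category."""
    if not responses:
        raise ValueError("no responses")
    counts = Counter(COLLAPSE[answer] for answer in responses)
    return {cat: 100.0 * counts[cat] / len(responses)
            for cat in ("agree", "neutral", "disagree")}


answers = ["agree", "strongly agree", "neutral", "disagree"]  # hypothetical
print(category_percentages(answers))
# {'agree': 50.0, 'neutral': 25.0, 'disagree': 25.0}
```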
Table 7.9: Percentage of agreeing players in the survey. Green shows the highest values. Red shows the lowest values.
Survey question (after each game)    Percentage (%) of agreement (agree or strongly agree)
                                     REP1    REP2    REP3    REP4
1 Familiar                           59.1    66.7    61.9    47.6
2 Easy to understand                 54.6    66.7    66.7    38.1
3 Visually appealing                 22.7    85.3    33.3    38.1
4 Supports decision-making           31.8    54.2    57.1    28.6
Log Data. We calculated three metrics to explore two questions: (1) How did the representations affect risk taking? and (2) How did the representations affect decision-making? Table 7.10 shows a summary of the results. The metric “average weighted risk” captures the average risk per turn that players were willing to take. The averages indicate that players took the most risk when using the probability distribution function and the least risk with the textual representation. For decision-making, we looked at the percentage of won turns and the average money gained. Players won the most turns when using the textual representation; however, on average they won more money when using the bar chart.
Table 7.10: Overview of log data collected in the game.
        Average weighted risk    % of turns ending with    Average money
REP     Mean     SD              Win       Lose            Gained    Lost
REP1    12.8     24.9            24.1%     47.4%           435.3     173.7
REP2    18.7     31.5            17.8%     54.4%           448.1     130.2
REP3    17.3     29.1            23.5%     49.8%           772.2     170.3
REP4    20.3     33.4            15.8%     49.0%           612.4     161.2
Discussion
The textual representation without uncertainty information led players to take the lowest risks, but it also resulted in the lowest average amount of money gained and the highest average amount of money lost. With the line chart and the bar chart, players took roughly the same amount of risk, and the survey showed that both representations were preferred. Nevertheless, players won more turns using the bar chart and, on average, gained the highest amount of money with it. The line chart, on the other hand, resulted in the highest percentage of lost turns. The probability distribution function led to the riskiest decisions, but also to the lowest percentage of won turns. The survey is consistent with this, as players seemed to find it complex: it received the lowest agreement for being easy to understand. On the other hand, players who truly understood the representation appear to have performed very well, as indicated by its high average money gain.
7.3.4 Implications
The survey showed that more uncertainty information is not necessarily better for decision-making, nor is it necessarily perceived as better. Familiarity, ease of understanding, and visual appeal are among the factors that have to be taken into account when designing representations for uncertainty communication.
The results of the experiment indicate that people do not favor representations without uncertainty information. Such representations are a “Low risk, low reward” option: they maintain high winning rates, but with a low average amount of money won. Representations with detailed uncertainty information, such as the probability distribution function, are a “High risk, high reward” option, as they are associated with high winnings when understood correctly. However, they might also cause high losses. Representations with aggregated uncertainty information were perceived as the most helpful and understandable by participants. However, they encouraged participants to take incalculable risks. It seems that such representations lead to wrong assumptions and to an overestimation of the amount of information they contain. Finally, representations with aggregated detailed uncertainty information seem to offer a good compromise between understandability, taking educated risks, and winning high amounts.
7.4 Insights for Communicating Uncertainty
In this chapter, we developed and explored output methods for uncertainty in
interactive systems. We focused on the current state of uncertainty communi-
cation in mobile applications, users’ preferences for the future of uncertainty
communication, and how different degrees of uncertainty information included in
representations influence decision-making.
Currently, uncertainty is seldom communicated. If it is communicated, vague linguistic expressions are used. However, users expressed a preference for more information about uncertainty, depending on the context of an application. Nevertheless, a compromise between this preference for more information and the added cognitive load needs to be found, mainly because users still want the information to be easy to grasp.
Overall, we found that, in terms of decision-making, showing uncertainty information is not always better. The choice of representation should take into account confounding factors such as familiarity, visual appeal, and ease of understanding. Our classification of how much uncertainty information a representation includes can additionally help to detect representations that might give users a false sense of control and understanding. Based on our findings, we recommend using representations that include aggregated detailed uncertainty information, such as a histogram or dot plot.
In the next chapter, we focus on the third important aspect of how users deal
with uncertainty in interactive systems: the interpretation of uncertain data. We
explore how humans make predictions, how interactive systems can support the
aggregation of uncertain data, and the internal models humans use for making
sense of conflicting uncertain information.
Chapter 8
Interpretation
We explored humans’ ability to interpret uncertain data by conducting three individual research probes. In the first probe, we examined how well humans make predictions themselves and how this influences their understanding of their own behavior. In the second probe, we explored how different aggregation mechanisms can support humans in comparing uncertain data from multiple sources. In the third probe, also related to conflicting data, we aimed to understand humans’ internal models for making sense of conflicting data from different sources with different degrees of uncertainty information. All probes were evaluated in user studies.
The main goal of this chapter is to better understand how humans interpret uncertain data and how they can be supported in interpreting it correctly.
Parts of this chapter are based on the following publications:
• M. Greis, E. Avci, A. Schmidt, and T. Machulla. Increasing Users’ Confi-
dence in Uncertain Data by Aggregating Data from Multiple Sources. In
Proceedings of the 2017 CHI Conference on Human Factors in Comput-
ing Systems - CHI ’17, pages 828–840, 2017.
• M. Greis, T. Dingler, A. Schmidt, and C. Schmandt. Leveraging User-
made Predictions to Help Understand Personal Behavior Patterns. In
Proceedings of the 19th International Conference on Human-Computer
Interaction with Mobile Devices and Services - MobileHCI ’17, pages
104:1-104:8, 2017.
Parts of this chapter are also planned to be published as follows:
• M. Greis, A. Joshi, A. Schmidt, and T. Machulla. Uncertainty Visualization
Improves Humans’ Choice of Internal Models for Information Aggregation.
8.1 Human Predictions
Smartphone applications and wearable devices count how many steps we take, how often we unlock our phone, or how many words we read. Collecting such data leads to large datasets that are presented to people in order to motivate behavior change. This approach is only partly successful and tends to have only short-term effects [Ledger and McCaffrey, 2014]. One reason is the abstract nature of the data, which on its own is insufficient to make users think about their behavior patterns [Choe et al., 2014].
The main goal of the work presented in this section is to understand whether user-made predictions can serve as a tool to improve users’ reasoning about predictions and their understanding of their own behavior patterns. To make personal predictions, users probably have to think about their behavior more carefully, and their predictions are likely to include uncertainty because they cannot know their behavior exactly beforehand. We were interested in how people adapt their predictions and whether they improve them over time. We decided to focus on mobile phone usage because it is easy to track without any sensors beyond the smartphone itself. Additionally, current research suggests that the average usage of an application lasts less than one minute [Böhmer et al., 2011] and that checking habits lead to longer usage times [Oulasvirta et al., 2012]. We therefore built an Android application called Predict that tracks the number of users’ screen-ons and unlocks. Users can additionally predict their behavior for the current day.
8.1.1 Android Application
We developed the Android application Predict, which allows users to predict their mobile phone screen-on and unlock counts (see Figure 8.1). Each day, when users first used their mobile phone, they received a notification asking them to predict how often they would turn on their screen and unlock their phone that day (see Figure 8.1a). As soon as they entered their prediction, the application showed the predicted values as simple text in place of the prediction input. To support their predictions, users could view detailed statistics about their usage on the previous day (see Figure 8.1b) and a history diagram of their real usage and predicted values (see Figure 8.1c). To extract the usage behavior, the application counted the Android events SCREEN_ON, SCREEN_OFF, and USER_PRESENT. The application did not send any data to an external server unless participants pressed the button in the top right corner of the application. This functionality was implemented to increase participants’ trust in the application.
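How the three events are turned into daily counts is not described further. As a minimal sketch, assuming a time-ordered log of (timestamp, event name) pairs, the events can be reduced to the two daily measures reported in the results: unlocks, and screen-ons without an unlock. The log format and the attribution of each screen session to the day on which it started are assumptions made for this illustration.

```python
from collections import Counter
from datetime import date, datetime


def daily_counts(events) -> tuple[dict[date, int], dict[date, int]]:
    """Count unlocks and screen-ons without unlock per calendar day.

    events: time-ordered iterable of (datetime, name) pairs, where name is
    "SCREEN_ON", "SCREEN_OFF", or "USER_PRESENT" (assumed log format).
    Each screen session is attributed to the day on which it started.
    """
    unlocks = Counter()
    screen_on_only = Counter()
    session_day = None  # day of the currently open screen session, if any
    unlocked = False    # whether the open session has seen USER_PRESENT
    for timestamp, name in events:
        if name == "SCREEN_ON":
            session_day, unlocked = timestamp.date(), False
        elif name == "USER_PRESENT" and session_day is not None and not unlocked:
            unlocks[session_day] += 1
            unlocked = True
        elif name == "SCREEN_OFF" and session_day is not None:
            if not unlocked:
                screen_on_only[session_day] += 1
            session_day = None
    # A session still open at the end of the log is not counted as screen-on only.
    return dict(unlocks), dict(screen_on_only)


log = [  # hypothetical events
    (datetime(2017, 3, 1, 8, 0), "SCREEN_ON"),
    (datetime(2017, 3, 1, 8, 0, 4), "USER_PRESENT"),
    (datetime(2017, 3, 1, 8, 3), "SCREEN_OFF"),
    (datetime(2017, 3, 1, 9, 15), "SCREEN_ON"),  # glance at a notification
    (datetime(2017, 3, 1, 9, 15, 6), "SCREEN_OFF"),
]
print(daily_counts(log))
# ({datetime.date(2017, 3, 1): 1}, {datetime.date(2017, 3, 1): 1})
```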
8.1.2 Method
We invited prospective participants to install the application and use it for 14
consecutive days. After this usage period, we sent them a reminder to share their
data with us and provided an online questionnaire. The questionnaire consisted
of two parts. In the first part, participants had to indicate their level of agreement
with ten statements (see Table 8.1) on a five-point Likert scale. In the second, optional part, we asked for qualitative feedback on different aspects, such as when, why, and how they predicted, how they felt about making predictions, whether the predictions influenced their usage behavior, and whether they learned something during the two weeks.
(a) Making a prediction (b) History for the last day (c) Complete usage and prediction history
Figure 8.1: Screenshots of the Predict application that show the three main
screens of the application.
Table 8.1: Statements presented in the online questionnaire.
ID Statement
S1 I liked to make predictions daily.
S2 I always looked forward to see how good my prediction was.
S3 I wanted to improve my prediction every day.
S4 I looked at the historic data before making a new prediction.
S5 I always used the same strategy for making my prediction.
S6 I think that my predictions improved over time.
S7 Making the predictions influenced my usage behavior.
S8 I tried to use my mobile device less.
S9 I will continue to make predictions with the app.
S10 I will continue to look at the historic data with the app.
8.1.3 Participants
Twelve participants (10 male, 2 female) with an average age of 27.0 (SD= 11.7)
installed our application. Eight were students and the other four were employed.
All used the application for at least 14 days, and most continued voluntarily for at
least a few more days.
8.1.4 Results
We registered a total of 9,317 unlock events, 6,576 additional screen-on events (no
unlock performed), and 336 predictions. In general, the numbers of screen-ons and unlocks varied considerably across participants. They unlocked their phone between 2 and 264 times per day (M = 55.8, SD= 41.7). In addition, they turned on their screen without unlocking it between 2 and 399 times per day (M = 39.4, SD= 52.3). In the following, we outline participants’ prediction accuracy and behavior over time.
For prediction accuracy, we calculated the absolute value of the relative error in
percent.
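As a minimal sketch of this accuracy measure, assuming the relative error is taken with respect to the actual count (the denominator is not stated explicitly), over- and underpredictions by the same amount yield the same error:

```python
def absolute_relative_error_pct(predicted: float, actual: float) -> float:
    """|predicted - actual| relative to the actual count, in percent."""
    if actual <= 0:
        raise ValueError("actual count must be positive")
    return abs(predicted - actual) * 100.0 / actual


print(absolute_relative_error_pct(40, 50))  # 20.0
print(absolute_relative_error_pct(60, 50))  # 20.0
```

Because overpredictions are not bounded, values above 100% are possible, for example when predicting 120 unlocks on a day with 50 actual unlocks.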
Screen-on & Unlock Predictions. For the screen-on predictions, participants
had an average relative error of 36.7% (SD= 34.1%). For the unlock prediction,
the relative error was even higher at 44.9% (SD= 50.4%). Participants’ predictions, however, improved over time. From the first to the last day of the study, the error decreased by more than 20% for screen-on predictions (M = 33.2%, SD= 31.3%) and by more than 30% for unlock predictions (M = 38.1%, SD= 37.6%). Figure 8.2 shows the development of the average relative errors over time and the resulting regression lines.
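The fitting method behind the regression lines is not specified. Assuming an ordinary least-squares fit of the daily average error against the study day, a minimal sketch looks as follows; a negative slope corresponds to predictions improving over time. The error values are hypothetical.

```python
from collections import defaultdict


def daily_average(errors):
    """Average error per study day from (day, error_pct) pairs."""
    by_day = defaultdict(list)
    for day, error in errors:
        by_day[day].append(error)
    days = sorted(by_day)
    return days, [sum(by_day[d]) / len(by_day[d]) for d in days]


def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical errors from several participants; day 1 has two entries.
errors = [(1, 60.0), (1, 40.0), (2, 45.0), (3, 40.0), (4, 35.0)]
days, averages = daily_average(errors)
print(least_squares_line(days, averages))  # (-5.0, 55.0)
```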
Online Questionnaire. We analyzed participants’ Likert scale ratings and pro-
vide the results in Figure 8.3. For the exact statements S1 to S10, see Table 8.1.
To better present the ratings of participants, we converted the ratings to numbers
from 1 for “totally disagree” to 5 for “totally agree”.
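The numeric coding and the resulting M and SD values can be sketched in a few lines of Python. Whether the reported SDs are sample or population standard deviations is not stated; the sketch assumes the sample standard deviation, and the example answers are hypothetical.

```python
from statistics import fmean, stdev

# Numeric coding of the five-point scale.
LIKERT_CODES = {
    "totally disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "totally agree": 5,
}


def summarize(responses):
    """Mean and sample standard deviation of coded Likert answers."""
    scores = [LIKERT_CODES[answer] for answer in responses]
    return fmean(scores), stdev(scores)


m, sd = summarize(["agree", "agree", "neutral", "totally agree"])  # hypothetical
print(m, round(sd, 2))  # 4.0 0.82
```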
Half of the participants liked to make predictions daily, while most of the other
participants were neutral (S1, M = 3.3, SD= 1.0). Participants who liked to
predict their behavior found it interesting and challenging: “At first I thought
it would be annoying, but then I was surprised that it really was interesting
and somehow challenging to better my predictions.” (Male, 22 years). Neutral
participants did not associate any feelings with it: “No special feeling. It was an
ordinary task such as to set the alarm.” (Male, 28 years). Two participants did not like to make predictions: “It got annoying over time. That I wasn’t accurate at all didn’t help.” (Male, 23 years).
(a) Regression line for predictions of screen-ons (b) Regression line for predictions of unlocks (both panels: x-axis day, y-axis average relative error in %)
Figure 8.2: Regression lines for the average relative error of screen-on and unlock predictions.
(Bars per statement S1 to S10; y-axis: # participants; legend: totally agree, agree, neutral, disagree, totally disagree)
Figure 8.3: Results of the online survey showing the agreements of 12 participants on a five-point Likert scale with the statements outlined in Table 8.1.
All but two participants looked forward to seeing how good their prediction was (S2, M = 4.0, SD= 0.9). Participants were also eager to improve their predictions from day to day (S3, M = 4.4, SD= 0.5); however, only seven participants had the feeling that their predictions actually improved (S6, M = 3.7, SD= 0.9).
We further looked into participants’ prediction strategies. Most of the time, participants used the historic data to make their prediction (S4, M = 4.5, SD= 0.7) and did not change their prediction strategy (S5, M = 3.4, SD= 1.2). Participants started their predictions rather randomly and then developed different strategies,
for example focusing on the average: “I started with pretty random predictions
for the first few days. Then as I saw a kind of average I focused on that. I
guess after a week or so I thought about the day and what is going to happen.
I also always checked for the last day’s accuracy because I thought of it as my
average usage.” (Male, 21 years), recognizing patterns: “I somehow tried to
weigh different factors. First of all I made predictions using the last day. After I
collected some data, I also used some patterns I could recognize (e.g. specific day
of the week or activities).” (Male, 20 years), or identifying influencing factors:
“At the beginning I used yesterday’s historic data and prediction to make today’s
prediction. But that helped only part way because my phone use depends on many
factors - day of week, whether in Boston or travelling, whether I walk or bike to
work, even whether it is raining (don’t want the phone to get wet).” (Male, 63
years).
We asked participants to outline when they predicted and found that three of them
did so in the morning when getting up or unplugging their phone from the charger:
“Morning, when I unplugged the phone. Often just before I leave the house.”
(Male, 63 years). Five participants predicted directly after midnight while three
more specified that they either predicted at midnight or in the morning depending
on how long they stayed awake. Six explicitly referred to predicting when
receiving the notification: “I always predicted before I went to sleep somewhat
after midnight. This was the first time the app reminded me to do so.” (Male, 25
years).
Participants perceived the influence of the application very differently (S7,
M = 2.3, SD= 1.7). One participant, for example, stated that he forgot about the
application until the next day: “I didn’t recognize any influence... after making
the prediction I usually forgot about the app till the next day.” (Male, 23 years).
The four participants who agreed with the statement also agreed that they tried to use their phone less (S8, M = 2.3, SD= 1.7): “I tried to look less often at my
phone. I have a notification light, so sometimes in the past I would still turn the
screen on to double check if nothing is on. I wouldn’t do that anymore because
for ’predict’ it would be counted. I also tried not to unlock it that often anymore
or not unlock it randomly without having a real purpose apart from being bored.
In general, I would say it supported me very much in being more aware of my
phone usage as finally I had numbers that would back up how often I am using it.”
(Female, 30 years).
Participants learned different things while using the application. One participant
stated that he did not learn anything: “My behavior was quite random so I often
predicted totally wrong.” (Male, 25 years). Two participants realized that they
used their phone more often than they expected: “That I looked at my device
more than I thought I did.” (Female, 28 years). Five participants learned and
recognized patterns on where, when, and why they used their phone: “I use the
phone a lot less on the weekend, I hadn’t quite realized how much less. How I
travel makes a big difference on phone use. I don’t turn it on as much as I thought
I might.” (Male, 63 years).
We also asked participants whether they would continue making predictions with the application (S9, M = 3.1, SD= 1.3) and whether they would continue to look at the historic data in the application (S10, M = 3.3, SD= 1.3).
8.1.5 Discussion & Implications
The results of our study show that participants’ predictions improved over time. Participants tried to build their own internal model of their usage behavior and trained this model over time. They used strategies such as focusing on the average, recognizing patterns, or identifying external factors to predict their behavior. Predictive algorithms use similar strategies, which might be one reason why their output is uncertain. Humans’ ability to make predictions could therefore be used to help explain predictive algorithms and to increase users’ understanding of uncertainty. The learning effect of making predictions could also be used to increase users’ trust in predictive algorithms. Participants, for example, learned that they used their phones much more than they had thought before the study. A predictive algorithm that predicted such high numbers without any evidence, and without users having learned about their behavior themselves, might not be seen as trustworthy.
Half of the participants liked to make predictions, and most wanted to improve them. For some participants, the internal model they built for their predictions also influenced their behavior. User-made predictions could therefore also be interesting in the context of behavior change. Users’ eagerness to predict the right number and to stay within the range of their prediction could support such changes.
8.2 Aggregating Forecasts from Multiple Sources
For many short-term forecasts, users can choose between a large number of
providers. Such forecasts may differ because providers use different models to
create their forecasts. The model uncertainty, however, is seldom presented to
the end-user. In our diary study (see Subsection 4.3.1), participants stated that
they use multiple sources if they do not trust a single source. For example, before
going on a hike, people will consult several weather forecast providers as facing
a thunderstorm in the mountains can be deadly. Comparing multiple forecasts is usually tedious and cumbersome, as users may have to open several websites in different tabs and then try to remember the values of one website to compare them with another. High-level applications that support easy comparison have recently entered the market7, but are not yet widely used. So far, there is no theoretical underpinning of how to design for easy comparison of weather forecasts or of uncertain data in general, nor of the impact of different designs.
The main goal of our work is to understand how to design interfaces that support
the easy comparison of uncertain data from multiple sources. We mainly aim to
understand which designs are useful for users and increase their confidence in
the depicted data. To this end, we compare three aggregation mechanisms: two
mechanisms that computationally aggregate data and one that supports direct
comparison without switching between tabs or applications. Computational and
manual aggregation are likely to differ in terms of mental workload and the
perceived amount of control.
In the following, we present the design rationale for the aggregation mechanisms,
our detailed research questions, an online evaluation of the mechanisms, and
an in-the-wild evaluation. In a hallway questionnaire, we asked six participants
how many forecasts from different sources would be optimal to compare. One
participant found two sources optimal, the other five participants found three
sources optimal. We therefore decided to use three sources for all our designs.
8.2.1 Design Rationale for Aggregation Mechanisms
To reduce both the cumbersome switching between different forecast providers and the mental workload of comparing forecasts, two approaches can be followed.
7 Climendo: http://climendo.com/, WeatherXM: http://weatherxm.exm.gr/
(a) Single source (b) Direct comparison
(c) Range aggregation (d) Mean aggregation
Figure 8.4: Many interactive systems, for example weather applications,
show uncertain data from a single source (see a). To encourage the design of
interactive applications that show uncertain data from multiple sources, we
identified three aggregation mechanisms. The direct comparison allows users to look at data from multiple sources at the same time, but leaves them the mental workload of aggregating the data. The range and mean aggregation provide
computationally aggregated data, which reduces the mental workload for the
user.
First, manual comparison could be supported by showing forecasts from different sources next to each other in a comparable format. Second, the computer could aggregate data from multiple sources, which would reduce the workload further. The second approach may, however, reduce users’ feeling of control, as users do not necessarily follow decision aids and concrete advice [Joslyn and LeClerc, 2012]. In the following, we describe the aggregation mechanisms and the baseline condition in more detail (for an overview see Figure 8.4).
Single Source
A single source forecast (see Figure 8.4a) corresponds to how weather forecasts are displayed in most current weather applications: one weather provider shows exactly one forecast. This mechanism serves as the baseline condition for our study. It imposes the highest workload on users when it comes to comparing forecasts, as they have to open different sources (e.g. websites or weather applications) and potentially remember the values if these cannot be displayed next to each other (e.g. due to the small screen of a mobile phone).
Direct Comparison
Direct or manual comparison (see Figure 8.4b) corresponds to showing multiple
single source forecasts next to each other in one system. Thus, the forecasts
can be perceived at one glance and no switching between different providers is
needed. This is similar to the approach used by Frick and Hegg [2011]. The
mechanism reduces users’ workload, as they do not have to remember the values, and saves the time and effort of finding different providers. However,
users still have to manually compare and interpret the forecasts. Depending on
the screen size, there could be constraints on how many sources an application
can show on the screen without scrolling.
Range Aggregation
Range aggregation (see Figure 8.4c) computationally aggregates forecasts of
multiple providers, but still enables users to make some decisions on their own.
Morss et al. [2010] already suggested using a range representation for displaying
uncertainty in weather forecasts. In contrast to their work, we do not use values
from the same weather provider but show the minimum and maximum values
from across multiple sources. This makes it easy to spot how much the values
vary between providers without having to compare all single values manually.
However, the mechanism would be affected significantly by one erroneous source.
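Computationally, range aggregation reduces to a per-quantity minimum and maximum across providers. The Python sketch below uses hypothetical provider names and values; it also illustrates the weakness noted above, since a single erroneous source stretches the displayed range.

```python
from __future__ import annotations


def range_aggregate(values_by_provider: dict[str, float]) -> tuple[float, float]:
    """Minimum and maximum of one forecast quantity across providers."""
    if not values_by_provider:
        raise ValueError("need at least one source")
    values = values_by_provider.values()
    return min(values), max(values)


# Hypothetical daily maximum temperatures (°C) from three providers.
temp_max = {"provider_a": 21.0, "provider_b": 23.0, "provider_c": 22.0}
print(range_aggregate(temp_max))  # (21.0, 23.0)

# A single erroneous source widens the displayed range considerably.
faulty = {**temp_max, "provider_c": 35.0}
print(range_aggregate(faulty))  # (21.0, 35.0)
```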
Mean Aggregation
Mean aggregation (see Figure 8.4d) is identical to a single source forecast except
that a mean of multiple sources is provided. The mechanism aggregates multiple
values into a single value and therefore keeps the simplicity of a single source
forecast while taking more information into account. Depending on the available forecast information, a weighted mean based on the accuracy of sources could be used. If a mean is used, this must be clearly communicated to users, as they may otherwise interpret it as a single source forecast. Mean aggregation reduces the workload; however, users might feel a loss of control, as they do not have access to the values of the different forecasts.
Table 8.2: Detailed research questions about the design for comparison of uncertain data.
RQ       Research Question
RQ5.1    Does the aggregation of multiple forecasts change the users’ confidence in uncertain data?
RQ5.2    Do people prefer aggregated forecasts to single source forecasts?
RQ5.3    Do people prefer different aggregation mechanisms (manual vs. computational aggregation) according to the importance of a scenario?
RQ5.4    Do people prefer different aggregation mechanisms depending on the type of visual or textual representation?
RQ5.5    Can the theoretical findings be transferred to real-world application usage?
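The mean aggregation mechanism, including the accuracy-weighted variant mentioned in its description, can be sketched as follows. How weights would be derived from source accuracy is not specified; the weights in the example are hypothetical, as are the values.

```python
def mean_aggregate(values, weights=None):
    """(Weighted) arithmetic mean of one forecast quantity across sources.

    weights: optional non-negative per-source weights, e.g. derived from
    each provider's past accuracy (a hypothetical weighting scheme).
    """
    if not values:
        raise ValueError("need at least one source")
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("one weight per source required")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    return sum(w * v for w, v in zip(weights, values)) / total


chance_of_rain = [30.0, 50.0, 70.0]  # hypothetical, in % from three providers
print(mean_aggregate(chance_of_rain))                     # 50.0
print(mean_aggregate(chance_of_rain, weights=[2, 1, 1]))  # 45.0
```

Either way, the result is a single value, which is why the display must make clear that it is an aggregate rather than one provider's forecast.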
8.2.2 Detailed Research Questions and Hypotheses
Our main goal is to understand how to design for the comparison of uncertain
data and how this influences users. We broke this goal down into five detailed research questions (see Table 8.2).
For RQ5.1, we assume that aggregation increases users’ confidence in the presented data, as users will get a better understanding of the uncertainty by comparing the sources. For RQ5.2, we assume that users generally prefer aggregated forecasts to single source forecasts, as aggregation adds information and therefore leads to more informed decisions. For RQ5.3, we assume that the preference for aggregation mechanisms depends on the importance of the scenario and the context of users: while users may want more control in important scenarios, they will happily use aggregated forecasts to reduce workload in less important scenarios. For RQ5.4, we additionally assume that different representations influence the preference for aggregation mechanisms, as some representations may harmonize better with one aggregation mechanism than with another. RQ5.5 is a more exploratory research question, but we assume that the theoretical findings will be transferable to the real world.
8.2.3 Online Evaluation
We first evaluated the three aggregation mechanisms and the baseline in an online
survey where we aimed to understand users’ preferences, the influence of the
mechanisms on users’ confidence, and mechanisms’ relationship to different
representations and scenarios of varying importance.
Design
We used a 4 × 4 × 3 within-subject design with three independent variables: aggregation mechanism (with the four levels single source, direct comparison, range, mean), representation (with the four levels text, pictogram, bar, line), and scenario (with the three levels daily dress choice, outdoor BBQ party, outdoor wedding). For each of the 16 combinations of aggregation mechanism and representation, we measured participants’ general confidence on a seven-point Likert item and their preference for each scenario on a separate seven-point Likert item. For an explanation of the aggregation mechanisms, see Subsection 8.2.1.
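The size of the design follows directly from the factor levels. The short Python sketch below enumerates the cells and counts the ratings each participant gives, assuming, as described in the method, one confidence item per sketch plus one preference item per scenario. The constant names are only labels for this illustration.

```python
from itertools import product

MECHANISMS = ("single source", "direct comparison", "range", "mean")
REPRESENTATIONS = ("text", "pictogram", "bar", "line")
SCENARIOS = ("daily dress choice", "outdoor BBQ party", "outdoor wedding")

# Full factorial design: 4 x 4 x 3 = 48 cells.
cells = list(product(MECHANISMS, REPRESENTATIONS, SCENARIOS))

# One sketch per mechanism x representation combination.
sketches = list(product(MECHANISMS, REPRESENTATIONS))

# Per sketch: one confidence item plus one preference item per scenario.
ratings_per_participant = len(sketches) * (1 + len(SCENARIOS))

print(len(cells), len(sketches), ratings_per_participant)  # 48 16 64
```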
Representations. As current weather forecasts use different representations, we
also included different representations in our survey. To identify suitable repre-
sentations, we investigated weather applications and websites, which mostly use
pictograms with numbers or line charts. We added a text and a bar representation
for more variety. Although the uncertainty in our study is not stochastic, we
treat it as model uncertainty and use the ambiguation methods developed by
Olston and Mackinlay [2002] for the range aggregation in bar and line charts.
All representations show the location of the weather station, a pictogram of the
current weather, and numerical values for the temperature (current temperature,
daily minimum, daily maximum), the chance of rain, and the amount of rain.
Figure 8.5 depicts the four representations for the single source condition.
Scenarios. We assembled a list of twelve different weather-related scenarios:
(1) What to wear when going to work, (2) Whether to take the train or bike
to work, (3) Outdoor soccer viewing event with friends, (4) Camping trip, (5)
Organizing an outdoor wedding, (6) Packing for a trip, (7) Doing outdoor sports
(e.g. swimming, hiking), (8) Planning an outdoor barbecue party, (9) Harvesting
a crop, (10) Gardening, (11) Whale watching, and (12) Attending an outdoor
concert.
(a) Pictogram (b) Bar graph (c) Line graph (d) Text
Figure 8.5: Exemplary sketches for the representation methods used in the online survey displaying a single source or mean aggregation forecast.
To eliminate the time factor, we formulated all of the scenarios to take place “tomorrow”. In a short hallway questionnaire with 15 participants, we determined the importance of accurate prediction for the scenarios. Participants had to quickly decide for each scenario whether it was “casual important”, “somewhat important”, or “very important”. Based on the results, we selected the following three scenarios with varying importance:
Least importance (10 participants chose “casual important”): Daily Dress
Choice - You are going to work/school tomorrow morning on a regular day
and you need to decide what to wear.
Medium importance (11 participants chose “somewhat important”):
Outdoor BBQ Party - You are planning a BBQ party for tomorrow and
you want to know whether the weather will be good for having a party
outdoors.
Highest importance (13 participants chose “very important”): Outdoor
Wedding - You have your wedding tomorrow and it is an outdoor wedding.
You need to decide whether you should make changes in the organization
(like arranging a tent, or moving the wedding indoors).
Method
At the beginning of the survey, we collected demographic data and details about
the habitual usage of weather forecasts. Participants then navigated through four
pages, one for each representation. The order of representations was randomized
across participants to reduce sequence effects. On each page, participants
8.2 Aggregating Forecasts from Multiple Sources 157
encountered an explanation of the representation, all aggregation methods and
all scenarios. Additionally, each page contained four sketches, which showed
the representation in combination with each of the four aggregation mechanisms.
For each sketch, we asked participants to rate their confidence in the depicted
weather forecast on a seven-point Likert item (“I feel confident that this will
be tomorrow’s weather.”) ranging from “completely disagree” to “completely
agree”. Participants were aware that the shown forecast did not veridically display
the weather of the next day. Subsequently, we asked participants to rate their
preference for each sketch on three seven-point Likert items, each for using the
depicted interface in one of the three possible scenarios (e.g., “I would like to use
this representation for the scenario: Daily Dress Choice”).
Participants
Seventy-one participants (37 male, 33 female) aged between 19 and 77 years (M = 24.8,
SD = 7.0) completed the full online survey. We recruited the participants via
mailing lists and social media channels of the university. The majority of our
participants had completed a university degree (59.5%) or a high school degree
(26.8%). At the end of the survey, participants could enter a raffle for
two €20 Amazon vouchers.
70.4% of our participants consulted weather forecasts multiple times a week.
Their usage behavior changed according to season (“Especially in summer be-
cause the weather changes very often.”), events (“I always check it before going to
swim outside, and a barbecue, or a festival (everything outside where it could be
fatal to wear the wrong clothes or the event is not appropriate for rainy weather).”),
specific weather conditions (“Especially when the weather doesn’t seem to be
stable”), travel plans, and location.
53.1% of our participants used more than two weather providers regularly
(M = 1.68, SD= 0.92). Participants selected weather providers based on the
perceived reliability and accuracy, usability, fast and easy access, and amount of
information. Participants stopped using a weather provider if they found its
forecast inaccurate or not easily available. Several
reasons for comparing different sources were mentioned: to find out whether the
sources provided consistent information (“If I really need to know the weather,
I look at all of them to see if they match.”), to compute the average (“I find
they never agree with one another so I always try to find multiple sources and
average them out (informally).”), or to estimate the worst possible scenario (“I
usually estimate the worst possible weather for a given scenario and prepare
accordingly.”).
(a) Confidence ratings for the aggregation mechanisms as a function of representation (b) Preference ratings for the aggregation mechanisms as a function of representation (c) Preference ratings for the aggregation mechanisms as a function of scenario
Figure 8.6: Participants’ mean preference and confidence ratings (measured
on a seven-point Likert-type item with 1 corresponding to completely disagree
and 7 corresponding to completely agree); error bars represent standard errors
of the mean. Figure 8.6c depicts centered mean ratings, i.e., the main effect
of scenario has been removed to improve the perceptibility of the interaction
effect.
Results
We performed two linear mixed-effects model analyses on the aligned-rank
transformed Likert item data [Wobbrock et al., 2011], one for participants’
preference ratings and one for participants’ confidence ratings. For the preference
ratings, we used the fixed factors aggregation method, representation, and scenario
as well as the random factor participant ID. For the confidence ratings, we used the
same factors except scenario, as we did not record participants’ confidence on a
scenario level. For significant effects, we conducted post hoc pair-wise
comparisons with Bonferroni corrections.
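The aligned rank transform (ART) [Wobbrock et al., 2011] aligns the responses for one effect and then ranks them, so that a parametric model can be fitted to the ranks. The following minimal Python sketch shows only the alignment-and-ranking step for a main effect in a fully crossed design; the dictionary-based data layout and the function names are our own illustration, and the mixed-effects model fitted on the resulting ranks is not shown.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def art_main_effect(rows, factor, factors, response="y"):
    """Aligned-rank-transformed responses for the main effect of `factor`.

    rows:    list of dicts, one per observation (hypothetical layout)
    factor:  the fixed factor whose main effect is tested
    factors: all fixed factors of the fully crossed design
    """
    grand_mean = mean(r[response] for r in rows)
    cells, levels = defaultdict(list), defaultdict(list)
    for r in rows:
        cells[tuple(r[f] for f in factors)].append(r[response])
        levels[r[factor]].append(r[response])
    cell_mean = {key: mean(v) for key, v in cells.items()}
    effect = {level: mean(v) - grand_mean for level, v in levels.items()}
    aligned = [
        # Strip all effects (residual = y - cell mean), then add back only this one.
        # Rounding keeps floating-point noise from breaking genuine ties.
        round(r[response] - cell_mean[tuple(r[f] for f in factors)] + effect[r[factor]], 9)
        for r in rows
    ]
    return average_ranks(aligned)
```

Interaction effects require a different alignment (the main effects are subtracted as well), which this sketch omits; each effect gets its own aligned-and-ranked response before the model is fitted.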
Aggregation methods. Figure 8.6a shows participants’ mean confidence ratings
for each of the four aggregation methods as a function of the representations.
Mean ratings were highest for range (M = 5.34, SD = 1.46), followed by direct
(M = 4.90, SD = 1.75), mean (M = 4.83, SD = 1.6), and single (M = 4.79, SD =
1.5). The mixed-effects analysis revealed a main effect of aggregation method on
participants’ confidence ratings (F(3,1050) = 13.5, p < .001). In particular,
pair-wise comparisons showed that participants indicated higher confidence in range
compared to the other three methods (range vs. direct: t(1050) = 3.48, p < .01;
range vs. mean: t(1050) = 4.85, p < .001; range vs. single: t(1050) = 6.0,
p < .001).
Range received the highest mean preference rating (M = 4.85, SD = 1.73), followed
by direct (M = 4.51, SD = 1.9), mean (M = 4.22, SD = 1.82), and single
(M = 4.13, SD = 1.72). Figure 8.6b shows participants’ overall preference ratings
for each of the four aggregation methods as a function of the representation. The
mixed-effects analysis revealed a main effect of the aggregation method on
participants’ preference ratings (F(3,3290) = 37.78, p < .001). Pair-wise comparisons
showed that participants preferred range over the other three methods (range
vs. direct: t(3290) = 5.55, p < .001; range vs. mean: t(3290) = 7.69, p < .001;
range vs. single: t(3290) = 10.21, p < .001), as well as direct over single (direct
vs. single: t(3290) = 4.66, p < .001).
These results support our hypothesis for RQ5.1 that people place higher confi-
dence in aggregated data, especially if a range aggregation is used. Additionally,
the results support our hypothesis for RQ5.2 that people generally prefer aggre-
gated forecasts to single source forecasts.
Representations. Line received the highest mean preference rating (M = 4.73,
SD = 1.67), followed by pictogram (M = 4.70, SD = 1.73), bar (M = 4.42,
SD = 1.81), and text (M = 4.28, SD = 1.88). The mixed-effects analysis revealed
a main effect of representation on participants’ preference ratings
(F(3,3290) = 16.48, p < .001). Pair-wise comparisons indicated that
participants preferred line and pictogram over bar and text (line vs. bar:
t(3290) = 4.17, p < .001; line vs. text: t(3290) = 5.92, p < .001; pictogram vs.
bar: t(3290) = 3.682, p < .001; pictogram vs. text: t(3290) = 5.436, p < .001).
Mean confidence ratings were highest for pictogram (M = 5.17, SD = 1.53),
followed by line (M = 5.09, SD = 1.49), text (M = 4.83, SD = 1.67), and bar
(M = 4.70, SD = 1.69). The linear mixed-effects analysis revealed a main effect of
representation (F(3,1050) = 7.27, p < .001). Pair-wise comparisons showed that
pictogram received higher confidence ratings than bar (t(1050) = 4.08, p < .001)
and text (t(1050) = 3.8, p < .001).
These results indicate that participants preferred the line and pictogram
representations. Pictograms additionally elicited higher confidence than bar graphs and text.
Interactions. Of the three possible two-way interactions and the one three-way
interaction between the three factors in the three-way analysis of preference rat-
ings, only the interaction between aggregation method and scenario is significant
(F(6,3290) = 11.71, p< .001). Figure 8.6c shows the mean preferences for each
of the four aggregation methods as a function of the scenario. For illustration
purposes, the data of each scenario was centered on the scenario’s overall mean.
As the figure illustrates, a main source of the interaction effect is the increase
in the preference ratings for the direct aggregation method as the importance of
the scenario increases. Five of the nine difference-of-differences contrasts
involving the direct aggregation method are significant; that is, the difference in
preference between direct aggregation and one of the other methods changes
significantly between two scenarios (e.g., for the daily scenario, single source
presentation is preferred over direct aggregation, while the situation is reversed
for the wedding scenario; see Table 8.3 for test statistics and p-values for all
significant contrasts). This partially confirms our hypothesis for RQ5.3 that the
willingness to perform aggregation manually increases with the importance of
the scenario (though range aggregation remains the overall preferred method).
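Figure 8.6c centers each scenario on its overall mean, and the contrasts in Table 8.3 are differences of differences. A small sketch of both computations, using made-up mean ratings (not the study's data) and hypothetical function names; note that centering removes the scenario main effect but leaves every difference of differences unchanged.

```python
from statistics import mean

# Illustrative mean preference ratings per scenario and aggregation method.
# These numbers are made up for the example; they are NOT the study's data.
ratings = {
    "daily":   {"direct": 4.0, "mean": 4.3, "range": 4.8, "single": 4.6},
    "wedding": {"direct": 4.9, "mean": 4.2, "range": 5.0, "single": 3.5},
}


def center_by_scenario(ratings):
    """Subtract each scenario's overall mean to remove the scenario main effect."""
    centered = {}
    for scenario, by_method in ratings.items():
        scenario_mean = mean(by_method.values())
        centered[scenario] = {m: v - scenario_mean for m, v in by_method.items()}
    return centered


def diff_of_diffs(ratings, m1, m2, s1, s2):
    """(m1 - m2 in scenario s1) - (m1 - m2 in scenario s2).

    A value far from zero means the gap between the two methods depends on
    the scenario, i.e., an aggregation method x scenario interaction.
    """
    return (ratings[s1][m1] - ratings[s1][m2]) - (ratings[s2][m1] - ratings[s2][m2])


centered = center_by_scenario(ratings)
print(round(diff_of_diffs(ratings, "direct", "single", "wedding", "daily"), 2))   # 2.0
print(round(diff_of_diffs(centered, "direct", "single", "wedding", "daily"), 2))  # 2.0
```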
A secondary source of the interaction is the steeper decline in preference for the
single source compared to the other aggregation methods as the importance of the
scenario increases (e.g., the difference between single and range is larger for
the wedding scenario than for the daily scenario).
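The adjusted p-values in Table 8.3 appear consistent with converting each χ²(1) statistic into a p-value and applying a Bonferroni correction over 18 contrasts (6 pairs of aggregation methods × 3 pairs of scenarios). The text does not state the number of contrasts, so 18 is our inference; the sketch below, with hypothetical helper names, only reproduces that arithmetic.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, chi2 = z**2, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m comparisons, capped at 1."""
    return min(1.0, p * m)


# Assumption: 6 pairs of aggregation methods x 3 pairs of scenarios.
N_CONTRASTS = 6 * 3

for label, chi2 in [("Range-Single BBQ-Daily", 10.35), ("Mean-Single Daily-Wedding", 9.54)]:
    print(label, round(bonferroni(chi2_sf_df1(chi2), N_CONTRASTS), 3))
# Range-Single BBQ-Daily 0.023
# Mean-Single Daily-Wedding 0.036
```

Under this assumption, the other rows of Table 8.3 are reproduced as well (e.g., 11.37 gives about .013 and 10.23 about .025).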
We find no interaction effect between aggregation methods and representations.
In other words, there is not sufficient evidence in favor of our hypothesis for
RQ5.4 that users have different preferences regarding the aggregation method
depending on which representation is provided. Rather, users show an overall
preference for range.
Qualitative Feedback. We additionally collected some qualitative feedback
about the aggregation mechanisms. People mostly stated that the single source
(especially the pictogram representation) “seems like enough information for
daily dress choice, and it is a similar representation to what [they] usually use.
However, [they] would like a more detailed forecast for event planning.”
As expected, participants appreciated the direct comparison because they felt able to make
an informed decision: “I like having control and letting me make the decisions. I
am informed of all possibilities, and it is up to me to come to a decision.”
Table 8.3: Difference of the differences between two aggregation methods
across two scenarios (e.g., is the difference between direct and single larger
in the BBQ or in the Daily scenario?).
Contrast df χ² p-value
Direct-Single BBQ-Daily 1 22.47 < .001
Range-Single BBQ-Daily 1 10.35 .023
Direct-Single BBQ-Wedding 1 11.37 .013
Direct-Mean Daily-Wedding 1 25.24 < .001
Direct-Range Daily-Wedding 1 10.23 .025
Direct-Single Daily-Wedding 1 65.81 < .001
Mean-Single Daily-Wedding 1 9.54 .036
Range-Single Daily-Wedding 1 24.15 < .001
Participants also positively expressed that they liked the range aggregation: “I
like this representation because it provides almost as much information as the
side-by-side comparison, but is much easier to see and interpret. It seems more
trustworthy because it provides a range rather than a single value for temperature
and rainfall. This acts as a margin of error, which means it is more likely to be
correct.”
For the mean aggregation, most participants stated that it “does not represent the
variation in forecasts between different weather providers, and it also (probably)
does not provide the exact values reported by any one provider [and it] seems
untrustworthy.” In contrast, one participant stated that he did not
“see the different temperatures and so on, but [he] know[s] that it’s an average of
some forecast sources, that’s why [he] trust[s] the forecast.”
Discussion
Participants generally preferred aggregated forecasts, and their perceived
confidence increased as well. However, the mean aggregation did not receive
significantly better ratings than a single source forecast. Instead, participants
preferred range aggregation and direct comparison, which leave more room for an
informed decision and make the actual values of the sources more transparent.
Merely knowing that a forecast combines multiple sources, without seeing their
actual values, does not increase confidence.
Regarding the representations, participants preferred the line and pictogram
representations. We assume that they were preferred due to users’ familiarity with
these representations, as they are commonly used to display weather data.
We did not find any interaction effect between aggregation mechanism and
representation. This indicates that users do not prefer different aggregation
mechanisms for different representations, but instead generally prefer the range
aggregation. For our second evaluation, this allowed us to choose the aggregation
mechanisms and the representation according to the overall preferences.
The interaction effect between the aggregation mechanism and the scenario
indicates that with increasing importance of the scenario, users show a slightly
increased preference for range and an increased preference for direct comparison,
while the preference for single source forecasts drops below all other mechanisms.
In important scenarios, people seem to want more control over their decisions,
even if this means more information to process and more work to aggregate
multiple sources. Interestingly, this also applies to the range aggregation, which
seems to be a good compromise that provides enough detail to foster a sense of control.
8.2.4 Evaluation in the Wild
Based on the previous results, we developed an Android weather application to
evaluate the most promising aggregation mechanisms in the wild.
Method
We developed the weather application “Weather Compare” (see Figure 8.7). Based
on the results of the online survey, we decided to show a pictogram representation
and support range aggregation (see Figure 8.7a) and direct comparison (see
Figure 8.7b). The application allows users to toggle between the two aggregation
mechanisms on demand; range aggregation is the default view. We
picked three weather APIs that allow a reasonable number of API calls and are
free to use for non-commercial/research purposes. Our application required users
to turn on their location settings, as it automatically determined the user’s
location to show the local forecast. We based the design of the application
on the sketches shown in the user study, but added hourly information to make
the application more informative.
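To make the aggregation mechanisms concrete, here is a minimal sketch of range aggregation, mean aggregation, and direct comparison over several providers' forecasts. The `DailyForecast` fields, units, and provider names are hypothetical and do not describe the schema of the weather APIs used in “Weather Compare”.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DailyForecast:
    """One provider's forecast for a day (hypothetical fields, not a real API schema)."""
    provider: str
    temp_min: float     # degrees Celsius
    temp_max: float     # degrees Celsius
    rain_chance: float  # percent
    rain_amount: float  # millimetres


def range_aggregation(forecasts, field):
    """Lowest and highest value any provider reports for `field`."""
    values = [getattr(f, field) for f in forecasts]
    return min(values), max(values)


def mean_aggregation(forecasts, field):
    """Average of the providers' values; hides how much they disagree."""
    return mean(getattr(f, field) for f in forecasts)


def direct_comparison(forecasts, field):
    """All values side by side, keyed by provider."""
    return {f.provider: getattr(f, field) for f in forecasts}


forecasts = [
    DailyForecast("Provider A", 12, 21, 20, 0.0),
    DailyForecast("Provider B", 13, 23, 40, 0.5),
    DailyForecast("Provider C", 11, 24, 30, 0.2),
]
print(range_aggregation(forecasts, "temp_max"))    # (21, 24)
print(mean_aggregation(forecasts, "rain_chance"))  # 30
```

The range view keeps the spread between providers visible in a compact form, whereas the mean collapses it into a single number; this mirrors the participants' comments on the two mechanisms in the online survey.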
All participants received detailed instructions on how the application works and an
APK file to install the application on their personal mobile phones. We asked them
to use the application for one week instead of their normal weather application.
(a) App - Range aggregation (b) App - Direct comparison
Figure 8.7: Screenshots of the weather application “Weather Compare” with
the main screen showing the range aggregation and the details screen showing
the direct comparison of the three weather providers.
We logged participants’ usage of the application and collected qualitative feedback
at the end of the week on (1) Future usage of the application, (2) Opinion
about having data from multiple sources, (3) Usage of range vs. details view, and
(4) Influence of aggregation on confidence.
Participants
We recruited 23 participants (12 male, 10 female, 1 preferred not to say) with
an average age of 30.0 (SD = 11.3) who regularly use weather applications. We
invited them by sharing the news about the application via social media channels
of the university.
Results
During the week of using the application, participants opened the application on
average 18.0 times (SD= 5.7). The details view was used in approximately one-
third of cases (M = 6.1,SD= 2.7). In the following, we present the qualitative
feedback of participants.
Future usage of “Weather Compare”. All except three participants stated that
they would like to use “Weather Compare” in the future. They, however, had
specific feature requests such as including a wind forecast or a longer forecast
range.
Opinion about having data from multiple sources. 14 participants explicitly
stated that having data from multiple sources “is the main thing that [they] liked
about the app”. The aggregation helped them to better judge the forecast: “I liked
it because since this is a forecast, one provider is not really reliable but when you
see that 3 different providers agree on something, it is more convincing.” Only
one participant did not appreciate the aggregation and stated “that the cognitive
load is too high when making the effort and actually comparing the values in
contrast to just looking at a yellow sun or grey cloud.”
Usage of range aggregation vs. direct comparison. Six participants stated
that they “always used the range representation and used the detailed view only
out of curiosity.” In contrast, three participants only used the details view. As
one participant explained: “I usually used the details view every day, because
normally I would check at least 2 different sources to be sure about the weather,
thus, I would like to know what each source says rather than seeing an average
of them.” One participant regularly used both views. Six other participants stated
that if “the range of values [was] too wide, [they] checked the details page.” or
“before [they] spent a longer time outside”.
Influence of aggregation on confidence. Twelve participants stated that the ap-
plication increased their confidence as they felt that the forecast was more detailed
and having multiple sources made them trust it more. Only one participant stated
that “the range was typically only 2-3 degrees, in which case it does not influence
[his] decision in any way.” One participant did not think that having forecasts
from multiple sources increased her confidence, “because when they differ [she]
kind of feel[s] like [she] lose[s] trust in all of them.”
Discussion
With the help of the qualitative feedback, we were able to identify some short-
comings of “Weather Compare” arising from the fact that the APIs provided only
a limited amount of information that we were able to access and show in the
application. Participants missed certain features (e.g. typing in a location) or
certain information (e.g. wind forecast) familiar from other applications. We
assume that these shortcomings could easily be overcome by allowing users to
customize the application and by offering more settings.
Regarding the APIs that we used, most participants had no strong opinion on what
sources should be shown in the application. However, one participant explicitly
stated that he wanted to choose the sources in the application. Providing more
information about the used APIs and even a choice of providers for users could
help them to understand constraints and increase their trust in the application.
Surprisingly, nine participants used only one view instead of switching between
views depending on the importance of the situation, as we had expected based on
the results of the online survey. We assume that the weather forecast in general
was of different importance to participants and that some did not want to hand
control over to the application. In particular, participants who had already compared
forecasts before using “Weather Compare” commented very positively about it.
In line with these findings, most participants stated that the aggregation of fore-
casts increased their confidence in the forecast. Interestingly, one participant
felt that the application decreased her trust in weather forecasts in general.
Although research suggests otherwise, it seems that some people might not be aware
of the uncertainty in weather forecasts or might perceive it as lower than it actually is.
It would be interesting to investigate whether this opinion changes after using the
application long-term or when switching back to the weather application used
before the study (which might then also be perceived as more uncertain).
8.2.5 Implications
Based on the results of our online survey and the in-the-wild evaluation, we
derived five design implications for supporting the easy comparison of uncertain
data. The implications reach beyond weather data and can serve as a starting
point to develop applications supporting comparison of uncertain data in other
application areas.
Support contexts with different importance: Applications should support
range aggregation and direct comparison to allow adaptation to contexts
with different importance. Although participants prefer aggregated
data, they still want to make informed decisions and their own judgements in
very important scenarios.
Support perception at a glance: In everyday usage, users do not want to spend
too much time processing information and need a method to quickly
make a decision by perceiving important information at a glance.
Support different types of users: Applications supporting multiple aggrega-
tion mechanisms should make switching between those easy and give
people a choice to set their preferred default option. The evaluation in the
wild showed that people have different tolerances for handing control of the
aggregation over to an application.
Give opportunities for choice: Applications should offer users a list of many
providers to choose from, instead of pre-selecting the providers. This can
further increase the feeling of control and potentially also trust.
Establish transparency: During the in-the-wild evaluation, participants requested
additional forecast data (e.g., a wind forecast). If applications transparently
provide detailed information on what data the providers actually offer, users
can better understand the requirements and constraints of the application.
8.3 Humans’ Internal Models for Aggregating Conflicting Uncertain Measurements
Wearable devices such as smartphones, smart watches, and activity trackers
contain a growing number of sensors. These sensors, such as the accelerometer,
are often small, low-cost components that fit into the limited physical space.
Additionally, specialized devices exist that are tailored to track only one specific
property. All measured sensor data is uncertain, as there might be measurement
errors, the sensors might be incorrectly calibrated, or the algorithms might use
thresholds to determine sensor values. As the number of general-purpose and
specialized personal devices increases, the amount of conflicting and confusing
information likewise increases.
The main goal of our work is to understand how humans aggregate conflicting
probabilistic data. There are several mathematical ways of aggregating prob-
abilistic data such as a simple average, a maximum likelihood estimator, or a
winner-takes-all model [Ernst and Banks, 2002]. Budescu [2006] found that
humans aggregate conflicting information from different sources by averaging
it. Their confidence in the aggregation depends on structural and natural
8.3 Humans’ Internal Models for Aggregating Conflicting
Uncertain Measurements 167
Table 8.4: Detailed research questions about how people interpret conflicting
probabilistic information.
RQ Research Question
RQ5.6 Does uncertainty information in general change how humans aggregate data?
RQ5.7 Do people make statistically more optimal decisions with different visualizations?
RQ5.8 Do people take the reliability of sensors into account?
RQ5.9 Do people make different decisions depending on different distances between
measurements?
factors such as the amount of information, the number of experts judging the
information, the inaccuracy, and the overlap of the information. However, it is
unclear how the amount of presented uncertainty information influences humans’
reasoning and their internal models. In the following, we present our detailed re-
search questions and a user study investigating how humans aggregate conflicting
sensor data.
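For two independent measurements with Gaussian noise, the three aggregation models named above can be sketched as follows. In this case the maximum likelihood estimator reduces to the inverse-variance weighted mean, and we read winner-takes-all as reporting only the more reliable measurement; the function names and example values are illustrative.

```python
def simple_average(x_a, x_b):
    """Ignore reliability and take the midpoint of the two measurements."""
    return (x_a + x_b) / 2


def maximum_likelihood(x_a, var_a, x_b, var_b):
    """Inverse-variance weighted mean: the ML estimate for two independent
    Gaussian measurements of the same quantity."""
    w_a, w_b = 1 / var_a, 1 / var_b
    return (w_a * x_a + w_b * x_b) / (w_a + w_b)


def winner_takes_all(x_a, var_a, x_b, var_b):
    """Report only the more reliable (lower-variance) measurement."""
    return x_a if var_a <= var_b else x_b


# Two conflicting readings in arbitrary units (hypothetical):
# A reads 10 with variance 4, B reads 20 with variance 1.
print(simple_average(10, 20))            # 15.0
print(maximum_likelihood(10, 4, 20, 1))  # 18.0
print(winner_takes_all(10, 4, 20, 1))    # 20
```

The three models coincide when both measurements are equally reliable only for the average and the maximum likelihood estimate; winner-takes-all then has to break the tie arbitrarily.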
8.3.1 Detailed Research Questions and Hypotheses
Our main goal is to understand how people interpret conflicting probabilistic
information. We broke this goal down into five detailed research questions (see
Table 8.4 for RQ5.6 to RQ5.9).
For RQ5.6, we assume that humans select the average if they have two point
estimates, but use a different internal model if uncertainty information is provided.
For RQ5.7, we assume that people do not make statistically optimal decisions.
We, however, expect that people make statistically more optimal decisions if
they have more uncertainty information. We further assume that a representation
showing detailed aggregated uncertainty information as outlined in Section 7.3.1
(see especially Table 7.6) will lead to the statistically most optimal decisions.
For RQ5.8, we assume that humans adjust their aggregation towards the more
reliable value, but select the average if both values are equally reliable. For
RQ5.9, we assume that humans might change their strategy if sensor
measurements differ too much.
8.3.2 User Study
In a user study, we evaluated four different representations combined with dif-
ferent weightings and different distances between estimates. We calculated the
weightings based on a formula for calculating the weighted average of two mea-
surements [Taylor, 1997]:
$$X_{\text{est}} = \frac{w_A \cdot \text{Measurement}_A + w_B \cdot \text{Measurement}_B}{w_A + w_B} \tag{8.1}$$
where the weight of each measurement is determined by the normalized reciprocal
variance of the measurement:
$$w_A = \frac{1}{\mathrm{Var}_A} \quad \text{and} \quad w_B = \frac{1}{\mathrm{Var}_B} \tag{8.2}$$
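Normalizing the weights of Equation 8.2 shows how a weighting condition such as 20%-80% translates into the variances of the two measurements. The following worked derivation follows directly from Equations 8.1 and 8.2; the symbol $\tilde{w}$ for a normalized weight is our notation.

$$\tilde{w}_A = \frac{w_A}{w_A + w_B} = \frac{1/\mathrm{Var}_A}{1/\mathrm{Var}_A + 1/\mathrm{Var}_B} = \frac{\mathrm{Var}_B}{\mathrm{Var}_A + \mathrm{Var}_B}$$

For the 20%-80% condition, $\tilde{w}_A = 0.2$ and $\tilde{w}_B = 0.8$ require $\mathrm{Var}_A / \mathrm{Var}_B = \tilde{w}_B / \tilde{w}_A = 4$, i.e., measurement A has twice the standard deviation of measurement B, and

$$X_{\text{est}} = 0.2 \cdot \text{Measurement}_A + 0.8 \cdot \text{Measurement}_B.$$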
Design
We used a 4 × 5 × 3 within-subject design with three independent variables:
visualization (with the four levels point estimate, confidence interval, dotplot,
probability distribution function), weighting (with the five levels 50%-50%,
40%-60%, 30%-70%, 20%-80%, 10%-90%), and distance (with the three
levels no distance, small distance, large distance). Besides the details about the
current condition, we only logged participants’ answers.
Visualizations. We used four different visualizations based on the amount of
uncertainty included in a representation defined in Section 7.3.1. We only used
graphical representations to increase their comparability. We used a point estimate
without uncertainty information, a confidence interval with aggregated uncertainty
information, a dotplot (as suggested by Kay et al. [2016]) for detailed aggregated
uncertainty and a probability distribution function for detailed uncertainty.
Weighting. We chose the associated variances for the measurements such that
one weight increased in steps of 0.1 from 0.5 to 0.9 and the other weight decreased
from 0.5 to 0.1 (for both weights after normalizing). In the following, we refer to
these weights as 50%-50%, 40%-60%, 30%-70%, 20%-80%, and 10%-90%.
Distance. We used zero distance and two different distances between the shown
measurements: a small and a large distance (see Figure 8.9). The small and large
distances equaled on average 1.9 and 5 times the size of the standard deviation of
the distribution used in the 50%-50% weighting condition.
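The weighting and distance conditions can be combined into stimulus parameters as in the sketch below. The text specifies only the normalized weights and the distances relative to the standard deviation of the 50%-50% condition; how the absolute variances and positions were set is not reported, so keeping the total precision constant and placing the measurements symmetrically around a center are our assumptions, as are the function and parameter names.

```python
import math


def stimulus(weight_b, distance_factor, sd_equal=1.0, center=0.0):
    """Two measurements for one (weighting, distance) condition plus the optimal answer.

    weight_b:         normalized weight of the more reliable measurement B (0.5 ... 0.9)
    distance_factor:  distance between the measurements as a multiple of the SD used
                      in the 50%-50% condition (e.g., 0, about 1.9, or about 5)
    sd_equal, center: hypothetical scale and location of the stimuli
    """
    weight_a = 1.0 - weight_b
    # Assumption: keep the total precision 1/Var_A + 1/Var_B equal to the
    # 50%-50% case (2 / sd_equal**2) and split it according to the weights.
    total_precision = 2 / sd_equal**2
    var_a = 1 / (weight_a * total_precision)
    var_b = 1 / (weight_b * total_precision)
    distance = distance_factor * sd_equal
    x_a, x_b = center - distance / 2, center + distance / 2
    optimal = weight_a * x_a + weight_b * x_b  # Equation 8.1 with normalized weights
    return {"x_a": x_a, "sd_a": math.sqrt(var_a),
            "x_b": x_b, "sd_b": math.sqrt(var_b), "optimal": optimal}


s = stimulus(0.8, 5.0)
print(round(s["sd_a"] / s["sd_b"], 3), round(s["optimal"], 3))  # 2.0 1.5
```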
(a) Point estimate (b) Confidence interval (c) Dotplot (d) Probability distribution function
Figure 8.8: The four visualizations used in the user study each including a
different amount of uncertainty information as defined in Section 7.3.1.
(a) Zero distance (b) Small distance (c) Large distance
Figure 8.9: We used three different distances between measurements in our
experiment. This figure shows an example for each distance.
Method
At the beginning of the study, participants filled in a demographic questionnaire
before the study instructor explained the general task. For each of the four
visualizations, participants were asked to select the true value on a slider that
was shown between two visualizations displaying different values. We
used a Latin square design to determine the order of the visualizations for each
participant. Before answering the first task with a new visualization, participants
received a short explanation of the visualization and a trial period to become
familiar with the visualization and the interface. We additionally randomized the
placement of the visualizations on the page (top-bottom and left-right). Participants
experienced each weighting 40 times per visualization, which resulted in a total of
200 trials per visualization and 800 trials overall. At the end of the study,
participants completed the Berlin Numeracy Test [Cokely et al., 2012] to assess
their statistical knowledge.
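The text does not specify which Latin square was used. Assuming a balanced (Williams) Latin square, which for an even number of conditions also counterbalances which visualization immediately precedes which, the orders for the four visualizations could be generated as follows; the function names are hypothetical. With 16 participants, each of the four orders would be used four times.

```python
VISUALIZATIONS = ["point estimate", "confidence interval", "dotplot",
                  "probability distribution function"]


def balanced_latin_square(n):
    """Rows of a balanced (Williams) Latin square for an even number n of conditions."""
    if n % 2:
        raise ValueError("this construction is balanced only for even n")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    for j in range(1, n):
        first.append((j + 1) // 2 if j % 2 else n - j // 2)
    # Every further row shifts the first row by r (mod n).
    return [[(c + r) % n for c in first] for r in range(n)]


def order_for(participant, conditions=VISUALIZATIONS):
    """Condition order for a participant (0-based index), cycling through the rows."""
    square = balanced_latin_square(len(conditions))
    return [conditions[i] for i in square[participant % len(conditions)]]


for p in range(4):
    print(p, order_for(p))
```

Each visualization appears once in every position, and each ordered pair of neighbouring visualizations occurs exactly once across the four rows.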
Participants
Sixteen participants (8 male, 8 female) with an average age of 24.6 (SD = 3.4)
took part in the study. Most of them were students, as participants were
recruited in a university setting. Participants’ Berlin Numeracy scores ranged
from 3 to 7, indicating a basic, but not necessarily expert, understanding
of statistics.
Results
Based on participants’ answers, we calculated the weights that they assigned to
the two presented measurements and computed the difference to the actual weights
that we used to generate the visualizations. For each trial, we used the weight
difference of the larger weight for the analysis. For example, if a participant
saw two measurements with a weighting of 20%-80% but internally used
a weighting of 30%-70%, the weight difference would be 10%, or 0.1, for the
larger weight. We performed a linear mixed-effects model analysis on the
aligned-rank transformed data [Wobbrock et al., 2011] for the weight differences,
as the data were not normally distributed. We used the fixed factors visualization,
weighting, and distance as well as the random factor participant ID. For
significant effects, we conducted post hoc pair-wise comparisons with Bonferroni
corrections.
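The weight difference can be computed from a slider answer by solving the weighted average for the implied weight. In this sketch, measurement B is the more reliable one, the difference is signed as true weight minus implied weight, and zero-distance trials are scored as 0; the sign convention, the handling of zero distance, and the function names are our assumptions rather than details reported in the text.

```python
def inferred_weight_b(answer, x_a, x_b):
    """Weight a participant implicitly gave to measurement B, given the value chosen
    on the slider: answer = (1 - w) * x_a + w * x_b, solved for w."""
    if x_a == x_b:
        return None  # zero distance: any weighting yields the same answer
    return (answer - x_a) / (x_b - x_a)


def weight_difference(answer, x_a, x_b, true_weight_b):
    """Signed difference between the true larger weight (measurement B) and the
    weight the participant implicitly assigned to B."""
    w = inferred_weight_b(answer, x_a, x_b)
    if w is None:
        return 0.0  # assumption: zero-distance trials count as no deviation
    return true_weight_b - w


# Example from the text: true weighting 20%-80%, participant behaves as if 30%-70%.
print(round(weight_difference(answer=7.0, x_a=0.0, x_b=10.0, true_weight_b=0.8), 3))  # 0.1
# Picking the midpoint (the average) under the same weighting:
print(round(weight_difference(answer=5.0, x_a=0.0, x_b=10.0, true_weight_b=0.8), 3))  # 0.3
```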
The mixed-effects analysis revealed main effects of visualization, weighting, and
distance on the weight difference. However, all two-way interactions and the
three-way interaction are significant as well, which may limit how the main effects
can be interpreted (see Table 8.5 for details).
Visualization. Post hoc tests revealed that the weight differences differ
significantly between all pairs of visualizations except between the dotplot and
the probability distribution function. The average weight difference was highest
for the point estimate (M = 0.17, SD = 0.18), followed by the confidence
interval (M = 0.06, SD = 0.24), the probability distribution function (M = 0.05,
SD = 0.26), and the dotplot (M = 0.04, SD = 0.24). These results support the
hypothesis that uncertainty information changes how people choose the
true value. They also partially support our hypothesis that people make
statistically more optimal aggregations with more uncertainty information; however,
the dotplot did not lead to significantly more optimal aggregations than the
probability distribution function.
Weighting. Post hoc tests revealed that the weight differences differ significantly
between all pairs of weightings. The weight difference was smallest for
Table 8.5: Results of the linear mixed-effects model analysis on the aligned-
rank transformed data [Wobbrock et al., 2011] for the three main factors
visualization, weighting, and distance, and the random factor participant ID.
Effect/Interaction df df.res F p-value
Visualization 3 12113 1201.969 < .001
Weighting 4 12113 692.652 < .001
Distance 2 12113 787.455 < .001
Visualization:Weighting 12 12113 170.183 < .001
Visualization:Distance 6 12113 215.389 < .001
Weighting:Distance 8 12113 175.496 < .001
Visualization:Weighting:Distance 24 12113 54.775 < .001
the 50%-50% weighting (M = 0.01, SD= 0.19), followed by the 40%-
60% weighting (M = 0.04, SD= 0.22), followed by the 30%-70% weight-
ing (M = 0.07, SD= 0.23), followed by the 20%-80% weighting (M = 0.12,
SD= 0.25), with the highest weight difference for the 10%-90% weighting
(M = 0.18, SD= 0.26). The higher the weighting, the bigger the difference of
participants to statistically optimal judgements. These results support our hypoth-
esis that people selected the average for the 50%-50% weighting and adjusted
the value towards the more reliable value for different weightings.
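The weightings can be related to a statistically optimal judgement. One standard definition for two independent measurements with known variances is inverse-variance weighting, where each value is weighted in proportion to 1/σ². The sketch below assumes this definition. It also assumes a hypothetical signed weight difference: the optimal weight minus the weight implied by a participant's chosen value. The function names and the exact metric are illustrative and are not taken from the study's analysis. Under this assumption, a 10%-90% weighting corresponds to a variance ratio of 9:1.

```python
def optimal_weight(var_a: float, var_b: float) -> float:
    """Inverse-variance weight on measurement b when combining a and b.

    w_b = (1 / var_b) / (1 / var_a + 1 / var_b) = var_a / (var_a + var_b)
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    return var_a / (var_a + var_b)


def optimal_estimate(x_a: float, var_a: float, x_b: float, var_b: float) -> float:
    """Combined value under the (assumed) inverse-variance definition."""
    w_b = optimal_weight(var_a, var_b)
    return (1 - w_b) * x_a + w_b * x_b


def implied_weight(x_a: float, x_b: float, chosen: float) -> float:
    """Weight on x_b implied by a chosen value on the line through x_a and x_b."""
    if x_a == x_b:
        # With zero distance every weighting yields the same value,
        # so the implied weight is undefined in this sketch.
        raise ValueError("implied weight is undefined when x_a == x_b")
    return (chosen - x_a) / (x_b - x_a)


def weight_difference(x_a, var_a, x_b, var_b, chosen) -> float:
    """Hypothetical signed gap: optimal minus implied weight on x_b.

    When b is the more reliable value (var_b < var_a), a positive result
    means b was under-weighted, i.e. the choice was not pulled far enough
    towards b.
    """
    return optimal_weight(var_a, var_b) - implied_weight(x_a, x_b, chosen)


# Variance ratio 9:1, i.e. a "10%-90%" weighting under this assumption.
estimate = optimal_estimate(10.0, 9.0, 20.0, 1.0)
diff = weight_difference(10.0, 9.0, 20.0, 1.0, chosen=15.0)  # midpoint choice
```

In this example, a participant who picks the plain average (15) between a less reliable value (10) and a more reliable value (20) ends up with a positive weight difference. This matches the conservative pattern described in the text.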
Distance. The post hoc test revealed that the weight differences between all
distances differ significantly. The weight difference was smallest for the zero
distance condition (M = 0.00, SD = 0.00), followed by the large distance
condition (M = 0.09, SD = 0.26). It was highest for the small distance condition
(M = 0.10, SD = 0.25). These findings support our hypothesis that people select
the average when there is no distance between the values. However, the distance
has an unexpected influence on the selection of the true value: errors were
larger for the small than for the large distance.
Interaction between Visualization and Weighting. Figure 8.10a shows the
mean weight differences for each of the four visualizations as a function of the
weighting. The post hoc test revealed that the differences of differences between
the point estimate and all other visualizations are significant. For the more
unequal weightings (20%-80%, 10%-90%), the differences of differences between
the confidence interval and both the dotplot and the probability distribution
function are also significant. This does not contradict the main effects.
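A "difference of differences" is an interaction contrast. It asks whether the gap between two visualizations changes from one weighting to another. The sketch below computes such a contrast from a table of cell means. The cell means are made-up illustrative values, not the study's results. The sketch works on raw means and does not reproduce the study's aligned rank transform analysis. The function name is hypothetical.

```python
def interaction_contrast(means, a1, a2, b1, b2):
    """Difference of differences for a 2x2 subset of cell means.

    (means[a1, b1] - means[a2, b1]) - (means[a1, b2] - means[a2, b2])
    A value near zero means the gap between levels a1 and a2 of factor A
    is the same at levels b1 and b2 of factor B.
    """
    return (means[(a1, b1)] - means[(a2, b1)]) - (means[(a1, b2)] - means[(a2, b2)])


# Illustrative cell means only, not the study's results.
cell_means = {
    ("point estimate", "10%-90%"): 0.4,
    ("dotplot", "10%-90%"): 0.1,
    ("point estimate", "50%-50%"): 0.0,
    ("dotplot", "50%-50%"): 0.0,
}
contrast = interaction_contrast(
    cell_means, "point estimate", "dotplot", "10%-90%", "50%-50%")
```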
Interaction between Visualization and Distance. Figure 8.10b shows the mean
weight differences for each of the three distances as a function of the visualization.
All differences of differences are significant, except for the difference between
the zero distance and the large distance condition for the dotplot and the
probability distribution function. However, the difference between the dotplot
and the probability distribution function differs significantly between the small
and the large distance. This does not contradict the main effects. It also helps
explain the difference between the small and the large distance.
Interaction between Weighting and Distance. Figure 8.10c shows the mean
weight differences for each of the three distances as a function of the weighting.
The post hoc test revealed that the differences of differences between the zero
distance condition and the small or large distance conditions are all significant.
For the most equal and most extreme weightings (50%-50%, 10%-90%), there
is no significant difference between the small and the large distance condition.
This also does not contradict the main effects, but provides further insights into
the difference between the small and the large distance.
Three-way Interaction. Figure 8.11 shows the three distances as a function of
the weighting separately for each of the four visualizations. These diagrams most
clearly underline the following findings:
1. Without information about sensor reliability (see Figure 8.11a), participants
chose the average value,
2. The larger the difference in sensor reliabilities, the higher participants'
estimation error,
3. Uncertainty information reduces the growth of the error,
4. For larger differences in sensor reliabilities, the dotplot and the probability
distribution function are most suited, and
5. The error is larger for small inconsistencies, in particular for the probability
distribution function.
8.3.3 Discussion & Implications
Our results show that displaying uncertainty information changes humans' internal
models of how they aggregate two distinct values. As expected, humans choose the
average if they have no uncertainty information, but weight the values as soon as
they have uncertainty information. More detailed information in the form of a
dotplot or a probability distribution function makes the weighting more optimal
than a confidence interval.
(a) Mean weight differences for each of the four visualizations as a function of the weighting.
(b) Mean weight differences for each of the three distances as a function of the visualization.
(c) Mean weight difference for each of the three distances as a function of the weighting.
Figure 8.10: Participants’ mean weight differences for combinations of two
variables; error bars represent the standard errors of the mean.
(a) Mean weight differences for the point estimate for each of the three
distances as a function of the weighting.
(b) Mean weight differences for the confidence interval for each of the three
distances as a function of the weighting.
(c) Mean weight differences for the dotplot for each of the three distances as a
function of the weighting.
(d) Mean weight differences for the probability distribution function for each of
the three distances as a function of the weighting.
Figure 8.11: Participants' mean weight differences plotted for each
visualization; error bars represent the standard errors of the mean.
For the different weightings, our results show that the more unequal the two
weights, the larger the error relative to the statistically optimal judgement.
Participants were conservative and did not choose extreme weightings such as
10%-90%. Combined with the visualization results, we conclude that showing
uncertainty information offers no advantage for nearly equal reliabilities. In
that case, point estimates work well for calculating an average. The larger the
difference between the reliabilities of the sensors, the more participants
benefited from additional uncertainty information in making statistically more
correct judgements. For the point estimate, the weighting error grew linearly as
the weighting became more unequal. The confidence interval shows a similar linear
trend. The confidence interval, however, might still be applicable for cases with
a weighting between 40%-60% and 30%-70%.
For the distance, we found that with zero distance between the point estimates,
participants always selected the average, regardless of the visualization or
weighting used. Here, a point estimate can be enough to communicate the
information, as it produces the same result. In the small and large distance
conditions, however, participants made larger errors with the small distance. We
assume that for small distances, people were biased towards the average, while
this bias was not present for large distances. It is also interesting that the
probability distribution function seems to produce larger errors than the dotplot
for the small distance. Figure 8.11c shows why this difference occurs. The
dotplot performed slightly better for some weightings with the small distance and
for others with the large distance, so the effect cancels out in the two-way
interaction. However, the values indicate that the difference between the small
and the large distance grows with additional uncertainty information. This may
weakly indicate that the dotplot should be preferred to a probability distribution
function for communicating uncertain information.
8.4 Insights for Supporting the Interpretation
In this chapter, we took a closer look at how users interpret uncertain data. We
mainly explored humans' internal models when making predictions and when
aggregating conflicting information.
We found that user-made predictions are a suitable tool for helping people
understand their behavior patterns and learn how such predictions can be
improved. We argue that these insights could be a starting point for using
user-made predictions to make complex algorithms that predict user behavior
easier to understand, as such algorithms may rely on the same principles.
For the interpretation of conflicting uncertain data, we identified five design
implications for designing for comparison. The main findings are that differences
in users' character and their judgement of scenarios play an important role, as
identified in Chapter 5. Additionally, information sources need to be transparent
to foster trust.
We further found that uncertainty information improves users' aggregation of
conflicting information when uncertain data from multiple sources is presented.
Participants used a weighted average, although they did not make a statistically
optimal aggregation. The larger the difference between the reliabilities of the
uncertain data, the more uncertainty information helped participants make better
aggregations. For non-conflicting information, point estimates are a suitable way
of visualizing the data.
Based on all our previous findings, we present our conclusions in the next part
of this thesis. We created a web-based simulation tool for end-users that allows
researchers to experiment with the input and output methods developed in the
context of this thesis. End-user simulation serves as a sample use case for
uncertainty communication in interactive systems.
IV
CONCLUSION AND FUTURE WORK

Chapter 9
Simulation Tool for End-Users
We implemented a web-based simulation tool called SimulaTE including input
methods presented in Section 6.1 and Section 6.2 as well as output methods
presented in Section 7.3. The tool incorporates these input and output methods for
uncertainty to allow researchers and the general public to explore different input
and output methods for calculations and small simulations. In this chapter, we
summarize the functionality of the simulation tool and the user-centered process
we followed during its development.
9.1 Implementation
The simulation tool was built with Django 1.8.6⁸, a Python-based web framework,
and Bootstrap⁹ as the front-end framework. The tool is divided into two distinct
parts: the toolbox for building modules and models, and the simulation part for
running models. Users need to sign up or log in to access the toolbox and the
simulations.
An overview of the general architecture of SimulaTE is provided in Figure 9.1.
The module creator allows users to create their own modules, which are stored in
a database and used to create models. The model creator allows users to create
models. As models consist of modules, the model creator fetches the existing
modules from the database. For each model, a Python file is created and some
additional information is stored in the database. The simulation component runs
the Python file for given inputs and fetches some additional model information
from the database. The results of all simulation runs are also stored in the
database.
Figure 9.1: Overview of the three main components of SimulaTE and their
connections.
8 Django: https://www.djangoproject.com/start/overview/
9 Bootstrap: http://getbootstrap.com/
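The data flow in Figure 9.1 can be summarized in a few lines of Python. The sketch below is a simplified, hypothetical model of that flow, not SimulaTE's Django code. An in-memory dictionary stands in for the database, and a model's generated "Python file" is kept as a string. The simulation component executes that string with `exec`. All names are assumptions made for illustration. Executing user-written code this way would require sandboxing in a real deployment.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """A reusable building block: abstract inputs/outputs plus Python source."""
    name: str
    inputs: list
    outputs: list
    source: str  # defines a function with the same name as the module


# Hypothetical in-memory stand-in for the database.
DATABASE = {"modules": {}, "models": {}, "runs": []}


def create_module(module: Module) -> None:
    """Module creator: store the module so that models can reuse it."""
    DATABASE["modules"][module.name] = module


def create_model(model_name: str, module_names: list) -> str:
    """Model creator: fetch the modules and generate the model's code."""
    modules = [DATABASE["modules"][n] for n in module_names]
    code = "\n\n".join(m.source for m in modules)
    DATABASE["models"][model_name] = {"modules": module_names, "code": code}
    return code


def run_simulation(model_name: str, entry: str, **inputs):
    """Simulation component: run the generated code and store the run."""
    namespace = {}
    exec(DATABASE["models"][model_name]["code"], namespace)
    result = namespace[entry](**inputs)
    DATABASE["runs"].append({"model": model_name, "inputs": inputs, "result": result})
    return result


# A quadratic-formula module, one of the example modules mentioned in the text.
QUADRATIC_SOURCE = '''
import math

def quadratic(a, b, c):
    root = math.sqrt(b * b - 4 * a * c)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))
'''

create_module(Module("quadratic", ["a", "b", "c"], ["x1", "x2"], QUADRATIC_SOURCE))
create_model("roots", ["quadratic"])
roots = run_simulation("roots", "quadratic", a=1, b=-3, c=2)
```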
9.2 User-Centered Process
To develop the end-user simulation tool, we followed a user-centered process. We
involved multiple simulation experts from the Cluster of Excellence in Simulation
Technology in our design process by interviewing them at different stages of the
development. All of the experts were from the Social Sciences and had no
programming knowledge. We incorporated their feedback directly into our
implementation by adapting the interface to make it easier for non-programmers
to use.
We further conducted a small usability study with the first functional prototype
of the tool. We ran a pilot study with one participant to make sure that our task
descriptions were understandable and adapted them based on the feedback. We
then recruited four participants (3 male, 1 female). On a scale from 1 to 5, where
5 corresponds to an expert programmer, participants reported their programming
experience to be at the lower end (M = 2.0, SD = 0.82). Participants had to solve
tasks using the simulation tool. The tasks included signing up, creating a simple
module, creating a more complex model using an existing module, and running
simulations. We recorded the screen and audio and asked the participants to
think aloud. A study instructor documented their actions during the study to
identify potential usability flaws. The participants mainly struggled with the
parameters of the inputs and outputs, so we adapted the respective parts based on
their feedback to improve the overall usability of the simulation tool.
We implemented several modules and models as examples. First, we implemented
some mathematical formulas such as the quadratic formula. Second, we implemented
small models calculating the interest on a sum of money over time and the
ecological footprint for different means of transportation. As a proof of concept,
we additionally implemented an import for Fitbit¹⁰ data and an import for a food
database to build models related to health and calorie intake.
9.3 Functionality
SimulaTE consists of a number of different pages. Besides the main page, which
contains an explanation of the whole tool and of the process to create and run
simulations on the platform (see Figure B.1), the tool contains an overview of all
available input methods, all available output methods, a module creator, a model
creator, and a simulation environment. Figure 9.2 shows a use case diagram of
the tool, which includes an overview of all functions. In the following sections,
we explain these functions in more detail.
9.3.1 User Accounts
The main page and the descriptions of input and output methods are visible to
every visitor to the tool's webpage. However, the module creator, the model
creator, and the simulation environment are only accessible to registered users.
Users can either sign up to create a new account for the tool or log in with an
existing social media account. On their personal profile, users can edit personal
information and change their password. Modules are personal building blocks
that are not shared between users. However, users can decide for each model
whether it is private or public. Simulation runs are also stored on a personal
level and not shared with other users.
Figure 9.2: Use case diagram for SimulaTE.
10 Fitbit: https://www.fitbit.com/de/home
9.3.2 Modules
Modules are small building blocks that have a name, a description, an abstract set
of inputs and outputs, and a function. Figure B.2 shows the module creator, which
allows the user to create new modules from scratch. The inputs and outputs only
have a name and a description, but are not yet coupled to specific input and output
methods. The function of a module needs to be implemented in Python code.
The function is separated from the input and output methods to allow the same
function to be used in multiple models with different input and output methods,
for example classical methods or methods that allow the users to enter or show
uncertainty.
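Because the function is separated from its input and output methods, one module function can receive either single values or uncertain values. The sketch below uses the interest example mentioned in Section 9.2. The text does not describe how SimulaTE propagates uncertain input. The sketch therefore assumes a simple Monte Carlo approach: a range is sampled uniformly, and the unchanged function runs once per sample. All function names are hypothetical.

```python
import random


def compound_interest(principal: float, rate: float, years: int) -> float:
    """Module function: value of `principal` after `years` at annual `rate`."""
    return principal * (1 + rate) ** years


def run_point_input(principal: float, rate: float, years: int) -> float:
    """Classical input method: one value per input, one value out."""
    return compound_interest(principal, rate, years)


def run_range_input(principal, rate_range, years, samples=10_000, seed=42):
    """Uncertain input method: the rate is given as a (low, high) range.

    Assumed propagation: draw the rate uniformly from the range and call
    the unchanged module function once per draw (Monte Carlo sampling).
    The resulting list of outcomes could feed a histogram output method.
    """
    rng = random.Random(seed)
    low, high = rate_range
    return [compound_interest(principal, rng.uniform(low, high), years)
            for _ in range(samples)]


point = run_point_input(1000.0, 0.02, 10)
outcomes = run_range_input(1000.0, (0.01, 0.03), 10)
```

The module function itself never changes. Only the wrapper that corresponds to the chosen input method differs. This is the kind of reuse that the separation of functions from input and output methods makes possible.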
Figure B.3 shows the module overview in a user's profile. Modules are private
and always belong to one user account. A user can edit or delete modules in the
personal overview. When editing a module, the user can decide to either overwrite
the existing module or clone it first, to avoid overwriting modules already used
in models.
9.3.3 Models
A model in SimulaTE consists of one or several modules. Models can be created
in the model creator (see Figure B.4). The user can simply add or remove modules
from the model by clicking on them, and can then choose specific input and output
methods for the concrete model via drop-down menus. The input and output
methods can additionally be adapted to personal needs by specifying parameters,
which are further explained in the input and output overview. A model additionally
has a name, a description, and an optional picture, and can have user-defined
tags. In contrast to modules, a model can be private or public. Logged-in users
can see and execute public models.
Figure B.5 shows the model overview. In this view, users can edit or delete their
own models. Additionally, users can run their own or public models and access
all saved simulation runs previously executed for a model. A user can toggle
between pages showing all models, their own models, or all models that have
already been used to run simulations. To search for models, the user can enter
either free text or tags in the search bar at the top of the page.
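The visibility rules and the overview search can be sketched as simple filters. The `Model` fields mirror the attributes listed in the text: name, description, tags, owner, and private or public. The function names are assumptions for illustration. So is the rule that a tag search requires all given tags. Neither reflects SimulaTE's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical record for a SimulaTE-style model (fields follow the text)."""
    name: str
    owner: str
    description: str = ""
    tags: set = field(default_factory=set)
    public: bool = False


def visible_models(models, user):
    """Models a logged-in user may see and run: their own plus all public ones."""
    return [m for m in models if m.owner == user or m.public]


def search(models, text="", tags=()):
    """Case-insensitive free-text search on name and description.

    Assumption: when tags are given, a model must carry all of them.
    """
    needle = text.lower()
    wanted = set(tags)
    return [m for m in models
            if (needle in m.name.lower() or needle in m.description.lower())
            and wanted <= m.tags]


models = [
    Model("Interest", "alice", "savings over time", {"finance"}, public=True),
    Model("Footprint", "bob", "ecological footprint by transport", {"travel"}),
    Model("Calories", "alice", "food and calorie intake", {"health"}),
]
```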
A click on the edit button of a model navigates the user to the model editor
(see Figure B.6). The editor offers either an abstract editor or a code editor
that allows more experienced users to adapt the program code. It is possible to
switch between the two interfaces while creating a model.
9.3.4 Input Methods
SimulaTE supports a wide range of input methods with different properties. The
input methods page provides an overview of all input methods that are supported
by SimulaTE. Figure B.7 shows a part of the input method page showing the
sliders that are described in Section 6.2. For each input method, we provide an
example where users can try out the input method. Additionally, we list all
parameters of the input method and explain them in more detail. Parameters
allow users to adapt the input methods to their needs, for example to show
default values, preselect valid ranges, or define descriptions of input fields.
SimulaTE offers input methods from the following four categories:
• Simple: Contains input methods that allow users to enter a single value, such
as a number, a Boolean value, or text.
• Percentage: Contains all input methods that allow users to enter a percentage
value.
• Range: Contains all input methods that allow users to enter a number range.
• Others: Contains all specific input methods that do not fit into the other
categories, e.g., sensor connections that automatically transfer data from
sensors.
In all categories, SimulaTE offers both standard input methods and input methods
that allow users to explicitly enter uncertainty. Models can support both kinds
of input, so that users can decide whether they want to give deterministic or
uncertain input.
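Parameters such as default values, valid ranges, and field descriptions can be attached to an input method object. The sketch below shows a hypothetical range input. It treats a single value as the degenerate range (value, value). This is one possible way, assumed here, for a model to accept both deterministic and uncertain input through the same interface. The class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RangeInput:
    """Hypothetical 'Range' input method configured through parameters."""
    description: str
    valid_min: float
    valid_max: float
    default: Optional[Tuple[float, float]] = None

    def parse(self, low: float, high: float) -> Tuple[float, float]:
        """Validate a user-entered range against the preselected valid range."""
        if low > high:
            raise ValueError("lower bound exceeds upper bound")
        if low < self.valid_min or high > self.valid_max:
            raise ValueError(
                f"range must lie within [{self.valid_min}, {self.valid_max}]")
        return (low, high)

    def parse_single(self, value: float) -> Tuple[float, float]:
        """Deterministic input, treated as the degenerate range (value, value)."""
        return self.parse(value, value)


rate_input = RangeInput("Annual interest rate", 0.0, 0.2, default=(0.01, 0.03))
```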
9.3.5 Output Methods
SimulaTE also supports a wide range of output methods. The output method
page provides an overview of all output methods that are supported by SimulaTE,
for example the visualization depicted in Section 7.3. Figure B.8 shows a part of
the output page with an area chart. As for the input methods, the page contains
one example of each output method and an overview of the parameters that can
be used to adapt the methods to the user's personal needs. SimulaTE provides
different output methods such as bar charts, histograms, area charts, and pie
charts, as well as simple text output.
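A histogram output for an uncertain result, such as a list of simulated outcomes, counts values in equal-width bins. The sketch below shows this binning together with a minimal text rendering. The default bin count and the function names are illustrative assumptions, not SimulaTE parameters.

```python
def histogram(samples, bins=5):
    """Count samples in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all samples are equal
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[i] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return edges, counts


def text_histogram(samples, bins=5, bar="#"):
    """Simple text output: one row per bin, bar length equal to the count."""
    edges, counts = histogram(samples, bins)
    return "\n".join(
        f"{edges[k]:8.2f}-{edges[k + 1]:8.2f} {bar * c}"
        for k, c in enumerate(counts))
```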
9.3.6 Simulations
A simulation in SimulaTE is equivalent to the execution, or run, of a model.
Clicking the run button opens a page from which the model can be executed. Before
execution, the user has to specify the necessary input parameters (see Figure B.9),
after which the output is presented. After running a simulation, the user can save
the run to review the results later.
The user can also navigate to all saved simulations for one specific model (see
Figure B.10). For each saved run, the user can view the input values and the
output values.
9.4 Implications
We implemented a web-based simulation tool called SimulaTE, which supports
end-users in building and running calculations and simulations. We used a
user-centered process to implement and refine the tool, involving simulation
experts and end-users with little programming knowledge.
Our development was based on the requirements identified in Chapter 4. We built
a general tool that allows users to run very different calculations and
simulations. We separated the model building step from the simulation step,
since public models can be run by any user. Furthermore, we minimized the
required mathematics and programming knowledge wherever possible. As
functional requirements, we included different sources of parameter input by
allowing users to directly enter data, connect devices, or use databases. We
implemented an example model using Fitbit data and a food database as a proof
of concept. The tool further supports different illustrative visualizations in
addition to the output of plain numbers.
Our main focus when building the tool, however, was on the non-functional
requirements of flexibility and transparency. To increase flexibility, modules
can be part of multiple models using different input and output methods. To
foster transparency, we especially offer input and output methods suitable for
communicating uncertainty.
The tool offers a starting point for future research in the direction of uncertainty
in interactive systems. Researchers, practitioners, and the general public can use
the tool to explore input and output methods communicating uncertainty.
Chapter 10
Conclusion
In this thesis we have systematically explored the occurrence and handling of
uncertainty in the context of interactive systems. We showed that uncertainty
plays an important role in interactive systems and developed novel input and
output methods that support users in effectively dealing with uncertainty.
10.1 Research Contributions
Based on related work, we identified sources of uncertainty in interactive systems
by enhancing the General Interaction Framework. We then explored three key
aspects: the input, the output, and the interpretation of uncertainty.
On the input side, we contribute methods for the explicit and implicit input of
uncertainty. We explored the design space for standard input controls, specialized
slider controls, and tangible input controls. We additionally showed the feasi-
bility of using selected behavioral and physiological measurements to implicitly
capture uncertainty. On the output side, we found that most current mobile
applications do not communicate uncertainty, although users voiced a preference
for it. We contribute designs to improve the communication of uncertainty for
activity tracking data, and further contribute a classification for representations
based on the amount of uncertainty information communicated. We show that
representations showing detailed aggregated uncertainty information are most
promising for communicating uncertainty to the general public. Regarding the
interpretation, we found that engaging users in making predictions could
improve their reasoning about uncertainty. Additionally, we contribute design
recommendations for showing conflicting information in a weather context and
for general measurements. In the first case, we focused on users’ confidence and
in the second case on internal models that humans use to aggregate information.
We included the developed input and output methods in an end-user simulation
tool to simplify future research and usage of these methods.
In the following, we provide details on the findings for the research questions
identified at the beginning of this thesis.
10.1.1 Current Simulation Usage
In RQ1, we asked: What can we learn from the current usage of simulations?
We conducted multiple small research probes to find answers for this question.
From analyzing expert simulation usage, we learned that simulation tools are very
specialized tools mainly used for one application area. Additionally, different
tools are used in different steps of the simulation process which might even be
carried out by different experts. To make simulations usable for the general public,
a more general approach has to be followed as less mathematics and programming
knowledge can be expected.
From non-expert usage, we learned that plenty of use cases exist for simulation
usage in everyday life, e.g. related to health or finances. We identified a set of
four main functional requirements for an end-user simulation tool: (1) support of
different sources of parameter input, (2) illustrative visualizations, (3) support of
short- and long-term predictive simulations, and (4) context-awareness. We addi-
tionally identified two non-functional requirements: flexibility and transparency.
To reach flexibility and transparency, uncertainty has to be taken into account at
all levels of the simulation process.
10.1.2 Sources of Uncertainty in Interactive Systems
In this section we focus on RQ2: What are the sources of uncertainty in
interactive systems?
Based on related work, we identified 13 distinct sources of uncertainty in in-
teractive systems, and integrated them into the General Interaction Framework.
Sources of uncertainty were identified for the user, the input, the system, the
output, and all connections between them. Some sources, however, might apply
to multiple components of the General Interaction Framework. Sources of
uncertainty related to the system are less relevant for the HCI community, as
they belong to the respective domain sciences, and sources of uncertainty related
to the user are part of research in Psychology. However, sources of uncertainty
connected to the articulation, input, output, and observation are of specific
interest for the HCI community.
By developing novel input and output methods for interactive systems dealing with
uncertain data, uncertainty can either be reduced or quantified to appropriately
take it into account in all stages of system usage. New input techniques allowing
one to enter uncertainty can help to overcome uncertainty hidden from a system
due to a lack of user knowledge or imprecise measurements. The system can
take these sources into account as soon as a user communicates this uncertainty
to the system. Evaluations of new controls can help to minimize the sources of
uncertainty related to limited understanding of the users. Novel input and output
methods are also opportunities to provide more degrees of freedom and accuracy
than current input controls to further reduce uncertainty. Evaluations of new
output controls can help prevent future misjudgments.
We conclude that uncertainty plays an important role in interactive systems,
although it has often been neglected in past research. More research in the area
of uncertainty can help designers and researchers to better design for this aspect.
10.1.3 Input Methods
In the following we discuss RQ3: What input controls are suitable for un-
certain input? To answer this we conducted several research probes to explore
different design spaces related to input methods.
We considered multiple areas for the input of uncertain data: standard input
controls, specialized slider controls, tangible input controls, and behavioral and
physiological measurements. We showed that explicit and implicit methods are
feasible and suitable for quantifying uncertainty in user input.
For standard input controls, we suggest using an additional number field or slider
to allow users to specify a probability percentage. This is a small change compared
to current interfaces which allows users to indicate their uncertainty when using a
system. For a solution that adds further transparency and more degrees of freedom,
we propose specialized slider controls: the probability distribution sliders. One
advantage of these slider controls is that they allow the degrees of freedom to
be matched to the context, the task, and the statistical knowledge of the user.
The attached visualizations provide additional transparency on how the system
interprets the input data. We further identified tangible shape-changing devices as
a promising research area for input with uncertainty. The shape-changing aspect
of an interface can be leveraged to offer a smooth change between input with and
without uncertainty. This strengthens the connection between the single-value
input and the additional uncertainty value. Based on our evaluations, we propose
to use the SplitSlider design for the input of uncertain data.
For the implicit input of uncertainty, we recommend a combination of eye tracking
and key logging. However, more advanced physiological measurements might in
the future lead to better results.
10.1.4 Output Methods
We further contribute findings for RQ4: What visualizations are suitable for
uncertain output? To answer this question we conducted multiple research
probes to analyze the current state and compare different representations for
uncertain data.
We first analyzed how current mobile applications communicate uncertain data
and found that most applications do not communicate uncertainty at all. However,
users voiced a preference for uncertainty communication.
We then created three designs for communicating uncertainty in activity tracking
data. We propose to communicate a range instead of a fixed value and add grey
bars or other visual elements to bar charts that indicate the uncertainty. Our
studies showed that although users prefer to have more information, it still has
to be quickly graspable. Thus, adding too much complexity by communicating
uncertainty has to be avoided to minimize cognitive load.
We further found that visualizations showing more uncertainty information are not
necessarily more suitable than visualizations with less uncertainty information.
Familiarity, visual appeal, and the ease to understand a visualization are important
influencing factors. We contribute a classification of visualizations by the amount
of uncertainty information that they include. The classification can support
researchers in selecting representations to support decision-making. We found
that representations with aggregated uncertainty information gave participants
a misleading feeling of control. We therefore do not recommend using such
representations. Instead, we recommend visualizations that include detailed
aggregated uncertainty information, such as a histogram or dotplot. In our study,
the histogram proved to be most suitable for supporting users in decision-making.
10.1.5 Interpretation
In this section we focus on our research question RQ5: How do people interpret
uncertain data? We conducted three research probes with different main foci to
better understand how people interpret uncertain data.
We found that user-made predictions can improve users’ understanding of their
personal behavior. Participants in our study started to discover behavior patterns
and reason about their daily activities to be able to improve their predictions.
They were also strongly motivated to improve their predictions and tried to
meet their predictions if possible.
For the interpretation of conflicting data, we found that a computationally gener-
ated average will not increase confidence in uncertain data. Participants rather
preferred a range view or details view as they wanted to make their own judge-
ments. To design for comparison, different types of users and scenarios have to
be taken into account. Additionally, transparency of the sources and the gathered
information establishes trust. We recommend supporting easy comparison with a
range aggregation and a direct comparison, allowing users to switch between the
two modes based on the context and their preference.
We further showed that uncertainty visualization improves reasoning about con-
flicting data. Point estimates are suitable for presenting non-conflicting data or
data with equal reliability as humans tend to select the average. If this is not
the case, uncertainty information can help users select a weighted average closer
to a statistically optimal value. We suggest a dotplot for supporting users in
aggregating conflicting information.
10.2 Concluding Remarks
Humans are confronted with uncertainty whenever they make a decision, and the
digital world introduces additional uncertainty. Interactive systems include
uncertain information originating from machine learning or predictive algorithms.
As past studies in Psychology have revealed positive effects of communicating
uncertainty, such as increased trust and better decision-making, HCI researchers
and designers should aim to adequately communicate uncertainty to their users.
This thesis has provided a systematic exploration of uncertainty in interactive
systems. Based on the identification of sources of uncertainty, we implemented
and evaluated novel input and output methods for communicating uncertainty.
We are convinced that this is an important step toward bringing uncertainty to
people’s attention and raising awareness of the importance of uncertainty
communication in the HCI community. The interest in the workshop “Designing for
Uncertainty in HCI”, which we organized at CHI’17, shows that the topic has
recently gained attention in the HCI research community; this was not yet the
case when we started our exploration at the end of 2013.
We hope that our work and the tool we developed can support and inspire
practitioners and other researchers to include uncertainty quantification and
communication in their design and development process. We outline potential
follow-up experiments and future work in the last chapter of this thesis.
Chapter 11
Future Work
This thesis provides a systematic exploration of uncertainty in interactive systems
to set a common ground and starting point for future research about uncertainty
in HCI. During the course of this thesis, additional research questions arose
that are beyond its scope. This chapter summarizes interesting follow-up research
questions for future work.
11.1 Sources of Uncertainty
The presented classification of sources of uncertainty in interactive systems is a
first step to raise researchers’ and developers’ awareness of these sources. As a
next step, it would be helpful to further extend the classification to make it more
concrete.
The classification could be extended by adding potential types of uncertainty
associated with the sources and more concrete suggestions and guidelines for
researchers and developers on how to handle uncertainty. This thesis mainly
focused on suggesting new ways to increase the degrees of freedom in the input
and output methods. We additionally offered suggestions on how to avoid
misjudgments. Interdisciplinary work with psychologists, simulation experts,
or domain experts who carry out uncertainty quantification could help to further
understand sources of uncertainty beyond what is of primary relevance to HCI.
It would additionally be helpful to collect best practices and concrete application
scenarios for interactive systems dealing with uncertainty. As the topic gains
attention in the HCI community, concrete application scenarios, together with
their associated sources and explanations of how they handle uncertainty, could
help researchers and developers to better understand which sources apply to their
own application scenario. Future research could apply methods such as literature
reviews, interviews with researchers and developers, and the development of
concrete best practices together with them.
11.2 Input Methods
We explored different design spaces of input methods for entering uncertain data,
especially standard input controls, sliders, tangible interfaces, and physiological
measurements. However, other input methods remain to be explored. We suggest
further exploring input methods such as gesture input, voice input, or
brain-computer interfaces to understand whether they enable users to enter
uncertainty.
As we mainly tested our functional prototypes in lab studies, it would be
interesting to include the input methods in real-world systems. Dedicated research
prototypes could collect the actual data to assess how well users can estimate
uncertainty. Based on this data, models of uncertainty in user input could be
derived.
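A simple way to compare users' uncertainty estimates with collected data is to check interval coverage. If users enter, say, 80% intervals, then roughly 80% of the later measurements should fall inside them. The sketch below is hypothetical: the interval format, the 80% level, and the step-count data are assumptions, not a procedure from our studies.

```python
def interval_coverage(intervals, measurements):
    """Fraction of measurements that fall inside the user-entered intervals.

    intervals:    list of (low, high) pairs entered by a user (assumed format)
    measurements: list of values measured later, one per interval
    """
    if len(intervals) != len(measurements):
        raise ValueError("need one measurement per interval")
    hits = sum(low <= m <= high for (low, high), m in zip(intervals, measurements))
    return hits / len(measurements)

# Hypothetical 80% intervals for daily step counts and the measured values
intervals = [(6000, 9000), (7000, 11000), (5000, 8000), (8000, 12000), (4000, 7000)]
measured = [8500, 11800, 6100, 9900, 7600]
coverage = interval_coverage(intervals, measured)
print(f"coverage {coverage:.0%} vs. nominal 80%")  # coverage 60% vs. nominal 80%
```

Coverage well below the nominal level would suggest intervals that are too narrow, that is, overconfident estimates. With only five observations, however, the coverage estimate is itself very uncertain, so real-world deployments would need many more data points.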
We also tested our input methods only in very specific application scenarios. All
experiments could be repeated in different application scenarios to understand
whether the findings transfer to other scenarios.
11.3 Output Methods
For the output, this thesis mostly focused on the graphical representation of
uncertainty. Further experiments could compare graphical representations to
purely linguistic and numerical presentations for different application scenarios in
interactive systems.
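To make such a comparison concrete, the same probability can be presented as a percentage, as a frequency (a format that Gigerenzer and Hoffrage (1995) found can improve Bayesian reasoning), or as a verbal phrase. The verbal scale below is a hypothetical example, loosely inspired by Kent's words of estimative probability. Its thresholds are illustrative assumptions, not validated cut-offs.

```python
# Illustrative verbal scale (assumption): (upper bound, phrase), checked in order
VERBAL_SCALE = [
    (0.05, "almost certainly not"),
    (0.40, "probably not"),
    (0.60, "chances about even"),
    (0.95, "probably"),
    (1.00, "almost certain"),
]

def as_percentage(p):
    return f"{p:.0%}"

def as_frequency(p, denominator=10):
    """Express p as 'k out of n', rounded to the nearest whole case."""
    return f"{round(p * denominator)} out of {denominator}"

def as_words(p):
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    # Return the first phrase whose upper bound covers p
    return next(phrase for upper, phrase in VERBAL_SCALE if p <= upper)

p = 0.3
print(as_percentage(p), "|", as_frequency(p), "|", as_words(p))
# 30% | 3 out of 10 | probably not
```

An experiment could show participants one of these three renderings, or a graphical representation, for the same underlying probability and compare their decisions.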
The same applies to the output methods for communicating uncertainty: as we
mainly conducted lab studies, it would be interesting to include the output
methods in real-world systems to understand whether the findings transfer to
different application areas. In addition to real-world systems, further games
with different contexts could help to better understand how much the findings
depend on the presented scenarios.
11.4 Interpretation
Our exploration of user-made predictions could readily be continued. This could
include follow-up experiments on the degree to which predictions influence how
humans deal with uncertainty and whether predictions could influence behavior
change. Long-term studies would be needed to explore this further.
Our work on the psychological aspects of aggregating conflicting information could
also be extended. It should be repeated in different contexts. Furthermore, it
would be interesting to understand whether naming specific sensors and framing
the question differently would influence users. To better understand the influence
of biases, users could even be handed real sensors to carry for several weeks to
collect and aggregate measurements. Further research could also construct
mathematical models of how humans aggregate data, depending on the representation
alternative used.
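One possible starting point for such a model, offered here purely as an assumption, is to describe each response as a weighted average of two displayed values, `response = w * a + (1 - w) * b`, and to estimate the weight `w` that a participant places on source `a` by least squares. The function name and the trial data are invented for illustration.

```python
def fit_weight(trials):
    """Least-squares weight w in  response = w * a + (1 - w) * b.

    trials: list of (a, b, response) with the two displayed values and the
    participant's aggregated answer. The model is rearranged to
    (response - b) = w * (a - b), a regression through the origin.
    """
    num = sum((a - b) * (r - b) for a, b, r in trials)
    den = sum((a - b) ** 2 for a, b, _ in trials)
    if den == 0:
        raise ValueError("the two sources never disagree; w is not identifiable")
    return num / den

# Invented trials in which source A is displayed as the more reliable one
trials = [(20, 26, 21.5), (15, 19, 16.0), (30, 24, 28.6)]
w = fit_weight(trials)
print(round(w, 2))  # 0.76
```

Fitting `w` separately for each representation, and comparing it with the inverse-variance weight implied by the displayed uncertainties, would be one way to quantify how closely each representation moves people toward the statistically optimal aggregate.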
Another interesting research topic is how humans aggregate conflicting news or
other textual information, which could inform support strategies for data other
than numerical data. Humans might apply very different internal models to other
data types.
Another open question is how much training influences the interpretation of
statistical data. Statistics education could make a difference in how humans
interpret uncertain data. Specific training methods might help to remove biases
and support humans in making better decisions under uncertainty.
11.5 End-User Simulation
The simulation tool SimulaTE offers a starting point for researchers to use input
and output methods for communicating uncertain data. The tool can easily be
extended to support additional functionality. For example, in addition to creating
models and running simulations, it could allow researchers to analyze how users
run simulations with public models. The tool could additionally be connected to
more external data sources and databases to support different kinds of simulations.
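End-user simulations with uncertain inputs are commonly run by Monte Carlo sampling: the model is evaluated many times with randomly drawn inputs, and the resulting outputs are summarized as a distribution. The sketch below is a generic illustration and not SimulaTE's implementation. The model (a weekly step total), the input distribution, and all function names are hypothetical.

```python
import random
import statistics

def simulate_week(rng, daily_mean, daily_sd, days=7):
    """Hypothetical model: weekly step total from uncertain daily counts."""
    # Negative step counts are impossible, so clip each sampled day at zero
    return sum(max(0.0, rng.gauss(daily_mean, daily_sd)) for _ in range(days))

def run_simulation(n_runs=10_000, seed=1):
    """Repeat the model with freshly sampled inputs and summarize the outputs."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    totals = [simulate_week(rng, daily_mean=8000, daily_sd=2000) for _ in range(n_runs)]
    # Nine decile cut points; the first and last bound a central 80% interval
    deciles = statistics.quantiles(totals, n=10)
    return statistics.mean(totals), deciles[0], deciles[-1]

mean, p10, p90 = run_simulation()
print(f"mean {mean:.0f}, 80% interval [{p10:.0f}, {p90:.0f}]")
```

The sampled totals could feed directly into one of the output representations discussed in this thesis, such as a histogram or a dotplot, instead of being reduced to a single mean.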
11.6 Concluding Remarks
With this thesis, we set a starting point for research on uncertainty in interactive
systems. Although we have identified many open questions for future research, the
thesis contributes a number of important findings to the research community.
We identified uncertainty in interactive systems as an important research topic
by applying insights gained from research in other fields to the HCI context,
and used these insights to identify sources of uncertainty. We furthermore
developed suitable input and output methods, which helped to identify guidelines
and further ideas. We built a system that includes these input and output methods
to support future research and generate further insights.
V
BIBLIOGRAPHY
Jeroen C. J. H. Aerts, Keith C. Clarke, and Alex D. Keuper. Testing popular
visualization techniques for representing model uncertainty. Cartography
and Geographic Information Science, 30(3):249–261, 2003. doi: 10.1559/
152304003100011180.
American Meteorological Society. Enhancing weather information with
probability forecasts. Information Statement of the American Me-
teorological Society, https://www.ametsoc.org/ams/index.cfm/
about-ams/ams-statements/statements-of-the-ams-in-force/
enhancing-weather-information-with-probability-forecasts/,
May 2008.
Toni Amstad. Wie verständlich sind unsere Zeitungen? PhD thesis, University of
Zurich, 1978.
Anthony D. Andre and Henry A. Cutler. Displaying uncertainty in advanced
navigation systems. Proceedings of the Human Factors and Ergonomics Society
Annual Meeting, 42(1):31–35, 1998. doi: 10.1177/154193129804200108.
Stavros Antifakos, Adrian Schwaninger, and Bernt Schiele. Evaluating the
effects of displaying uncertainty in context-aware applications. In UbiComp
2004: Ubiquitous Computing, Lecture Notes in Computer Science, pages
54–69. Springer, Berlin/Heidelberg, 2004. ISBN 978-3-540-30119-6. doi:
10.1007/978-3-540-30119-6_4.
Aaron Bangor, Philip T. Kortum, and James T. Miller. An empirical evaluation
of the System Usability Scale. International Journal of Human-Computer
Interaction, 24(6):574–594, 2008. doi: 10.1080/10447310802205776.
Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear
mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48,
October 2015. ISSN 1548-7660. doi: 10.18637/jss.v067.i01.
Antonia Bauer and Ansbert Kneip. Der große Wissenstest für Kinder - Was
weißt du über die Welt? Kiepenheuer & Witsch, Cologne, 2016. ISBN
978-3-462-31545-5.
Sarah Belia, Fiona Fidler, Jennifer Williams, and Geoff Cumming. Researchers
misunderstand confidence intervals and standard error bars. Psychological
Methods, 10(4):389–396, 2005. doi: 10.1037/1082-989X.10.4.389.
Ann M. Bisantz, Stephanie Schinzing Marsiglio, and Jessica Munch. Displaying
uncertainty: Investigating the effects of display format and specificity. Human
Factors, 47(4):777–796, 2005. doi: 10.1518/001872005775570916.
Ann M. Bisantz, Dapeng Cao, Michael Jenkins, Priyadarshini R. Pennathur,
Michael Farry, Emilie Roth, Scott S. Potter, and Jonathan Pfautz. Comparing
uncertainty visualizations for a dynamic decision-making task. Journal of
Cognitive Engineering and Decision Making, 5(3):277–293, 2011. doi: 10.
1177/1555343411415793.
Matthias Böhmer, Brent Hecht, Johannes Schöning, Antonio Krüger, and Gernot
Bauer. Falling asleep with Angry Birds, Facebook and Kindle: A large scale
study on mobile application usage. In Proceedings of the 13th International
Conference on Human Computer Interaction with Mobile Devices and Services,
MobileHCI ’11, pages 47–56. ACM, New York, 2011. ISBN 978-1-4503-0541-
9. doi: 10.1145/2037373.2037383.
Georges-Pierre Bonneau, Hans-Christian Hege, Chris R. Johnson, Manuel M.
Oliveira, Kristin Potter, Penny Rheingans, and Thomas Schultz. Overview and
state-of-the-art of uncertainty visualization. In Scientific Visualization: Uncer-
tainty, Multifield, Biomedical, and Scalable Visualization, pages 3–27. Springer,
London, 2014. ISBN 978-1-4471-6497-5. doi: 10.1007/978-1-4471-6497-5_1.
Nadia Boukhelifa and David John Duke. Uncertainty visualization: Why might
it fail? In CHI ’09 Extended Abstracts on Human Factors in Computing
Systems, CHI EA ’09, pages 4051–4056. ACM, New York, 2009. ISBN
978-1-60558-247-4. doi: 10.1145/1520340.1520616.
Ken Brodlie, Rodolfo Allendes Osorio, and Adriano Lopes. A review of uncer-
tainty in data visualization. In Expanding the Frontiers of Visual Analytics and
Visualization, pages 81–109. Springer, London, 2012. ISBN 978-1-4471-2804-
5. doi: 10.1007/978-1-4471-2804-5_6.
David V. Budescu. Confidence in aggregation of opinions from multiple sources.
In Information Sampling and Adaptive Cognition, pages 327–352. Cambridge
University Press, New York, 2006. ISBN 0-521-53933-1.
David V. Budescu, Stephen Broomell, and Han-Hui Por. Improving com-
munication of uncertainty in the reports of the intergovernmental panel
on climate change. Psychological Science, 20(3):299–308, 2009. doi:
10.1111/j.1467-9280.2009.02284.x.
Stuart K. Card, Jock D. Mackinlay, and George G. Robertson. A morphological
analysis of the design space of input devices. ACM Transactions on Information
Systems, 9(2):99–122, April 1991. ISSN 1046-8188. doi: 10.1145/123078.
128726.
Meredith A. Case, Holland A. Burwick, Kevin G. Volpp, and Mitesh S. Patel.
Accuracy of smartphone applications and wearable devices for tracking physi-
cal activity data. The Journal of the American Medical Association, 313(6):
625–626, February 2015. doi: 10.1001/jama.2014.17841.
Eun K. Choe, Nicole B. Lee, Bongshin Lee, Wanda Pratt, and Julie A. Kientz.
Understanding quantified-selfers’ practices in collecting and exploring per-
sonal data. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, CHI ’14, pages 1143–1152. ACM, New York, 2014. ISBN
978-1-4503-2473-1. doi: 10.1145/2556288.2557372.
Edward T. Cokely, Mirta Galesic, Eric Schulz, Saima Ghazal, and Rocio Garcia-
Retamero. Measuring risk literacy: The Berlin numeracy test. Judgment
and Decision Making, 7(1):25–47, January 2012. ISSN 1930-2975. URL
http://journal.sjdm.org/11/11808/jdm11808.pdf.
Leana Copeland and Tom Gedeon. The effect of subject familiarity on com-
prehension and eye movements during reading. In Proceedings of the 25th
Australian Computer-Human Interaction Conference: Augmentation, Applica-
tion, Innovation, Collaboration, OzCHI ’13, pages 285–288. ACM, New York,
2013. ISBN 978-1-4503-2525-7. doi: 10.1145/2541016.2541082.
Michael Correll and Michael Gleicher. Error bars considered harmful: Exploring
alternate encodings for mean and error. IEEE Transactions on Visualization and
Computer Graphics, 20(12):2142–2151, December 2014. ISSN 1077-2626.
doi: 10.1109/TVCG.2014.2346298.
Helen Couclelis. The certainty of uncertainty: GIS and the limits of geographic
knowledge. Transactions in GIS, 7(2):165–175, 2003. ISSN 1467-9671. doi:
10.1111/1467-9671.00138.
Matthew A. Cronin, Cleotilde Gonzalez, and John D. Sterman. Why don’t
well-educated adults understand accumulation? A challenge to researchers,
educators, and citizens. Organizational Behavior and Human Decision Processes,
108(1):116–130, 2009. ISSN 0749-5978. doi: 10.1016/j.obhdp.2008.03.003.
Scott E. Crouter, Patrick L. Schneider, Murat Karabulut, and David R. Bassett.
Validity of 10 electronic pedometers for measuring steps, distance, and energy
cost. Medicine & Science in Sports & Exercise, 35(8):1455–1460, August
2003. ISSN 0195-9131. doi: 10.1249/01.mss.0000078932.61440.a2.
Allen Cypher and David C. Smith. KidSim: End user programming of simulations.
In Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI ’95, pages 27–34. ACM Press/Addison-Wesley Publishing Co.,
New York, 1995. ISBN 0-201-84705-1. doi: 10.1145/223904.223908.
Alan Dix. Human-Computer Interaction. In Encyclopedia of Database Systems,
pages 1327–1331. Springer US, Boston MA, 2009. ISBN 978-0-387-39940-9.
doi: 10.1007/978-0-387-39940-9_192.
Sidney D’Mello, Andrew Olney, Claire Williams, and Patrick Hays. Gaze tutor:
A gaze-reactive intelligent tutoring system. International Journal of Human-
Computer Studies, 70(5):377–398, May 2012. ISSN 1071-5819. doi: 10.1016/
j.ijhcs.2012.01.004.
Charles R. Ehlschlaeger, Ashton M. Shortridge, and Michael F. Goodchild. Visu-
alizing spatial data uncertainty using animation. Computers & Geosciences, 23
(4):387–395, 1997. ISSN 0098-3004. doi: 10.1016/S0098-3004(97)00005-8.
Stephen G. Eick. Data visualization sliders. In Proceedings of the 7th Annual
ACM Symposium on User Interface Software and Technology, UIST ’94, pages
119–120. ACM, New York, 1994. ISBN 0-89791-657-3. doi: 10.1145/192426.
192472.
Marc O. Ernst and Martin S. Banks. Humans integrate visual and haptic infor-
mation in a statistically optimal fashion. Nature, 415(6870):429–433, January
2002. ISSN 0028-0836. doi: 10.1038/415429a.
Jay Fenton and Kent Beck. Playground: An object-oriented simulation system
with agent rules for children of all ages. ACM SIGPLAN Notices, 24(10):
123–137, September 1989. ISSN 0362-1340. doi: 10.1145/74878.74891.
Nivan Ferreira, Danyel Fisher, and Arnd C. König. Sample-oriented task-driven
visualizations: Allowing users to make better, more confident decisions. In
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI ’14, pages 571–580, 2014. ISBN 978-1-4503-2473-1. doi:
10.1145/2556288.2557131.
Kraig Finstad. The usability metric for user experience. Interacting with Comput-
ers, 22(5):323–327, September 2010. ISSN 0953-5438. doi: 10.1016/j.intcom.
2010.04.004.
George W. Fitzmaurice. Graspable user interfaces. PhD thesis, University of
Toronto, 1997.
Rudolph Flesch. A new readability yardstick. Journal of Applied Psychology, 32
(3):221–233, June 1948. doi: 10.1037/h0057532.
Jacqueline Frick and Christoph Hegg. Can end-users’ flood management decision
making be improved by information about forecast uncertainty? Atmospheric
Research, 100(2):296–303, 2011. ISSN 0169-8095. doi: 10.1016/j.atmosres.
2010.12.006.
Nahum Gershon. Visualization of an imperfect world. IEEE Computer Graphics
and Applications, 18(4):43–45, July 1998. ISSN 0272-1716. doi: 10.1109/38.
689662.
Gerd Gigerenzer and Ulrich Hoffrage. How to improve Bayesian reasoning
without instruction: Frequency formats. Psychological Review, 102(4):684–
704, 1995. ISSN 1939-1471. doi: 10.1037/0033-295X.102.4.684.
Gerd Gigerenzer, Ralph Hertwig, Eva Van Den Broek, Barbara Fasolo, and
Konstantinos V. Katsikopoulos. "A 30% chance of rain tomorrow": How does
the public understand probabilistic weather forecasts? Risk Analysis, 25(3):623–
629, June 2005. ISSN 1539-6924. doi: 10.1111/j.1539-6924.2005.00608.x.
Tilmann Gneiting and Adrian E. Raftery. Weather forecasting with ensemble
methods. Science, 310(5746):248–249, October 2005. ISSN 0036-8075. doi:
10.1126/science.1115255.
Miriam Greis. Entwicklung von Simulationswerkzeugen für Laien – Heraus-
forderungen und Ziele. In Digitale Welten: Neue Ansätze in der Wirtschafts-
und Sozialkybernetik, pages 91–105. Duncker & Humblot GmbH, Berlin, 2014.
Miriam Greis, Thorsten Ohler, Niels Henze, and Albrecht Schmidt. Investigating
representation alternatives for communicating uncertainty to non-experts. In
Human-Computer Interaction – INTERACT 2015, Lecture Notes in Computer
Science, pages 256–263. Springer International Publishing, Cham, 2015. ISBN
978-3-319-22723-8. doi: 10.1007/978-3-319-22723-8_21.
Miriam Greis, Passant El.Agroudy, Hendrik Schuff, Tonja Machulla, and Albrecht
Schmidt. Decision-making under uncertainty: How the amount of presented
uncertainty influences user behavior. In Proceedings of the 9th Nordic Confer-
ence on Human-Computer Interaction, NordiCHI ’16, pages 52:1–52:4. ACM,
New York, 2016. ISBN 978-1-4503-4763-1. doi: 10.1145/2971485.2971535.
Miriam Greis, Emre Avci, Albrecht Schmidt, and Tonja Machulla. Increasing
users’ confidence in uncertain data by aggregating data from multiple sources.
In Proceedings of the 2017 CHI Conference on Human Factors in Computing
Systems, CHI ’17, pages 828–840. ACM, New York, 2017a. ISBN 978-1-4503-
4655-9. doi: 10.1145/3025453.3025998.
Miriam Greis, Tilman Dingler, Albrecht Schmidt, and Chris Schmandt. Leverag-
ing user-made predictions to help understand personal behavior patterns. In Pro-
ceedings of the 19th International Conference on Human-Computer Interaction
with Mobile Devices and Services, MobileHCI ’17, pages 104:1–104:8. ACM,
New York, 2017b. ISBN 978-1-4503-5075-4. doi: 10.1145/3098279.3122147.
Miriam Greis, Jessica Hullman, Michael Correll, Matthew Kay, and Orit Shaer.
Designing for uncertainty in HCI: When does uncertainty help? In Proceedings
of the 2017 CHI Conference Extended Abstracts on Human Factors in Com-
puting Systems, CHI EA ’17, pages 593–600. ACM, New York, 2017c. ISBN
978-1-4503-4656-6. doi: 10.1145/3027063.3027091.
Miriam Greis, Hendrik Schuff, Marius Kleiner, Niels Henze, and Albrecht
Schmidt. Input controls for entering uncertain data: Probability distribu-
tion sliders. Proceedings of the ACM on Human-Computer Interaction, 1(1):
3:1–3:17, June 2017d. ISSN 2573-0142. doi: 10.1145/3095805.
Theresia Gschwandtner, Markus Bögl, Paolo Federico, and Silvia Miksch. Visual
encodings of temporal uncertainty: A comparative user study. IEEE Transac-
tions on Visualization and Computer Graphics, 22(1):539–548, January 2016.
ISSN 1077-2626. doi: 10.1109/TVCG.2015.2467752.
Fangfang Guo, Yu Li, Mohan S. Kankanhalli, and Michael S. Brown. An evalua-
tion of wearable activity monitoring devices. In Proceedings of the 1st ACM
International Workshop on Personal Data Meets Distributed Multimedia, PDM
’13, pages 31–34. ACM, New York, 2013. ISBN 978-1-4503-2397-0. doi:
10.1145/2509352.2512882.
Tomislav Hengl and Norair Toomanian. Maps are not what they seem: Repre-
senting uncertainty in soil-property maps. In Proceedings of Accuracy 2006,
pages 805–813. Instituto Geográfico Português, Lisbon, 2006. ISBN 972-
8867-271. URL http://www.spatial-accuracy.org/system/files/
Hengl2006accuracy.pdf.
Jürgen Hotz. Duden - Testen Sie Ihre Allgemeinbildung. Dudenverlag, Mannheim,
2013. ISBN 978-3-411-90914-8.
Jürgen Hotz. Duden - Testen Sie Ihre Allgemeinbildung 2. Dudenverlag, Berlin,
2014. ISBN 978-3-411-90913-1.
Aulikki Hyrskykari, Päivi Majaranta, and Kari-Jouko Räihä. Proactive response to
eye movements. In Human-Computer Interaction – INTERACT 2003, INTERACT ’03,
pages 129–136. IOS Press, Amsterdam, 2003. ISBN 1-58603-363-8.
Harald Ibrekk and M. Granger Morgan. Graphical communication of uncertain
quantities to nontechnical people. Risk Analysis, 7(4):519–529, 1987. ISSN
1539-6924. doi: 10.1111/j.1539-6924.1987.tb00488.x.
Christopher H. Jackson. Displaying uncertainty with shading. The American
Statistician, 62(4):340–347, 2008. doi: 10.1198/000313008X370843.
Susan Joslyn and Sonia Savelli. Communicating forecast uncertainty: Public
perception of weather forecast uncertainty. Meteorological Applications, 17
(2):180–195, June 2010. ISSN 1469-8080. doi: 10.1002/met.190.
Susan L. Joslyn and Jared E. LeClerc. Uncertainty forecasts improve weather-
related decisions and attenuate the effects of forecast error. Journal of Exper-
imental Psychology: Applied, 18(1):126–140, 2012. ISSN 1939-2192. doi:
10.1037/a0025185.
Susan L. Joslyn and Rebecca M. Nichols. Probability or frequency? Expressing
forecast uncertainty in public weather forecasts. Meteorological Applications,
16(3):309–314, September 2009. ISSN 1469-8080. doi: 10.1002/met.121.
Susan L. Joslyn, Limor Nadav-Greenberg, Meng U. Taing, and Rebecca M.
Nichols. The effects of wording on the understanding and use of uncertainty
information in a threshold forecasting decision. Applied Cognitive Psychology,
23(1):55–72, January 2009. ISSN 1099-0720. doi: 10.1002/acp.1449.
Malte F. Jung, David Sirkin, Turgut M. Gür, and Martin Steinert. Displayed
uncertainty improves driving experience and behavior: The case of range
anxiety in an electric car. In Proceedings of the 33rd Annual ACM Conference
on Human Factors in Computing Systems, CHI ’15, pages 2201–2210. ACM,
New York, 2015. ISBN 978-1-4503-3145-6. doi: 10.1145/2702123.2702479.
Jakob Karolus, Paweł W. Woźniak, Lewis L. Chuang, and Albrecht Schmidt.
Robust gaze features for enabling language proficiency awareness. In Proceedings
of the 2017 CHI Conference on Human Factors in Computing Systems, CHI
’17, pages 2998–3010. ACM, New York, 2017. ISBN 978-1-4503-4655-9. doi:
10.1145/3025453.3025601.
Matthew Kay, Dan Morris, mc schraefel, and Julie A. Kientz. There’s no such
thing as gaining a pound: Reconsidering the bathroom scale user interface. In
Proceedings of the 2013 ACM International Joint Conference on Pervasive and
Ubiquitous Computing, UbiComp ’13, pages 401–410. ACM, New York, 2013.
ISBN 978-1-4503-1770-2. doi: 10.1145/2493432.2493456.
Matthew Kay, Shwetak N. Patel, and Julie A. Kientz. How good is 85%?:
A survey tool to connect classifier evaluation to acceptability of accuracy.
In Proceedings of the 33rd Annual ACM Conference on Human Factors in
Computing Systems, CHI ’15, pages 347–356. ACM, New York, 2015. ISBN
978-1-4503-3145-6. doi: 10.1145/2702123.2702603.
Matthew Kay, Tara Kola, Jessica R. Hullman, and Sean A. Munson. When (ish)
is my bus?: User-centered visualizations of uncertainty in everyday, mobile
predictive systems. In Proceedings of the 2016 CHI Conference on Human
Factors in Computing Systems, CHI ’16, pages 5092–5103. ACM, New York,
2016. ISBN 978-1-4503-3362-7. doi: 10.1145/2858036.2858558.
Sherman Kent. Words of estimative probability. Journal of the American Intelli-
gence Professional, 8(4):49–65, 1964.
Anass Lasram, Sylvain Lefebvre, and Cyrille Damez. Scented sliders for proce-
dural textures. In Eurographics 2012 - Short Papers, Eurographics ’12. The
Eurographics Association, Geneve, 2012. doi: 10.2312/conf/EG2012/short/
045-048.
Averill M. Law. Simulation Modeling and Analysis. McGraw-Hill Education,
New York, 5th edition, 2015. ISBN 978-1-259-25438-3.
Dan Ledger and Daniel McCaffrey. Inside Wearables: How the science of
human behavior change offers the secret to long-term engagement. Blog:
https://blog.endeavour.partners/inside-wearable-how-the-
science-of-human-behavior-change-offers-the-secret-to-
long-term-engagement-a15b3c7d4cf3, January 2014.
William Leiss. Three phases in the evolution of risk communication practice. The
ANNALS of the American Academy of Political and Social Science, 545(1):
85–94, May 1996. ISSN 0002-7162. doi: 10.1177/0002716296545001009.
James R. Lewis. IBM computer usability satisfaction questionnaires: Psycho-
metric evaluation and instructions for use. International Journal of Human-
Computer Interaction, 7(1):57–78, 1995. doi: 10.1080/10447319509526110.
Brian Y. Lim and Anind K. Dey. Assessing demand for intelligibility in context-
aware applications. In Proceedings of the 11th International Conference on
Ubiquitous Computing, UbiComp ’09, pages 195–204. ACM, New York, 2009.
ISBN 978-1-60558-431-7. doi: 10.1145/1620545.1620576.
Brian Y. Lim and Anind K. Dey. Investigating intelligibility for uncertain context-
aware applications. In Proceedings of the 13th International Conference on
Ubiquitous Computing, UbiComp ’11, pages 415–424. ACM, New York, 2011.
ISBN 978-1-4503-0630-0. doi: 10.1145/2030112.2030168.
Isaac M. Lipkus. Numeric, verbal, and visual formats of conveying health risks:
Suggested best practices and future recommendations. Medical Decision
Making, 27(5):696–713, 2007. doi: 10.1177/0272989X07307271.
Isaac M. Lipkus, Greg Samsa, and Barbara K. Rimer. General performance on a
numeracy scale among highly educated samples. Medical Decision Making,
21(1):37–44, February 2001. doi: 10.1177/0272989X0102100105.
Alan M. MacEachren. Visualizing uncertain information. Cartographic
Perspectives, 13(13):10–19, November 1992. doi: 10.14714/CP13.1000.
Alan M. MacEachren, Anthony Robinson, Susan Hopper, Steven Gardner, Robert
Murray, Mark Gahegan, and Elisabeth Hetzler. Visualizing geospatial infor-
mation uncertainty: What we know and what we need to know. Cartogra-
phy and Geographic Information Science, 32(3):139–160, July 2005. doi:
10.1559/1523040054738936.
Alan M. MacEachren, Robert E. Roth, James O’Brien, Bonan Li, Derek Swingley,
and Mark Gahegan. Visual semiotics & uncertainty visualization: An empirical
study. IEEE Transactions on Visualization and Computer Graphics, 18(12):
2496–2505, December 2012. ISSN 1077-2626. doi: 10.1109/TVCG.2012.279.
Adrian Madsen, Adam Larson, Lester Loschky, and N. Sanjay Rebello. Using
scanmatch scores to understand differences in eye movements between correct
and incorrect solvers on physics problems. In Proceedings of the Symposium
on Eye Tracking Research and Applications, ETRA ’12, pages 193–196. ACM,
New York, 2012. ISBN 978-1-4503-1221-9. doi: 10.1145/2168556.2168591.
Dawn M. Marsh-Richard, Erin S. Hatzis, Charles W. Mathias, Nicholas Ven-
ditti, and Donald M. Dougherty. Adaptive Visual Analog Scales (AVAS): A
modifiable software program for the creation, administration, and scoring of
visual analog scales. Behavior Research Methods, 41(1):99–106, February
2009. ISSN 1554-3528. doi: 10.3758/BRM.41.1.99.
Justin Matejka, Michael Glueck, Tovi Grossman, and George Fitzmaurice. The
effect of visual appearance on the performance of continuous sliders and visual
analogue scales. In Proceedings of the 2016 CHI Conference on Human
Factors in Computing Systems, CHI ’16, pages 5421–5432. ACM, New York,
2016. ISBN 978-1-4503-3362-7. doi: 10.1145/2858036.2858063.
Daniel J. McDuff, Javier Hernandez, Sarah Gontarek, and Rosalind W. Picard.
COGCAM: Contact-free measurement of cognitive stress during computer
tasks with a digital camera. In Proceedings of the 2016 CHI Conference on
Human Factors in Computing Systems, CHI ’16, pages 4000–4004. ACM, New
York, 2016. ISBN 978-1-4503-3362-7. doi: 10.1145/2858036.2858247.
Betty Hearn Morrow, Julie L. Demuth, and Jeffrey K. Lazo. Communicating
weather forecast uncertainty: An exploratory study with broadcast meteorolo-
gists. Final report of focus groups conducted at the 36th AMS conference on
broadcast meteorology, SocResearch Miami and National Center for Atmo-
spheric Research, 2008.
Rebecca E. Morss, Julie L. Demuth, and Jeffrey K. Lazo. Communicating
uncertainty in weather forecasts: A survey of the U.S. public. Weather and
Forecasting, 23(5):974–991, October 2008. ISSN 0882-8156. doi: 10.1175/
2008WAF2007088.1.
Rebecca E. Morss, Jeffrey K. Lazo, and Julie L. Demuth. Examining the use
of weather forecasts in decision scenarios: Results from a US survey with
implications for uncertainty communication. Meteorological Applications, 17
(2):149–162, June 2010. ISSN 1469-8080. doi: 10.1002/met.196.
Allan H. Murphy and Robert L. Winkler. Forecasters and probability forecasts:
The responses to a questionnaire. Bulletin of the American Meteorological So-
ciety, 52(3):158–166, March 1971. doi: 10.1175/1520-0477(1971)052<0158:
FAPFTR>2.0.CO;2.
Allan H. Murphy, Sarah Lichtenstein, Baruch Fischhoff, and Robert L. Win-
kler. Misinterpretations of precipitation probability forecasts. Bulletin
of the American Meteorological Society, 61(7):695–701, July 1980. doi:
10.1175/1520-0477(1980)061<0695:MOPPF>2.0.CO;2.
Limor Nadav-Greenberg and Susan L. Joslyn. Uncertainty forecasts improve deci-
sion making among nonexperts. Journal of Cognitive Engineering and Decision
Making, 3(3):209–227, September 2009. doi: 10.1518/155534309X474460.
National Research Council. Completing the Forecast: Characterizing and Com-
municating Uncertainty for Better Decisions Using Weather and Climate Fore-
casts. The National Academies Press, Washington D.C., 2006. ISBN 978-0-
309-10255-1. doi: 10.17226/11699.
John P. Norton, James D. Brown, and Jaroslav Mysiak. To what extent, and
how, might uncertainty be defined? Comments engendered by "Defining un-
certainty: A conceptual basis for uncertainty management in model-based
decision support". Integrated Assessment Journal, 6(1):83–88, 2006. ISSN
1389-5176. URL http://journals.sfu.ca/int_assess/index.php/
iaj/article/view/9/195.
Christopher Olston and Jock D. Mackinlay. Visualizing data with bounded uncer-
tainty. In Proceedings of the IEEE Symposium on Information Visualization,
INFOVIS ’02, pages 37–40. IEEE Computer Society, Washington D.C., 2002.
ISBN 0-7695-1751-X. doi: 10.1109/INFVIS.2002.1173145.
Masakazu Osada, Holmes Liao, and Ben Shneiderman. Alpha slider: Searching
textual lists with sliders. Department of Computer Science Technical Report,
University of Maryland, April 1993.
Antti Oulasvirta, Tye Rattenbury, Lingyi Ma, and Eeva Raita. Habits make
smartphone use more pervasive. Personal and Ubiquitous Computing, 16(1):
105–114, January 2012. ISSN 1617-4917. doi: 10.1007/s00779-011-0412-2.
Alex T. Pang, Craig M. Wittenbrink, and Suresh K. Lodha. Approaches to
uncertainty visualization. The Visual Computer, 13(8):370–390, November
1997. ISSN 1432-2315. doi: 10.1007/s003710050111.
Florian Pappenberger, Elisabeth Stephens, Jutta Thielen, Peter Salamon, David
Demeritt, Schalk Jan van Andel, Fredrik Wetterhall, and Lorenzo Alfieri.
Visualizing probabilistic flood forecast information: Expert preferences and
perceptions of best practice in uncertainty communication. Hydrological
Processes, 27(1):132–146, January 2013. ISSN 1099-1085. doi: 10.1002/hyp.
9253.
Heike Pfersdorff and Iris Glahn. Duden - Testen Sie Ihr Wissen! Das Allgemein-
bildungsquiz. Dudenverlag, Berlin, 2015. ISBN 978-3-411-91104-2.
Kristin Potter, Joe Kniss, Richard Riesenfeld, and Chris R. Johnson. Visualizing
summary statistics and uncertainty. Computer Graphics Forum, 29(3):823–832,
June 2010. ISSN 1467-8659. doi: 10.1111/j.1467-8659.2009.01677.x.
Kristin Potter, Paul Rosen, and Chris R. Johnson. From quantification to
visualization: A taxonomy of uncertainty visualization approaches. In
Uncertainty Quantification in Scientific Computing, volume 377 of IFIP
Advances in Information and Communication Technology, pages 226–249.
Springer, Berlin/Heidelberg, 2012. ISBN 978-3-642-32677-6. doi: 10.1007/
978-3-642-32677-6_15.
Steven F. Railsback and Volker Grimm. Agent-based and individual-based
modeling: A practical introduction. Princeton University Press, Princeton NJ, 2012.
ISBN 978-0-691-13673-8.
Maria H. Ramos, Schalk J. van Andel, and Florian Pappenberger. Do probabilistic
forecasts lead to better decisions? Hydrology and Earth System Sciences, 17
(6):2219–2232, June 2013. doi: 10.5194/hess-17-2219-2013.
Majken K. Rasmussen, Esben W. Pedersen, Marianne G. Petersen, and Kasper
Hornbæk. Shape-changing interfaces: A review of the design space and open
research questions. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, CHI ’12, pages 735–744. ACM, New York,
2012. ISBN 978-1-4503-1015-4. doi: 10.1145/2207676.2207781.
Mitchel Resnick. StarLogo: An environment for decentralized modeling and
decentralized thinking. In Conference Companion on Human Factors in Com-
puting Systems, CHI ’96, pages 11–12. ACM, New York, 1996. ISBN 0-89791-
832-0. doi: 10.1145/257089.257095.
Gordan Ristovski, Tobias Preusser, Horst K. Hahn, and Lars Linsen. Uncertainty
in medical visualization: Towards a taxonomy. Computers & Graphics, 39:
60–73, April 2014. ISSN 0097-8493. doi: 10.1016/j.cag.2013.10.015.
Maria Riveiro. Evaluation of uncertainty visualization techniques for information
fusion. In Proceedings of the 10th International Conference on Information
Fusion, Fusion ’07, pages 1–8. IEEE Computer Society, Washington D.C.,
2007. doi: 10.1109/ICIF.2007.4408049.
Anne Roudaut, Abhijit Karnik, Markus Löchtefeld, and Sriram Subramanian.
Morphees: Toward high "shape resolution" in self-actuated flexible mobile
devices. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, CHI ’13, pages 593–602. ACM, New York, 2013. ISBN
978-1-4503-1899-0. doi: 10.1145/2470654.2470738.
Mark S. Roulston and Todd R. Kaplan. A laboratory-based study of understanding
of uncertainty in 5-day site-specific temperature forecasts. Meteorological
Applications, 16(2):237–244, June 2009. ISSN 1469-8080. doi: 10.1002/met.
113.
Mark S. Roulston, Gary E. Bolton, Andrew N. Kleit, and Addison L. Sears-
Collins. A laboratory study of the benefits of including uncertainty information
in weather forecasts. Weather and Forecasting, 21(1):116–122, February 2006.
ISSN 0882-8156. doi: 10.1175/WAF887.1.
Enrico Rukzio, John Hamard, Chie Noda, and Alexander De Luca. Visualization
of uncertainty in context aware mobile applications. In Proceedings of the 8th
Conference on Human-Computer Interaction with Mobile Devices and Services,
MobileHCI ’06, pages 247–250. ACM, New York, 2006. ISBN 1-59593-390-5.
doi: 10.1145/1152215.1152267.
Dominik Sacha, Hansi Senaratne, Bum C. Kwon, Geoffrey Ellis, and Daniel A.
Keim. The role of uncertainty, awareness, and trust in visual analytics. IEEE
Transactions on Visualization and Computer Graphics, 22(1):240–249, January
2016. ISSN 1077-2626. doi: 10.1109/TVCG.2015.2467591.
Jibonananda Sanyal, Song Zhang, Gargi Bhattacharya, Phil Amburn, and Robert J.
Moorhead. A user study to compare four uncertainty visualization methods
for 1D and 2D datasets. IEEE Transactions on Visualization and Computer
Graphics, 15(6):1209–1218, November 2009. ISSN 1077-2626. doi: 10.1109/
TVCG.2009.114.
Jeffer E. Sasaki, Amanda Hickey, Marianna Mavilia, Jacquelynne Tedesco,
Dinesh John, Sarah K. Keadle, and Patty S. Freedson. Validation of the
Fitbit wireless activity tracker for prediction of energy expenditure. Journal
of Physical Activity and Health, 12(2):149–154, February 2015. doi:
10.1123/jpah.2012-0495.
Sonia Savelli and Susan Joslyn. The advantages of predictive interval forecasts
for non-expert users and the impact of visualizations. Applied Cognitive
Psychology, 27(4):527–541, July/August 2013. ISSN 1099-0720. doi: 10.
1002/acp.2932.
Julia Schwarz, Scott Hudson, Jennifer Mankoff, and Andrew D. Wilson. A
framework for robust and flexible handling of inputs with uncertainty. In
Proceedings of the 23rd Annual ACM Symposium on User Interface Software
and Technology, UIST ’10, pages 47–56. ACM, New York, 2010. ISBN
978-1-4503-0271-5. doi: 10.1145/1866029.1866039.
Wilfried Seibicke. Duden - Wie schreibt man gutes Deutsch? Eine Stilfibel.
Dudenverlag, Mannheim, 1969. ISBN 3-411-01137-8.
Orit Shaer, Oded Nov, Johanna Okerlund, Martina Balestra, Elizabeth Stowell,
Lauren Westendorf, Christina Pollalis, Jasmine Davis, Liliana Westort, and
Madeleine Ball. GenomiX: A novel interaction tool for self-exploration of
personal genomic data. In Proceedings of the 2016 CHI Conference on Human
Factors in Computing Systems, CHI ’16, pages 661–672. ACM, New York,
2016. ISBN 978-1-4503-3362-7. doi: 10.1145/2858036.2858397.
Orit Shaer, Oded Nov, Lauren Westendorf, and Madeleine Ball. Communicating
personal genomic information to non-experts: A new frontier for Human-Computer
Interaction. Foundations and Trends® in Human-Computer Interaction,
11(1):1–62, May 2017. ISSN 1551-3955. doi: 10.1561/1100000067.
Robert E. Shannon. Introduction to the art and science of simulation. In Proceed-
ings of the 30th Conference on Winter Simulation, WSC ’98, pages 7–14. IEEE
Computer Society Press, Los Alamitos CA, 1998. ISBN 0-7803-5134-7.
Meredith Skeels, Bongshin Lee, Greg Smith, and George G. Robertson. Revealing
uncertainty for information visualization. In Proceedings of the Working
Conference on Advanced Visual Interfaces, AVI ’08. ACM, New York, 2008.
ISBN 978-1-60558-141-5. doi: 10.1145/1385569.1385637.
David C. Smith, Allen Cypher, and Jim Spohrer. KidSim: Programming agents
without a programming language. Communications of the ACM, 37(7):54–67,
July 1994. ISSN 0001-0782. doi: 10.1145/176789.176795.
Randall B. Smith. The alternate reality kit: An animated environment for creating
interactive simulations. In Proceedings of the 1986 IEEE Computer Society
Workshop on Visual Languages, pages 99–106. IEEE Computer Society Press,
Silver Spring MD, 1986.
David Spiegelhalter, Mike Pearson, and Ian Short. Visualizing uncertainty about
the future. Science, 333(6048):1393–1400, September 2011. ISSN 0036-8075.
doi: 10.1126/science.1191181.
Linda B. Sweeney and John D. Sterman. Bathtub dynamics: Initial results of a
systems thinking inventory. System Dynamics Review, 16(4):249–286, Winter
2000. ISSN 1099-1727. doi: 10.1002/sdr.198.
Susanne Tak and Alexander Toet. Color and uncertainty: It is not always
black and white. In EuroVis - Short Papers, EuroVis ’14, pages 55–59. The
Eurographics Association, Geneva, 2014. ISBN 978-3-905674-69-9. doi:
10.2312/eurovisshort.20141157.
Barry N. Taylor and Chris E. Kuyatt. Guidelines for evaluating and expressing
the uncertainty of NIST measurement results. NIST Technical Note 1297,
National Institute of Standards and Technology, 1994.
John Taylor. Introduction to Error Analysis, the Study of Uncertainties in Physical
Measurements, 2nd Edition. University Science Books, New York, 1997. ISBN
0-935702-42-3.
Karl Halvor Teigen and Magne Jørgensen. When 90% confidence intervals
are 50% certain: On the credibility of credible intervals. Applied Cognitive
Psychology, 19(4):455–475, May 2005. ISSN 1099-0720. doi: 10.1002/acp.
1085.
Jakob Tholander and Stina Nylander. Snot, sweat, pain, mud, and snow: Perfor-
mance and experience in the use of sports watches. In Proceedings of the 33rd
Annual ACM Conference on Human Factors in Computing Systems, CHI ’15,
pages 2913–2922. ACM, New York, 2015. ISBN 978-1-4503-3145-6. doi:
10.1145/2702123.2702482.
Judi Thomson, Elizabeth Hetzler, Alan MacEachren, Mark Gahegan, and
Misha Pavel. A typology for visualizing uncertainty. In Proceedings of the
IS&T/SPIE’s Symposium on Electronic Imaging, volume 5669 of IS&T/SPIE
’05, pages 146–157, 2005. doi: 10.1117/12.587254.
Seth Tisue and Uri Wilensky. NetLogo: A simple environment for modeling
complexity. Technical report, New England Complex Systems Institute, 2004.
Amos Tversky and Daniel Kahneman. Judgment under uncertainty: Heuristics
and biases. In Utility, Probability, and Human Decision Making, volume 11 of
Theory and Decision Library, pages 141–162. Springer Netherlands, Dordrecht,
1975. ISBN 978-94-010-1834-0. doi: 10.1007/978-94-010-1834-0_8.
Tina C. Walber, Ansgar Scherp, and Steffen Staab. Smart photo selection: In-
terpret gaze as personal interest. In Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, CHI ’14, pages 2065–2074. ACM,
New York, 2014. ISBN 978-1-4503-2473-1. doi: 10.1145/2556288.2557025.
Warren E. Walker, Poul Harremoës, Jan Rotmans, Jeroen P. van der Sluijs, Mar-
jolein B. A. van Asselt, Peter Janssen, and Martin P. Krayer von Krauss.
Defining uncertainty: A conceptual basis for uncertainty management in
model-based decision support. Integrated Assessment, 4(1):5–17, 2003. doi:
10.1076/iaij.4.1.5.16466.
Gudrun Wallentin and Adrijana Car. A framework for uncertainty assessment in
simulation models. International Journal of Geographical Information Science,
27(2):408–422, 2013. doi: 10.1080/13658816.2012.715163.
Thomas S. Wallsten, David V. Budescu, Amnon Rapoport, Rami Zwick, and
Barbara Forsyth. Measuring the vague meanings of probability terms. Journal
of Experimental Psychology: General, 115(4):348–365, December 1986. ISSN
1939-2222. doi: 10.1037/0096-3445.115.4.348.
Wesley Willett, Jeffrey Heer, and Maneesh Agrawala. Scented widgets: Im-
proving navigation cues with embedded visualizations. IEEE Transactions
on Visualization and Computer Graphics, 13(6):1129–1136, November 2007.
ISSN 1077-2626. doi: 10.1109/TVCG.2007.70589.
Craig M. Wittenbrink, Elijah Saxon, Jeff J. Furman, Alex T. Pang, and Suresh K.
Lodha. Glyphs for visualizing uncertainty in environmental vector fields. In
Proceedings of the IS&T/SPIE’s Symposium on Electronic Imaging, IS&T/SPIE
’96, pages 266–279, 1996. doi: 10.1117/12.205940.
Jacob O. Wobbrock, Leah Findlater, Darren Gergle, and James J. Higgins. The
aligned rank transform for nonparametric factorial analyses using only ANOVA
procedures. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, CHI ’11, pages 143–146. ACM, New York, 2011. ISBN
978-1-4503-0228-9. doi: 10.1145/1978942.1978963.
VI
APPENDICES

Appendix A
Study Interface for Enhancing
Standard Input Controls
In this appendix, we include screenshots of our web-based study interface for
enhancing standard input controls described in Section 6.1.
(a) Study interface 1: Radio buttons as the main input method and a numeric slider to report uncertainty as a
percentage.
(b) Study interface 2: A drop-down menu as the main input method and a number field to report uncertainty
as a percentage.
Figure A.1: Web-based study interface with the two different main input
methods and uncertainty input methods for multiple-choice questions.
(a) Study interface 3: A number field as the main input method and a numeric slider to report uncertainty as a
range.
(b) Study interface 4: A slider with two thumbs as the main input method and a pair of number fields to
report uncertainty as a range.
Figure A.2: Web-based study interface with the two different main input
methods and uncertainty input methods for numerical questions.
Appendix B
Screenshots of SimulaTE
In this appendix, we include screenshots of our web-based simulation tool Simu-
laTE described in Chapter 9.
Figure B.1: Introduction page of SimulaTE with an explanation of terms and
functionality of the tool.
(a) First part of the module creation process: entering a name, a description, inputs, and
outputs.
(b) Second part of the module creation process: entering functions.
Figure B.2: The module creator of SimulaTE allows the users to create new
modules. A module consists of a name, a description, inputs, outputs, and
functions.
Figure B.3: The module overview of SimulaTE allows the users to edit or
delete their modules.
Figure B.4: The model creator allows users to create a model. The model
has a name, a description, tags, and a picture. Users can add specific
input and output methods and a formula from the modules to the model by
dragging them into the editor.
Figure B.5: The model overview allows users to search for models and to
edit, delete, or run a model.
Figure B.6: The model editor allows users to adapt the inputs, outputs, and
functions of a model.
Figure B.7: The overview of the input methods shows users all available
input methods with associated parameters. Additionally, a classification of
the input methods and an example for valid parameters are shown.
Figure B.8: The overview of the output methods shows users all available
output methods with associated parameters, for example the area range chart.
Code and parameter examples are included.
Figure B.9: User interface for running a simulation in SimulaTE.
Figure B.10: Overview of simulation results in SimulaTE.

Miriam Greis
A Systematic Exploration of Uncertainty in Interactive Systems
Uncertainty is an inherent part of our everyday life. Humans have to deal with 
uncertainty every time they make a decision. The importance of uncertainty 
additionally increases in the digital world. Machine learning and predictive algorithms 
introduce statistical uncertainty to digital information. In addition, the rising number 
of sensors in our surroundings increases the amount of statistically uncertain data, 
as sensor data is prone to measurement errors. Hence, there is an emergent need 
for practitioners and researchers in Human-Computer Interaction to explore new 
concepts and develop interactive systems able to handle uncertainty. Such systems 
should not only support users in entering uncertainty in their input, but additionally 
present uncertainty in a comprehensible way. 
The main contribution of this thesis is the exploration of the role of uncertainty in 
interactive systems and how novel input and output methods can support researchers 
and designers to efficiently and clearly communicate uncertainty. By using empirical 
methods of Human-Computer Interaction and a systematic approach, we present 
novel input and output methods that support the comprehensive communication of 
uncertainty in interactive systems. We further integrate our results in a simulation 
tool for end-users. 
Based on related work, we create a systematic overview of sources of uncertainty in 
interactive systems to support the quantification of uncertainty and identify relevant 
research areas. The overview can help practitioners and researchers to identify 
uncertainty in interactive systems and either reduce or communicate it. We then 
introduce new concepts for the input of uncertain data. We enhance standard input 
controls, develop specific slider controls and tangible input controls, and collect 
physiological measurements. We also compare different representations for the 
output of uncertainty to make recommendations for their usage. Furthermore, we 
analyze how humans interpret uncertain data and make suggestions on how to
avoid misinterpretation and statistically wrong judgements. We embed the insights 
gained from the results of this thesis in an end-user simulation tool to make them
available for future research. The tool is intended as a starting point for further
research on uncertainty in interactive systems and to foster communicating uncertainty
and building trust in the system. Overall, our work shows that user interfaces can 
be enhanced to effectively support users with the input and output of statistically 
uncertain information.