Hand-and-Finger-Awareness for Mobile

Touch Interaction using Deep Learning

A dissertation approved by the Faculty of Computer Science, Electrical
Engineering and Information Technology and the Stuttgart Research Centre for
Simulation Technology of the University of Stuttgart for the degree of Doctor
of Natural Sciences (Dr. rer. nat.)

Submitted by

Huy Viet Le

from Nürtingen

Main examiner: Prof. Dr. Niels Henze
Co-examiner: Prof. Antti Oulasvirta
Additional examiner: Jun.-Prof. Dr. Michael Sedlmair

Date of the oral examination: 28 March 2019

Institute for Visualization and Interactive Systems
of the University of Stuttgart

2019




Summary

Mobile devices such as smartphones and tablets have by now replaced desktop
computers for a broad range of tasks. Nearly every smartphone has a
touch-sensitive display (touchscreen) that combines input and output in a single
interface and thereby enables intuitive interaction. With this success,
applications that were previously available only for desktop computers are now
also available for mobile devices. This shift has increased the mobility of
computers and thus allows users to use applications even while on the move.

Despite the success of touchscreens, traditional input devices such as keyboard
and mouse remain superior because of their input capabilities. A mouse has
multiple buttons that can activate different functions at the same pointer
position. In addition, a keyboard has several modifier keys that multiply the
functionality of the other keys. In contrast, the input capabilities of
touchscreens are limited to the two-dimensional coordinates of a touch. This
poses several challenges that impair usability. Among other things, the options
for implementing shortcuts are limited, which contradicts Shneiderman’s golden
rules for interface design. Moreover, usually only one finger is used for input,
which slows down the interaction. Further challenges, such as the fat-finger
problem and limited reachability on large devices, add further inconveniences.
Novel touch-based interaction methods are needed to extend the input
capabilities of touchscreens and to enable input with multiple fingers, as is
common with traditional input devices.

This thesis investigates how individual fingers and parts of the hand can be
enabled to perform input on a mobile device, and how their inputs can be
distinguished. We refer to this concept as “hand-and-finger-aware” interaction.
By recognizing the hand and fingers, different functions can be assigned to
individual fingers and parts of the hand, which extends the input capabilities
and solves many challenges of touch interaction. Furthermore, applying the
concept of “hand-and-finger-aware” interaction to the entire device surface
makes it possible to use the fingers on the back for input, which previously
only held the device. This addresses further challenges of touch interaction and
offers many possibilities for realizing shortcuts.

This dissertation contains the results of twelve studies that focus on the
design aspects, the technical feasibility, and the usability of
“hand-and-finger-aware” interaction. In a first step, the ergonomics and
behavior of the hand are investigated to inspire the development of new
interaction techniques. Subsequently, we examine how well individual fingers and
parts of the hand can be recognized using deep learning techniques and the raw
data of capacitive sensors. For this, we use both a single capacitive
touchscreen and a device that registers touches on all sides. Building on this,
we present four studies that deal with bringing shortcuts from computer
keyboards to mobile devices in order to improve the usability of text editing on
mobile devices. In doing so, we follow the user-centered design process adapted
for the application of deep learning.

The core contribution of this dissertation ranges from deeper insights into
interaction with different fingers and parts of the hand, through a technical
contribution to identifying the source of a touch using deep learning
techniques, to approaches for solving the challenges of mobile touch input.



Abstract

Mobile devices such as smartphones and tablets have replaced desktop computers
for a wide range of everyday tasks. Virtually every smartphone incorporates a
touchscreen which enables an intuitive interaction through a combination of input
and output in a single interface. Due to the success of touch input, a wide range of
applications became available for mobile devices which were previously exclusive
to desktop computers. This transition increased the mobility of computing devices
and enables users to access important applications even while on the move.

Despite the success of touchscreens, traditional input devices such as keyboard
and mouse are still superior due to their rich input capabilities. For instance,
computer mice offer multiple buttons for different functions at the same cursor
position while hardware keyboards provide modifier keys which augment the
functionality of every other key. In contrast, touch input is limited to the
two-dimensional location of touches sensed on the display. The limited input
capabilities slow down the interaction and pose a number of challenges which
affect the usability. Among others, shortcuts can hardly be provided, which
affects experienced users and contradicts Shneiderman’s golden rules for
interface design. Moreover, the use of mostly one finger for input slows down
the interaction while further challenges such as the fat-finger problem and
limited reachability add additional inconveniences. Although the input
capabilities are sufficient for simple applications, more complex everyday tasks
which require intensive input, such as text editing, are not yet widely adopted.
Novel touch-based interaction techniques are needed to extend the touch input
capabilities and enable multiple fingers and even parts of the hand to perform
input similar to traditional input devices.

This thesis examines how individual fingers and other parts of the hand can
be recognized and used for touch input. We refer to this concept as hand-and-
finger-awareness for mobile touch interaction. By identifying the source of input,
different functions and action modifiers can be assigned to individual fingers and
parts of the hand. We show that this concept increases the touch input capabilities
and solves a number of touch input challenges. In addition, by applying the
concept of hand-and-finger-awareness to input on the whole device surface,
previously unused fingers on the back are now able to perform input and augment
touches on the front side. This further addresses well-known challenges in touch
interaction and provides a wide range of possibilities to realize shortcuts.

We present twelve user studies which focus on the design aspects, technical
feasibility, and the usability of hand-and-finger-awareness for mobile touch
interaction. In a first step, we investigate the hand ergonomics and behavior
during smartphone use to inform the design of novel interaction techniques.
Afterward, we examine the feasibility of applying deep learning techniques to
identify individual fingers and other hand parts based on the raw data of a single
capacitive touchscreen as well as of a fully touch sensitive mobile device. Based
on these findings, we present a series of studies which focus on bringing shortcuts
from hardware keyboards to a fully touch sensitive device to improve mobile
text editing. Thereby, we follow a user-centered design process adapted for the
application of deep learning.

The contribution of this thesis ranges from insights on the use of different
fingers and parts of the hand for interaction, through technical contributions
for the identification of the touch source using deep learning, to solutions for
addressing limitations of mobile touch input.



Acknowledgements

Over the past three years, I had one of the best times of my life working together
with a number of amazing colleagues and friends who inspired me a lot. Without
their support, this work would never have been possible.

First and foremost, I would like to thank my supervisor Niels Henze who
inspired my work and always supported me in the best possible ways to achieve
my goals. Without his support, I would never have come this far. I further thank
my committee Antti Oulasvirta, Michael Sedlmair, and Stefan Wagner for the
great and inspiring discussions. Discussions with Syn Schmitt in the SimTech
milestone presentation, and a number of student peers and mentors in doctoral
consortia at international conferences further shaped my thesis. I would also like
to thank Albrecht Schmidt for all his great support which even goes beyond
research. Moreover, I thank Andreas Bulling for the opportunity to stay another
five months to finalize my thesis.

Before my time as a PhD student, I had the great honor to meet a number of
awesome people who introduced me to the world of Human-Computer Interaction
research. I thank Alireza Sahami Shirazi for his outstanding supervision during
my bachelor’s thesis. His inspiration and recommendations played a huge role in
getting me into HCI research. I further thank Tilman Dingler for his exceptional
support and organization which provided me with the opportunity to write my
master’s thesis at Lancaster University. During my time in Lancaster, I had a
great and memorable time working with Corina Sas, Nigel Davies, and Sarah
Clinch. I further thank Mateusz Mikusz who helped me find accommodation and
ensured that everything was fine.

I had the great pleasure to work with amazingly helpful and skilled colleagues
who shaped my time as a PhD student. I thank my incredible office mates Domi-
nik Weber, Hyunyoung Kim, and Nitesh Goyal for all the inspiring discussions
and for bearing with me while I typed on my mechanical keyboard. I am
further thankful for all the collaborations which taught me how to write papers,
build prototypes, and supervise students. In particular, I thank Sven Mayer for
sharing his research experiences and for all the great work together which resulted
in a lot of publications. I further thank Patrick Bader for sharing his endless
knowledge on hardware prototyping and algorithms. I also thank Francisco
Kiss for helping me with his extensive knowledge in electrical engineering and
soldering skills. I am further thankful to Katrin Wolf for inspiring me a lot
with her experiences in mobile interaction, and Lewis Chuang for the valuable
collaboration.

A PhD is not only work but also a lot of fun. I thank Jakob Karolus and
Thomas Kosch for the great and adventurous road trips through the US. I further
thank the rest of the awesome hcilab group in Stuttgart who made every day a
really enjoyable day: Alexandra Voit, Bastian Pfleging, Céline Coutrix, Lars
Lischke, Mariam Hassib, Matthias Hoppe, Mauro Avila, Miriam Greis, Norman
Pohl, Pascal Knierim, Passant El.Agroudy, Paweł W. Woźniak, Rufat Rzayev,
Romina Poguntke, Stefan Schneegaß, Thomas Kubitza, Tonja Machulla, Valentin
Schwind, and Yomna Abdelrahman. A special thanks goes
to Anja Mebus, Eugenia Komnik and Murielle Naud-Barthelmeß for all their
support and the administrative work that keeps the lab running smoothly.

It was also a pleasure to work with awesome student assistants who supported
me in conducting studies, recruiting participants, and transcribing interviews.
This thesis would not have been possible without the support of Jamie Ullerich,
Jonas Vogelsang, Max Weiß, and Henrike Weingärtner - thank you!

Last but not least, I would like to thank my family for their unconditional
support - my father Hung Son Le and mother Thi Bich Lien Luu for raising me
to be the person I am today, for inspiring me and making it possible for me to
get the education I wanted, and for making it possible for me to explore
technology. I thank my sister Bich Ngoc Le for being there for me and supporting
me in all possible ways. Further, I thank all my friends for their emotional
support and the patience that they showed me on my way to the PhD.

Thank you!



Table of Contents

1 Introduction
1.1 Research Questions
1.2 Methodology
1.2.1 Limitations of the User-Centered Design Process
1.2.2 Limitations of Common Deep Learning Processes
1.2.3 User-Centered Design Process for Deep Learning
1.3 Research Context
1.4 Thesis Outline

2 Background and Related Work
2.1 Background
2.1.1 History and Development of Touch Interaction
2.1.2 Capacitive Touch Sensing
2.2 Related Work
2.2.1 Hand Ergonomics for Mobile Touch Interaction
2.2.2 Novel Touch-Based Interaction Methods
2.2.3 Interacting with Smartphones Beyond the Touchscreen
2.3 Summary

3 Hand Ergonomics for Mobile Touch Interaction
3.1 Interaction Beyond the Touchscreen
3.1.1 Reachability of Input Controls
3.1.2 Unintended Inputs
3.2 Study I: Range and Comfortable Area of Fingers
3.2.1 Study Design
3.2.2 Apparatus
3.2.3 Procedure
3.2.4 Participants
3.2.5 Data Preprocessing
3.2.6 Results
3.2.7 Discussion
3.3 Study II: Investigating Unintended Inputs
3.3.1 Study Design
3.3.2 Apparatus
3.3.3 Tasks and Procedure
3.3.4 Participants
3.3.5 Data Preprocessing
3.3.6 Results
3.3.7 Discussion
3.4 General Discussion
3.4.1 Summary
3.4.2 Design Implications

4 Hand-and-Finger-Awareness on Mobile Touchscreens
4.1 Identifying the Source of Touch
4.1.1 The Palm as an Additional Input Modality
4.1.2 Investigating the Feasibility of Finger Identification
4.2 Input Technique I: Palm as an Additional Input Modality (PalmTouch)
4.2.1 Data Collection Study
4.2.2 Model Development
4.2.3 Evaluation
4.3 Input Technique II: Finger Identification
4.3.1 Data Collection Study
4.3.2 Model Development
4.3.3 Evaluation
4.4 General Discussion
4.4.1 Summary
4.4.2 Lessons Learned
4.4.3 Data Sets

5 Hand-and-Finger-Awareness on Full-Touch Mobile Devices
5.1 InfiniTouch: Finger-Aware Input on Full-Touch Smartphones
5.1.1 Full-Touch Smartphone Prototype
5.1.2 Ground Truth Data Collection
5.1.3 Finger Identification Model
5.1.4 Validation
5.1.5 Mobile Implementation and Sample Applications
5.1.6 Discussion and Limitations
5.2 Exploring Interaction Methods and Use Cases
5.2.1 Interviews
5.2.2 Results
5.2.3 Discussion
5.3 General Discussion
5.3.1 Summary
5.3.2 Lessons Learned

6 Improving Shortcuts for Text Editing
6.1 Text Editing on Mobile Devices
6.1.1 Study Overview
6.2 Study I: Shortcuts on Hardware Keyboards
6.2.1 Apparatus
6.2.2 Procedure and Participants
6.2.3 Log Analysis: Shortcuts on Hardware Keyboards
6.2.4 Interviews: Hardware and Touchscreen Keyboards
6.2.5 Discussion
6.3 Study II: Gesture Elicitation
6.3.1 Referents
6.3.2 Apparatus and Procedure
6.3.3 Participants
6.3.4 Results
6.3.5 Gesture Set for Shortcuts in Text-Heavy Activities
6.3.6 Discussion
6.4 Study III: Implementing the Gesture Set on a Full-Touch Smartphone
6.4.1 Apparatus
6.4.2 Participants
6.4.3 Procedure and Study Design
6.4.4 Modeling
6.4.5 Mobile Implementation
6.5 Study IV: Evaluation of Shortcut Gestures
6.5.1 Study Procedure and Design
6.5.2 Apparatus
6.5.3 Participants
6.5.4 Results
6.5.5 Discussion
6.6 General Discussion
6.6.1 Summary
6.6.2 Lessons Learned

7 Conclusion and Future Work
7.1 Summary of Research Contributions
7.2 Future Work

Bibliography

List of Acronyms


1 Introduction

Over two billion people own a mobile device such as a smartphone or a
tablet [285]. With their mobility and increasing processing capability, mobile
devices have replaced personal computers and laptops for the majority of
everyday computing tasks. Millions of downloads on mobile app stores show that
applications such as email clients, web browsers, calendars, and even editors
for various media have become viable alternatives to their desktop counterparts.
While mobile phones started with arrays of hardware buttons and a small display,
recent smartphones incorporate a touchscreen that combines input and output in a
single interface. This enables users to directly touch elements of the user
interface (UI) and interact with them intuitively, similar to physical objects.

With touchscreens, smartphones can be designed as compact and self-contained
mobile devices which leverage the whole front side for input as well as output.
As a consequence, a wide range of applications previously designed for computers
with keyboard and mouse now also offer touch-based UIs. This transition
increases the mobility of computing devices and enables users to use their
device even while on the move. However, keyboards and mice are still superior to
touch input since they provide more input capabilities. The difference is
especially noticeable for complex tasks which require high precision (e.g.,
placing the caret in a text) and for repetitive actions for which shortcuts are
commonly used (e.g., copy and paste). Limited input capabilities slow down the
interaction and lead to a lack of shortcuts, which are fundamental for
experienced users as described by Shneiderman’s golden rules for interface
design [209].

In contrast to touchscreens, a computer mouse offers multiple buttons which
enable users to activate different functions at the same cursor position. Similarly,
hardware keyboards offer modifier keys (e.g., Ctrl, Alt, and Shift) which add
additional dimensions to every other key. Touchscreens, however, translate a touch
on the display into a two-dimensional coordinate which is mapped to the UI. While
direct manipulation is powerful, the input’s expressiveness is limited to single
coordinates despite the sheer amount of additional information that a smartphone
could provide about a touch. With 3D Touch¹, Apple showed that touch input can
be purposefully extended by a pressure modality based on a proprietary technology
involving an additional sensing layer. While this is the prime commercial example,
the touch input vocabulary on commodity smartphones can also be extended
without additional sensors beyond the touchscreen. In particular, the raw data
of capacitive touchscreens was used for estimating the touch contact size [24],
shape [182], and the orientation of a finger on the display [156, 198, 265]. These
interaction techniques generally leverage properties beyond touch coordinates
to provide additional input dimensions. However, mapping functions to specific
finger postures increases the likelihood of unintended activations since a finger is
now controlling multiple modalities simultaneously.
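To make the idea of touch properties beyond coordinates more concrete, the following is a minimal sketch of how contact size, centroid, and an in-plane orientation could be estimated from a capacitive image using weighted image moments. It is an illustration only and not the method of the cited work; the function name `touch_properties`, the `threshold` noise floor, and the assumption that the image contains a single touch blob are all hypothetical.

```python
import math


def touch_properties(cap_image, threshold=30):
    """Estimate contact size, centroid, and orientation of a single touch blob.

    cap_image: 2D list of capacitance readings (rows of equal length).
    threshold: hypothetical noise floor; cells at or below it are ignored.
    Returns None if no cell exceeds the threshold.
    """
    area = 0
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(cap_image):
        for x, value in enumerate(row):
            if value > threshold:
                area += 1            # contact size in sensor cells
                m00 += value         # zeroth moment (total signal)
                m10 += x * value     # first moments for the centroid
                m01 += y * value
    if area == 0:
        return None
    cx, cy = m10 / m00, m01 / m00

    # Second central moments describe the elongation of the blob.
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(cap_image):
        for x, value in enumerate(row):
            if value > threshold:
                mu20 += (x - cx) ** 2 * value
                mu02 += (y - cy) ** 2 * value
                mu11 += (x - cx) * (y - cy) * value

    # Angle of the major axis in image coordinates (y grows downwards).
    # The result is ambiguous by 180 degrees.
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return {"area": area, "centroid": (cx, cy), "angle_deg": math.degrees(angle)}
```

A blob elongated along the x axis yields an angle near 0 degrees, while a diagonal blob yields an angle near 45 degrees. Moments of this kind only capture the orientation of the contact area on the sensor plane, so a single finger that varies its posture changes all of these values at once.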

One solution to lower the likelihood of unintended activations is to identify
the touching finger or part of the hand to avoid interference with the main
finger for interaction (e.g., the thumb). Previous work [38, 63, 82] identified
parts of the finger (e.g., the knuckle) or individual fingers to use the touch
source as an additional input modality. However, the number of fingers that can
touch the display during the prevalent single-handed grip [109, 110, 176, 178]
is limited, while additional wearable sensors [74, 75, 152] are required for an
accurate finger identification. Differentiating between inputs of multiple
fingers and hand parts while enabling them to interact with the device would
profoundly extend the touch input capabilities. This would make smartphones more
suitable for tasks which require complex inputs and help to solve common touch
input limitations such as the fat-finger problem [16, 217], reachability
issues [20, 133], and the lack of shortcuts. Without requiring immobile and
inconvenient wearable sensors, or a second hand which is not always available,
smartphones could become an even more viable and mobile replacement for personal
computers and laptops.

¹ https://developer.apple.com/ios/3d-touch/

One step towards this vision was presented by previous work on Back-of-
Device (BoD) interaction (e.g. [16, 39, 46, 133, 197, 250, 269]). With the input
space extended to the rear, fingers that previously held the device are now able
to perform input. However, previous work treated the touch-sensitive rear as an
additional input space but not as an opportunity to enable individual fingers to
perform specific input. Generally, only grip patterns were considered [33, 35,
36], while touch-sensitive areas were limited so that only the index finger can
perform BoD input [10, 46, 133]. Consequently, the input space was extended but
individual fingers and hand parts are still not usable as different input modalities.

Touch inputs from individual hand parts and fingers need to be recognized and
differentiated to use them as unique input modalities. In particular, the raw
data of capacitive sensors (such as from recent touchscreens) contain enough
signal to infer the source of a touch. With deep learning, robust and
lightweight models could be developed which identify hand parts and fingers on
today’s smartphones. This concept profoundly extends the mobile touch input
vocabulary and will be referred to as hand-and-finger-aware interaction.
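Once the source of a touch is known, it can act much like a modifier key: the same touch location triggers a different function depending on which finger or hand part produced it. The sketch below illustrates this with a simple lookup table. The source names, target names, and actions are hypothetical placeholders chosen for illustration and are not taken from the thesis.

```python
# Hypothetical mapping from (touch source, UI target) to an action.
# None of these names come from the thesis; they only illustrate the idea.
ACTIONS = {
    ("thumb", "text"): "place_caret",
    ("index", "text"): "select_word",
    ("palm", "text"): "open_clipboard_menu",
}


def dispatch(source, target, default="tap"):
    """Return the action for a touch, falling back to a plain tap.

    source: identified finger or hand part (e.g., the output of a classifier).
    target: the UI element under the touch coordinate.
    """
    return ACTIONS.get((source, target), default)
```

Falling back to a default action keeps the interface usable when a source is unmapped or cannot be identified, so conventional touch input keeps working.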

Before this concept can be used on commodity smartphones, a wide range
of challenges need to be addressed. First, designing hand-and-finger-aware
interactions with a focus on usability requires an understanding of the behavior
and ergonomics of individual fingers while holding smartphones. There is no
previous work which analyzes the reachable areas for each finger, nor the areas
in which fingers typically move and reside. Second, the technical feasibility of
identifying individual hand parts and fingers needs to be investigated. There is
no system yet which identifies fingers and hand parts with accuracies usable for
realistic everyday scenarios based on the raw data of commodity capacitive touch
sensing technologies. Third, we also need to evaluate the concept of hand-and-
finger-awareness with potential users to gather feedback. This enables us to
improve the concept to a level which is ready for the mass market.



1.1 Research Questions

In this thesis, we explore the concept of hand-and-finger-aware interaction for
mobile devices. To inform the design and development of hand-and-finger-aware
interaction methods, we present an exploration of six high-level research questions
(RQs). The RQs are presented in Table 1.1.

An important basis to design input on the whole device surface is the analysis
of finger movements which do not require a grip change. Since a grip change
leads to a loss of grip stability and could lead to dropping the device, we need
to understand the range which individual fingers can cover and the areas in
which they can comfortably move (RQ1). In addition to explicit movements,
we further need to understand micro-movements which fingers perform while
interacting with the device. An understanding is vital to minimize unintended
inputs generated by these movements (RQ2).

We use the raw data of capacitive sensors to identify hand parts and fingers
based on deep learning. Before this approach can be leveraged for hand-and-
finger-aware interaction, we need to investigate its feasibility and usability. We
investigate the identification of hand parts and fingers using the raw data of a
single capacitive touchscreen, i.e. on today’s commodity smartphones (RQ3). We
further examine the feasibility of identifying individual fingers on fully touch
sensitive smartphones (RQ4). This would enable the fingers on the rear to perform
input, while the grip can be reconstructed for further interaction techniques.

After understanding the ergonomics and behavior of all fingers while holding
and interacting with smartphones, we evaluate hand-and-finger-aware interaction
for common use cases. This helps to understand how this concept can be leveraged
to further improve mobile interaction. Since touch input on recent mobile devices
poses a number of limitations, we investigate how we could address them on a
fully touch sensitive smartphone. This includes an elicitation of the limitations
and potential solutions proposed by experienced interaction designers (RQ5).
Finally, we focus on text editing as a specific use case which the interaction designers
identified as important but still inconvenient due to the limited input capabilities.
In particular, we investigate the design and implementation of shortcuts on fully
touch sensitive smartphones to improve text editing (RQ6).



No.   Chapter     Research Question

I. Hand Ergonomics for Mobile Touch Interaction
RQ1   Chapter 3   How can we design Back-of-Device input controls to consider the reachability of fingers in a single-handed grip?
RQ2   Chapter 3   How can we design Back-of-Device input controls to minimize unintended inputs?

II. Identifying Fingers and Hand Parts
RQ3   Chapter 4   How can we differentiate between individual fingers or hand parts on a capacitive touchscreen?
RQ4   Chapter 5   How can we estimate the position of individual fingers and identify them on a fully touch sensitive smartphone?

III. Improving Mobile Touch Interaction
RQ5   Chapter 5   Which typical touch input limitations could be solved with a fully touch sensitive smartphone?
RQ6   Chapter 6   How can we design and use shortcuts on a fully touch sensitive smartphone to improve text editing?

Table 1.1: Summary of research questions addressed in this thesis.

1.2 Methodology

Designing, developing, and evaluating novel interaction techniques is one of the
major topics in human-computer interaction (HCI). The goal of an interaction
technique is to provide users with a way to accomplish tasks based on a combina-
tion of hardware and software elements.

1.2.1 Limitations of the User-Centered Design Process

Previous work in HCI presented novel interaction techniques based on the user-
centered design (UCD) process [102] as shown in Figure 1.1. The UCD process
outlines four phases throughout an iterative design and development cycle to
develop interactive systems with a focus on usability. The process consists of
phases for understanding the context of use, specifying the user requirements,
and developing a solution (i.e., implementing a working prototype) which is
evaluated against the requirements. Each cycle represents an iteration towards a
solution which matches the users’ context and satisfies all of the relevant needs
(e.g., increasing the usability to a level which satisfies relevant users). The UCD
process focuses on the concept of the solution itself, assuming that specified user
requirements can be unambiguously translated into a working prototype. Indeed,
previous work commonly identified the need and requirements of an interaction
technique and prototyped them using hand-crafted algorithms which range from
simple value comparisons [152], thresholding [24, 74], and transfer functions [39]
through computer vision techniques [93, 96] to kinematic models [23, 202].

With the advent of deep learning, complex relationships and patterns (e.g.,
in sensor data) can be learned from large amounts of data. Due to the increased
availability of computing power and open-source frameworks (e.g., TensorFlow¹,
Keras², PyTorch³), deep learning became a powerful tool for HCI researchers
to develop solutions which are robust, lightweight enough to run on mobile
devices, and do not even require domain knowledge (e.g., about a particular
sensor and its noise). In addition, major parts of the prototypes can be reused
even in market-ready versions of the system by reusing the data for model
development or retraining the model for similar sensors. Prominent examples
include object recognition models for image data which even outperform
humans [87, 88, 218].

Despite its powerful modeling capabilities, deep learning produces black box
models which can hardly be understood by humans. Due to the lack of knowledge
about a deep learning model’s internal workings, the model needs to be trained,
tested, and validated with potential users within multiple iterations until it
achieves the desired result. In contrast, the UCD process describes the design of
a solution in a single step without involving potential users, an evaluation of
its usability in a subsequent step, and a full refinement in a further iteration.
Due to the huge effort required for developing a deep learning model (i.e.,
gathering a data set and multiple iterations of model development), the UCD
process needs to be refined in order to incorporate the iterative development and
testing of a model, as well as the evaluation of the model’s usability within the
whole interactive system. In particular, the step of designing a solution needs
to incorporate the modeling cycle of a deep learning process and connect it to
the usability aspects of the UCD process.

¹ TensorFlow: https://www.tensorflow.org/
² Keras: https://keras.io/
³ PyTorch: https://pytorch.org/


1.2.2 Limitations of Common Deep Learning Processes

A typical process for developing and evaluating deep learning models consists of
four phases: gathering a representative data set (e.g., through a data collection
study or by using existing data sets), preparing the data (e.g., exploring,
cleaning, and normalizing), training and testing the model, and validating
its generalizability on previously unseen data. Training and testing are
often repeated in multiple iterations to find the hyperparameters that lead to
the lowest model error on the test set, based on trial-and-error and grid
search [101] approaches. A final model validation with previously unseen data
then assesses whether the chosen hyperparameters were overfitted to the test set.
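
The repeated training and testing described above can be sketched as a plain grid search. This is a minimal illustration and not the procedure used in this thesis: the names `grid_search` and `toy_test_error`, the two hyperparameters, and the toy error function are assumptions made for the example. In practice, `train_and_test` would train a model on the training set and return its error on the test set.

```python
from itertools import product


def grid_search(grid, train_and_test):
    """Try every combination in `grid` and return (best_params, best_error).

    `grid` maps a hyperparameter name to a list of candidate values.
    `train_and_test` takes a dict of hyperparameters and returns the
    error on the test set (lower is better).
    """
    names = sorted(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = train_and_test(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_test_error(params):
    # Hypothetical stand-in for "train a model, then measure the test error".
    # The minimum is placed at learning_rate=0.01 and hidden_units=64.
    return abs(params["learning_rate"] - 0.01) * 10 + abs(params["hidden_units"] - 64) / 100


grid = {"learning_rate": [0.1, 0.01, 0.001], "hidden_units": [32, 64, 128]}
best_params, best_error = grid_search(grid, toy_test_error)
print(best_params, best_error)
```

Because the winning configuration is selected by its test-set error, that error is an optimistic estimate, which is why a separate validation on unseen data is still required.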

For this process, the deep learning community often uses a
training-test-validation split [42] (i.e., a training and a test set for model
development, and a validation set for a one-time validation of the model) to
develop and validate a model’s performance. However, software metrics alone
(i.e., accuracies and error rates that describe how well the model generalizes
to unseen data) do not describe the usability of a system, which is the main
focus of the UCD process. Instead of relying on software metrics alone, factors
such as the effect of inference errors on the usability (i.e., how good is the
perceived usability for a given use case and how impactful are errors?), the
model stability (i.e., how noisy are the estimations over time for no or only
small variations of the input?), and the usefulness of the investigated
system should be considered. As systems are used by a wide range of users and
in different scenarios, the validation also needs to assess whether the model can
generalize beyond the (specific and/or abstract) tasks used in a data collection
study. Moreover, while previous work considered accuracies above 80% to be
sufficient [113], sufficiency depends on the use case (i.e., whether the action’s
consequence is recoverable and how much the consequence affects the user),
which can only be evaluated in studies through user feedback.
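
One rough way to put a number on model stability is to count how often the predicted class changes between consecutive frames while the actual input stays the same. The sketch below is an assumption made for illustration: the metric name `flip_rate` and the example sequences are not taken from this thesis.

```python
def flip_rate(predictions):
    """Fraction of consecutive prediction pairs whose class differs.

    0.0 means the output never changed; values near 1.0 mean the
    output changed on almost every frame.
    """
    if len(predictions) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(predictions, predictions[1:]) if prev != cur)
    return flips / (len(predictions) - 1)


# Hypothetical per-frame outputs while a participant rests one finger on the screen.
stable = ["thumb"] * 10
noisy = ["thumb", "index", "thumb", "thumb", "index",
         "thumb", "thumb", "thumb", "index", "thumb"]

print(flip_rate(stable))  # 0.0
print(flip_rate(noisy))
```

A low flip rate does not imply a correct model; it only indicates that the output is steady, so it complements accuracy rather than replacing it.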

In summary, a typical process for deep learning describes the iterative nature
of developing and evaluating black box models, but does not consider the usability
of the model and thus of the final system. To apply deep learning techniques in
HCI, we need to refine and combine the UCD process with typical deep learning
processes to consider both the iterative development and evaluation of models, as
well as their usability within the final system.

[Figure 1.1 diagram: an iterative cycle of four phases (understand and specify context of use; specify user requirements; design solutions; evaluate against requirements), repeated as a design improvement iteration until the solution meets the requirements.]

Figure 1.1: The user-centered design process as described in ISO 9241-210 [102].

1.2.3 User-Centered Design Process for Deep Learning

We present the user-centered design process for deep learning (UCDDL) which
combines the UCD process with the steps required for deep learning and is
depicted in Figure 1.2. The UCDDL consists of five phases. The first two phases
are identical to the traditional UCD process and focus on understanding users as
well as specifying requirements. The next three phases focus on developing a
prototype based on deep learning and evaluating the system based on the factors
described above. In the following, we describe the UCDDL which we apply
throughout this thesis.

1. Understand and specify the context of use. This phase is about identifying
the users who will use the system, their tasks, and the conditions under which
they will use it (e.g., technical and ergonomic constraints). This step could
consist of user studies to understand the context of use, or could build on
findings from previous work.

2. Specify user requirements. Based on the identified context, application
scenarios and prototype requirements need to be specified. The solution is then
developed based on these requirements and evaluated against them.

3. Collect data based on user requirements. Training a deep learning model
requires a representative and large enough data set as the ground truth. Gathering
this data set in the context of a user study involves the design and development
of an apparatus which runs mockup tasks to cover all expected interactions.

[Figure 1.2 diagram: the phases understand and specify context of use; specify user requirements; collect data based on requirements; model development; model validation & design evaluation. Further labels: "Design solution using deep learning", "Model Improvement Iteration", "Design Improvement Iteration", and "Solution meets requirements".]

Figure 1.2: Adapted user-centered design process for deep learning in the context of
interactive systems in HCI.

Instructing potential users to perform certain tasks even enables the apparatus to
automatically label each collected sample. This assumes that the experimenter
carefully observes whether participants actually perform the requested input
correctly (e.g., when instructing participants to touch with a certain finger, it can
be assumed that the captured data samples represent the instructed finger). The
user study needs to be conducted with a representative set of potential users who
cover all relevant factors to collect a sufficient amount of data for model training.

The data set is the foundation of the developed system and needs to be refined
(i.e., extended with more variance by adding users and tasks to cover the specified
requirements) in case the final system does not generalize to new users and tasks
which were specified in the requirements. In this case, another data collection
study needs to be conducted and the resulting new data set needs to be
combined with the already existing data set. In addition, the data collected in the
evaluation phase (see Phase 5) could also be used to extend the existing data set.

4. Model development. Based on the data set, this phase applies deep learning
to develop a model which is used by the system. Prior to the actual model training,
the data set often needs to be cleaned (e.g., by removing empty or potentially
erroneous samples for which the label correctness cannot be ensured) or augmented
in case producing the desired amount of data is not feasible (e.g., by adding
altered samples such as rotated inputs or inputs with artificial sensor noise).
Further, we first explore the data set with techniques such as visual inspection,
descriptive and inferential statistics (e.g., finding correlations), as well as by
applying basic machine learning models such as linear regression and SVMs using
simple feature extraction. This step provides an overview of the data set and
helps to choose the optimal model and hyperparameters in later steps. In case only
very few samples could be collected (e.g., due to a high effort for collecting or
labeling), these basic models represent a viable solution.
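
The augmentation mentioned above (rotating the input, adding artificial sensor noise) could look like the following sketch for a small capacitive image stored as a list of rows. The function names, the noise level, and the 8-bit value range are assumptions made for illustration. Whether a rotated sample is still valid for its label depends on the use case and has to be checked per task.

```python
import random


def rotate_180(image):
    """Rotate a 2D image (list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def add_sensor_noise(image, sigma=2.0, rng=None, max_value=255):
    """Add Gaussian noise to every cell and clip the result to [0, max_value]."""
    if rng is None:
        rng = random.Random()
    return [
        [min(max_value, max(0, round(value + rng.gauss(0.0, sigma)))) for value in row]
        for row in image
    ]


def augment(samples, rng=None):
    """Return each (image, label) pair plus one rotated and one noisy copy of it."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    augmented = []
    for image, label in samples:
        augmented.append((image, label))
        augmented.append((rotate_180(image), label))
        augmented.append((add_sensor_noise(image, rng=rng), label))
    return augmented


# Hypothetical 3x3 capacitive image labeled with the finger that produced it.
samples = [([[0, 5, 0], [9, 200, 40], [0, 7, 1]], "index")]
augmented = augment(samples)
print(len(augmented))  # 3
```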

After data preprocessing and exploration, the data set needs to be split into a
training and a test set to avoid the same samples being “seen” during training and
testing. Since the same user could generate highly similar data, the data set should
further be split by participants (instead of by samples as commonly applied).
Previous work commonly used a ratio of 80%:20% for a training-test split, and
70%:20%:10% for a training-test-validation split. While the deep learning
community commonly uses a training-test-validation split to detect overfitting
to the test set due to hyperparameter tuning, the UCDDL process replaces the
validation set with a user study in the next phase. This has two advantages:
First, the full data set can be used to train the model and to test it on the
test set. Second, the user study in the next phase can gather a validation set
with new participants which is usually larger than 10% of the data set. More
importantly, the model’s usability (and also its accuracy) can be evaluated in a
realistic scenario based on feedback from potential users. This is not possible
with a training-test-validation split, which focuses only on the modeling aspect.
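
A participant-wise split as described above can be sketched as follows. The tuple layout `(participant_id, features, label)`, the function name, and the example data are assumptions made for illustration; the 80%:20% ratio is the one quoted in the text.

```python
import random


def split_by_participant(samples, test_fraction=0.2, seed=0):
    """Split samples into a training and a test set so that no participant is in both.

    `samples` is a list of (participant_id, features, label) tuples.
    """
    participants = sorted({pid for pid, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(participants)
    n_test = max(1, round(len(participants) * test_fraction))
    test_ids = set(participants[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


# Hypothetical data set: 10 participants with 5 samples each.
samples = [(pid, [pid, i], "index") for pid in range(10) for i in range(5)]
train, test = split_by_participant(samples)

train_ids = {pid for pid, _, _ in train}
test_ids = {pid for pid, _, _ in test}
print(len(train), len(test))  # 40 10
```

Splitting on participant IDs rather than on individual samples keeps highly similar samples of one person from ending up on both sides of the split, which would otherwise inflate the test accuracy.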

The goal of the training process is to achieve the highest accuracy on the test
set. The model is then deployed in the respective system (e.g. a mobile device in
this thesis) for the evaluation in the next phase.

5. Model Validation and Design Evaluation. This phase evaluates the system as
a whole with participants who did not participate in the data collection study
(Phase 3). The evaluation focuses on three aspects: (1) a model validation which,
combined with the training and testing of the previous phase, serves the same
purpose as the commonly used training-test-validation approach, (2) evaluating
the model usability (and optionally also the model error) in a realistic but
controlled scenario to focus on individual aspects, and (3) evaluating the system
within a common use case (as specified in Phase 2) to assess the usefulness of
the system and the perceived usability of the model in an uncontrolled scenario.

The model validation replaces the validation set and is based on tasks similar to
those used in the data collection study. In particular, data is collected with the
same tasks which, at the same time, can also be used to introduce participants to
the system. This prepares them for the usability evaluation within realistic
scenarios, which consists of a set of tasks that resemble a realistic use case.
This set of tasks is designed to be controlled enough to enable a focus on
individual aspects of the system (e.g., recognition accuracy and usability of
certain classes of the model). For instance, a set of tasks could be designed in a
pre-defined order so that model predictions can be compared with the order to
determine the accuracy. To focus on the perceived usability, tasks could also be
designed to expect only one type of input (i.e., one class). This makes it possible
to evaluate false positives for a certain class while collecting qualitative
feedback from the participants about the used class. More complex outputs, such as
those of regression models, could employ additional sensors such as high-precision
motion capture systems as ground truth. For the usability evaluation of the full
system, participants use the prototype to solve tasks in a fully functional
environment (e.g., an application designed for a certain use case, or even
well-known applications). This step is less controlled and focuses on the system’s
usability and usefulness. It results in qualitative feedback and quantitative
measures such as the task completion time or success rate.
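
Comparing model predictions with a pre-defined task order, as described above, can be sketched as follows. The function name, the class names, and the example session are assumptions made for illustration and are not taken from the studies in this thesis.

```python
def evaluate_against_order(expected, predicted):
    """Compare predictions with the pre-defined task order.

    Returns the overall accuracy and, per class, the number of false
    positives (the class was predicted although another input was instructed).
    """
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    correct = 0
    false_positives = {}
    for exp, pred in zip(expected, predicted):
        if exp == pred:
            correct += 1
        else:
            false_positives[pred] = false_positives.get(pred, 0) + 1
    accuracy = correct / len(expected) if expected else 0.0
    return accuracy, false_positives


# Hypothetical session: the task sequence instructs one input per trial.
expected = ["thumb", "index", "thumb", "palm", "index", "thumb"]
predicted = ["thumb", "index", "index", "palm", "index", "palm"]
accuracy, false_positives = evaluate_against_order(expected, predicted)
print(accuracy, false_positives)
```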

In summary, the evaluation in the UCDDL covers the model validation as
well as the usability aspect as described in the UCD process.

1.3 Research Context

The research leading to this thesis was carried out over the course of three
years (2016 – 2018) in the Socio-Cognitive Systems group at the Institute for
Visualization and Interactive Systems. It was additionally part of a project funded
in the Cluster of Excellence in Simulation Technology (SimTech) at the University
of Stuttgart. The presented research was inspired by collaborations, publications,
and discussions with many experts from within and outside the field of HCI.

Cluster of Excellence in Simulation Technology. SimTech is an interdisciplinary
research association with more than 200 scientists from virtually all faculties
of the University of Stuttgart. A major part of the research was conducted in
the project network “PN7 - Reflexion and Contextualisation”¹. The research
presented in this thesis underwent an examination in the form of a mid-term
presentation accompanied by Prof. Dr. Syn Schmitt from the Institute of Sports
and Exercise Science. Moreover, intermediate research results were presented at
the annual SimTech Status Seminar.

University of Stuttgart. The research presented in this thesis was inspired by
collaborations with colleagues from the University of Stuttgart. With the scientific
expertise and technical knowledge of Patrick Bader, Thomas Kosch, and
Sven Mayer, we published six publications which are all in the scope of this
thesis [123–125, 130, 132, 136]. Moreover, the collaborations resulted in further
publications on relevant topics beyond the scope of this thesis [117, 128,
133, 155, 156, 158–160] and in tutorials on “Machine Learning for HCI” organized
at national as well as international conferences [126, 134, 157]. Amongst others,
online magazines and communities such as Arduino², hackster.io³, and
open-electronics.org⁴ reported on our prototypes presented in this work.

The research was further inspired by discussions with a broad range of
student peers and senior researchers at the doctoral consortium at the International
Conference on Human-Computer Interaction with Mobile Devices and Services
(MobileHCI 2016) [122] and the ACM CHI Conference on Human Factors in
Computing Systems (CHI 2018) [121]. In addition, collaborations with Patrick
Bader, Passant El.Agroudy, Tilman Dingler, Valentin Schwind, Alexandra Voit,
and Dominik Weber resulted in publications beyond the scope of this thesis [11,
12, 48, 89, 131, 137, 247].

¹ http://www.simtech.uni-stuttgart.de/en/research/networks/7/
² http://blog.arduino.cc/2018/10/19/infinitouch-interact-with-both-sides-of-your-smartphone/
³ http://blog.hackster.io/dual-sided-smartphone-interaction-with-infinitouch-6362c4181fa2
⁴ http://www.open-electronics.org/infinitouch-is-the-first-fully-touch-sensitive-smartphone/


External Collaborations. Further research beyond the scope of this thesis was
conducted with external collaborators. This includes Katrin Wolf from the
Hamburg University of Applied Sciences [129], Lewis Chuang from the Max Planck
Institute for Biological Cybernetics [160], Sarah Clinch, Nigel Davies, and Corina
Sas from Lancaster University [131], as well as Agon Bexheti, Marc Langheinrich,
and Evangelos Niforatos from the Università della Svizzera italiana [48].

1.4 Thesis Outline

This thesis consists of seven chapters, the bibliography, and the appendix. We
present the results and evaluations of 12 empirical studies, an extensive review
of related work, as well as a discussion and summary of the findings in the
conclusion chapter. We structure the work as follows:

Chapter 1 - Introduction motivates the research in this thesis and gives an
overview of the research questions and the author’s contributions. We
further present the user-centered design process for deep learning which
we follow throughout this thesis.

Chapter 2 - Background provides an overview of the history of touch
interaction, an explanation of capacitive touch sensing, as well as an extensive
review of touch-based interaction techniques on mobile devices and beyond.

Chapter 3 - Hand Ergonomics for Mobile Touch Interaction describes the
results of two studies investigating the behavior and ergonomic constraints
of fingers while holding a mobile device.

Chapter 4 - Hand-and-Finger-Awareness on Mobile Touchscreens presents
two models that use the raw data of capacitive touchscreens to recognize
the source of touch, and their evaluations within realistic use cases.

Chapter 5 - Hand-and-Finger-Awareness on Full-Touch Mobile Devices
develops a smartphone prototype with touch sensing on the whole device
surface and shows how fingers can be identified. Further, we discuss how
full-touch smartphones can solve recent touch input limitations.

Chapter 6 - Improving Shortcuts for Text Editing applies the findings from
the previous chapters and presents four studies which cover all steps: a study
to understand shortcut use on keyboards, a gesture elicitation study, a
data collection study to train a gesture recognizer using deep learning, and
finally an evaluation study.

Chapter 7 - Conclusion and Future Work discusses the findings from the
previous chapters, summarizes them, and provides directions for further
research.

2 Background and Related Work

While touchscreens enable intuitive interactions, keyboards and mice as input
devices are still superior to touch input as they provide more input capabilities by
enabling the use of multiple fingers. In this thesis, we explore novel touch-based
interaction techniques which differentiate between individual fingers and hand
parts to solve limitations of recent mobile touch interaction. To understand the
technologies used in this thesis, this chapter provides an introduction to touch-
based interaction as well as its history and technical background. We further
review previous work in the domain of extending touch interaction and present
recent challenges of mobile touch interaction which we address in this thesis.

2.1 Background

Touchscreens are ubiquitous in our modern world. According to Statista [285],
over 2.5 billion people own a smartphone with a touchscreen as the main interface.
People use smartphones for tasks which were previously exclusive to stationary
computers and in a wide range of scenarios such as while sitting, walking,
encumbered, or even during other tasks. The combination of input and output in
a single interface enables intuitive interaction through direct touch. Moreover,
touchscreens enable manufacturers to build compact and robust devices which
use nearly the whole front surface for input and output.

Figure 2.1: The first touchscreen as developed by E.A. Johnson. Image taken
from [106].

2.1.1 History and Development of Touch Interaction

The first finger-based touchscreen was invented in 1965 by E.A. Johnson [105]
who described a workable mechanism for developing a touchscreen. As with most
consumer devices nowadays, the presented prototype used capacitive sensing.
The inventor envisioned the invention to be used for air traffic control, such
as for facilitating selections of call signs, flights, and executive actions [106, 184].
Figure 2.1 shows the display configuration for the touch interface. Five years
later, in 1970, Samuel Hurst and his research group at the University of Kentucky
developed the first resistive touchscreen. In contrast to capacitive sensing as
invented by E.A. Johnson, resistive touchscreens were more durable back then and
inexpensive to produce, and their operation is not restricted to conductive objects
such as human skin or conductive pens. Nowadays, resistive touch sensing can be
found mostly in public areas such as restaurants, factories, and hospitals. In
1972, the first widely deployed touchscreen based on infrared light was
developed [55] and was deployed in schools throughout the United States. This
technology detected fingers interrupting light beams that ran parallel to the
display surface.

In 1982, Nimish Mehta [162] developed the first multi-touch device which
used a frosted-glass panel with a camera behind it so that it could detect actions
which are recognizable through black spots showing up on the screen. Gestures
similar to today’s pinch-to-zoom or manipulation through dragging were first
presented in a system by Krueger et al. [116]. Although the system was
vision-based and thus not suitable for touch interaction, many of the presented
gestures could be readily ported to a two-dimensional space for touchscreens. One
year later, the first commercial PC with a touchscreen (Hewlett Packard HP-150¹)
was released. Its touchscreen was based on infrared sensing but was not well
received at that time as graphical user interfaces were not widely used. In 1984,
Bob Boie presented the first transparent multi-touch screen which used a
transparent capacitive array of touch sensors on top of a CRT screen. Similarly,
Lee et al. [138] developed a touch tablet in 1985 that can sense an arbitrary
number of simultaneous touch inputs based on capacitive sensing. Using the
compression of the overlaying insulator, the tablet is further capable of sensing
the touch pressure. Recent iPhones incorporate this input modality under the name
3D Touch.

In 1993, the Simon Personal Communicator from IBM and BellSouth (see
Figure 2.2) was released, which was the first mobile phone with a touchscreen. Its
resistive touchscreen enabled features such as e-mail clients, a calendar, address
book, a calculator, and a pen-based sketchpad. In the same year, Apple Computer
released the MessagePad 100, a personal digital assistant (PDA) that can be
controlled with a stylus but without a call functionality. The success of PDAs
continued with the Palm Pilot by Palm Computing as the handwriting recognition
worked better for the users. However, in contrast to smartphones nowadays, all
these devices require the use of a stylus.

In 1999, FingerWorks, Inc. released consumer products such as the
TouchStream and the iGesture Pad that can be operated with finger inputs and
gestures. The company was eventually acquired by Apple Inc. to contribute to the
development of the iPhone’s touchscreen and Apple’s Multi-Touch trackpad.
Based on the work by Jun Rekimoto [194], Sony introduced a first flat input
surface in 2002 that provides two-dimensional images of the changes in the electric
field. This technology is known as mutual capacitive sensing and the electric
field changes represent low-resolution shapes of conductive objects touching the
sensor. In contrast to camera-based approaches, all elements are integrated into a
flat touch panel which enables the integration into mobile devices. Touchscreens
incorporated in smartphones nowadays are based on this technology.

Figure 2.2: Simon Personal Communicator, the first smartphone with a touchscreen
by IBM and BellSouth. Image taken from arstechnica.

¹ HP-150: http://www.hp.com/hpinfo/abouthp/histnfacts/museum/personalsystems/0031/

In the subsequent years, new touch-based technologies were introduced but
these are not employed on smartphones due to space constraints. For example,
Jeff Han introduced multi-touch sensing through frustrated total internal reflection
(FTIR) which is based on infrared (IR) LEDs and an IR camera below the touch
surface to sense touch input. This enables building high-resolution touchscreens
and is less expensive than other technologies. In 2008, the Microsoft Surface 1.0,
a table-based touchscreen, was released that integrated a PC and five near-infrared
cameras to sense fingers and objects placed on the display. Three years later,
the second version of the Microsoft Surface (now called Microsoft PixelSense)
was released that is based on Samsung’s SUR40 technology. This technology
represents a 40-inch interactive touch display in which pixels can also sense
objects above it. This makes it possible to build a less bulky tabletop without
cameras below the display and generates a 960×540 px touch image that can be
used for object tracking.

2.1.2 Capacitive Touch Sensing

Since the invention of the first touchscreen, a wide range of further touch sensing
technologies have been presented. While many of these approaches provide
a higher touch sensing resolution and expressiveness compared to the earlier
invented capacitive and resistive touchscreens, they are less suitable for mobile
devices due to their immobile setup. Amongst others, these technologies include
frustrated total internal reflection [77], surface acoustic waves [142], camera-
based touch sensing (e.g. RGB [225], depth [252]), infrared touch sensing [2],
and inductive touch sensing [43].

Due to their compact size, robustness, and responsiveness, capacitive
touchscreens are widely used in mobile devices nowadays. In particular, mobile
devices use projected capacitive touchscreens which sense touches with a higher
resolution than surface capacitance, which is often used on larger surfaces with
electrodes at the four corners. Figure 2.3 sketches the functional principle of
a mutual capacitive touchscreen. Mutual capacitance is one of the two types
of the projected capacitance principle and is commonly used in recent mobile
devices [15]. The touch sensor itself consists of three layers: an electrode pattern
layer in the middle which is responsible for the actual touch sensing and two
protective layers. The touch sensor with all of its layers is transparent and placed
on top of the display unit such as a liquid crystal display (LCD). The electrode
pattern layer is connected to a touch controller and consists of conductive wires
made out of indium tin oxide (ITO) which is transparent and sketched on the
bottom left of Figure 2.3.

The controller measures the change of coupling capacitance between two
orthogonal electrodes, i.e. intersections of row and column pairs [50]. These
measurements result in a low-resolution finger imprint which is shown on the
bottom right of Figure 2.3 and referred to as a capacitive image [73, 99, 136, 156].

[Figure 2.3 diagram. Labels: Protective Cover, Electrode Pattern Layer, Glass Substrate, Electrical Field, Electrode Layer X, Electrode Layer Y. Capacitive image (representing the touch of a finger):]

  0    0    2    2    4    4    4    2    1
  1    0    1    4    4    4    4    2    2
  0    0    1    5   18   11    4    7    2
  0    0    2   44  141   83   11    2    0
  0    2    9   99  219  136   19    2    0
  0    1    5   29   55   29    9    4    2
  0    2    2    7    9    9    5    2    1
  0    0    1    4    2    2    2    1    0
  2    1    2    5    1    1    1    1    1

Figure 2.3: Components of a mutual capacitive touchscreen and the resulting
capacitive image. Figure adapted and extended based on
http://www.eizo.com/library/basics/basic_understanding_of_touch_panel/.

Capacitive touchscreens of commodity smartphones comprise around 400 to 600
electrodes (e.g., 15×27 electrodes with each being 4.1×4.1 mm on an LG Nexus
5). The touch controller translates the measurements into a 2D coordinate which
is then provided to the operating system (indicated as a red dot in Figure 2.3).
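
How a touch controller derives the 2D coordinate is vendor-specific and not described here. As a rough sketch of the idea only, an intensity-weighted centroid over the capacitive image of Figure 2.3 yields a position between electrode indices. The noise threshold of 30 and the function name are assumptions made for illustration; the 4.1 mm electrode size is the LG Nexus 5 value quoted above.

```python
# Capacitive image values as printed in Figure 2.3 (rows from top to bottom).
CAPACITIVE_IMAGE = [
    [0, 0, 2, 2, 4, 4, 4, 2, 1],
    [1, 0, 1, 4, 4, 4, 4, 2, 2],
    [0, 0, 1, 5, 18, 11, 4, 7, 2],
    [0, 0, 2, 44, 141, 83, 11, 2, 0],
    [0, 2, 9, 99, 219, 136, 19, 2, 0],
    [0, 1, 5, 29, 55, 29, 9, 4, 2],
    [0, 2, 2, 7, 9, 9, 5, 2, 1],
    [0, 0, 1, 4, 2, 2, 2, 1, 0],
    [2, 1, 2, 5, 1, 1, 1, 1, 1],
]
ELECTRODE_SIZE_MM = 4.1


def touch_centroid(image, threshold=30):
    """Return the intensity-weighted (row, col) centroid of all cells >= threshold.

    Returns None if no cell reaches the threshold (no touch detected).
    """
    total = 0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value >= threshold:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0:
        return None
    return row_sum / total, col_sum / total


row, col = touch_centroid(CAPACITIVE_IMAGE)
# Convert electrode indices to millimetres from the panel edge, assuming the
# centre of electrode i lies at (i + 0.5) * ELECTRODE_SIZE_MM.
x_mm = (col + 0.5) * ELECTRODE_SIZE_MM
y_mm = (row + 0.5) * ELECTRODE_SIZE_MM
print(round(row, 2), round(col, 2))
```

The centroid collapses the whole imprint into a single point, which illustrates what is lost when only the coordinate is passed on: the shape and intensity distribution of the touch are discarded.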

While touch interaction on recent mobile devices is based solely on the 2D
coordinate of a touch (i.e., the red dot), the remaining information about a touch
is omitted. In this thesis, we present a number of approaches which use the
capacitive images of commodity mutual capacitive touchscreens in mobile devices
to infer the source of a touch such as different fingers and hand parts.

2.2 Related Work

Related work presented a wide range of novel interaction techniques to extend
the touch input vocabulary on mobile devices. Following the structure of this
thesis, we first describe the ergonomics and physical limitations of the hand
for interaction with mobile devices. Secondly, we describe interaction methods
that improve and extend the interaction with a touchscreen (on the front side)
on mobile devices. Lastly, we go one step further and review related work
that presents novel interaction methods based on touch input beyond the front
touchscreen (e.g., the back and edges of a device).

2.2.1 Hand Ergonomics for Mobile Touch Interaction

In contrast to stationary input devices such as a hardware keyboard and mouse,
users usually hold and interact with mobile devices simultaneously. This poses
a wide range of challenges. When using a smartphone in the prevalent single-
handed grip [54, 109, 110, 176], the same hand is used for holding and interacting
with the device. This limits the fingers’ range and generates unintended inputs due
to the continuous contact with the device. In the following, we review previous
work on ergonomics of the hand when holding a smartphone and supportive finger
movements which users perform during interaction.

Placement, Movement, and Range of Fingers

To inform the design of novel interaction methods on mobile devices, an
understanding of finger placement, movement, and their ranges is vital. A wide
range of heuristics has been proposed by designers over the years¹,²,³,⁴,⁵.
Previous work further investigated the range of the thumb to inform the design of
mobile user interfaces [20]. Since BoD and edge input became more relevant in
recent years, all other fingers need to be investigated to inform the design of
fully hand-and-finger-aware interaction methods. While previous work showed where
fingers are typically placed when holding a smartphone [276], there is no work
that studied the areas reachable by all fingers on mobile devices. This thesis
contributes to this research area by studying finger placements, ranges, and
reachable areas of all fingers on mobile devices.

An important basis to inform the placement of on-screen interaction elements
and on-device input controls is the analysis of areas on the device that can be
reached by the fingers. Bergstrom-Lehtovirta and Oulasvirta [20] modeled the
thumb’s range on smartphones to inform the placement of user interface elements
for one-handed interaction. To predict the thumb’s range, the model mainly
involves the user’s hand size and the position of the index finger which is assumed
to be straight (adducted). For the predicted range of the thumb, they introduced
the term functional area which is adapted from earlier work in kinesiology and
biomechanics. In these fields, possible postures and movements of the hand
are called functional space [118]. Thumb behavior was further investigated by
Trudeau et al. [231] who modeled the motor performance in different flexion
states. Park et al. [189] described the impact of touch key sizes on the thumb’s
touch accuracy while Xiong et al. [268] found that the thumb develops fatigue
rapidly when tapping on smaller targets.

Besides the thumb, previous work investigated the index finger during
smartphone interaction. Yoo et al. [276] conducted a qualitative study to
determine the comfortable zone of the index finger on the back of the device.
This was done without moving the finger and by asking users during the study.
From a biomechanical perspective, Lee et al. [139] investigated the practicality
of different strokes for BoD interaction. Similarly, prior work found that using
the index finger for target selection on the BoD leads to a lower error rate
than using the thumb for direct touch [143, 256]. Wobbrock et al. [256] showed
that both the thumb on the front and index finger on the BoD perform similarly
well in a Fitts’ law task. Wolf et al. [260] found that BoD gestures are
performed significantly differently from front gestures. Corsten et al. [40, 41]
used BoD landmarks and showed that the rear position of the index finger could
be accurately transferred to the thumb by pinching both fingers.

1 https://www.uxmatters.com/mt/archives/2013/02/how-do-users-really-hold-mobile-devices.php
2 http://blog.usabilla.com/designing-thumbs-thumb-zone/
3 http://scotthurff.com/posts/facebook-paper-gestures
4 https://www.smashingmagazine.com/2016/09/the-thumb-zone-designing-for-mobile-users/
5 https://medium.com/@konsav/-55aba8ed3859

Since different grips can be used as an input modality [254], a wide range of
prior work sought an understanding of how users hold the phone while using it.
Eardley et al. [53, 54] explored hand grip changes during smartphone interaction
to propose use cases for adaptive user interfaces. They showed that the device
size and target distance affect how much users tilt and rotate the device to reach
targets on the touchscreen. Moreover, they investigated the effect of body posture
(e.g., while standing, sitting, and lying down) on the hand grip, and showed that
most grip movements were done while lying down followed by sitting and finally
standing [52].

Previous work in biomechanics looked into different properties of the hand.
Napier et al. [175] investigated two movement patterns for grasping objects which
they call precision grip and power grip. People holding objects with the power
grip use their partly flexed fingers and the palm to apply pressure on an object.
Sancho-Bru et al. [205] developed a 3D biomechanical hand model for power
grips and used it to simulate grasps on a cylinder. However, as smartphones are
not necessarily held in a power grip, this model cannot be applied to smartphone
interaction. Kuo et al. [118] investigated the functional workspace of the thumb by
tracking unconstrained motion. This is the space on the hand which is reachable
by the thumb. Brook et al. [26] introduced a biomechanical model of index finger
dynamics which enables the simulation of pinch and rotation movements. As
holding a smartphone and interacting with the touchscreen introduces additional
constraints to all fingers, these results cannot be applied to model the hand grip
and ergonomics.

Supportive Finger Movements

Although users intend to move only the thumb to perform single-handed input on a
front touchscreen, they unconsciously perform a wide range of further “supportive”
movements. These movements maintain the balance and grip on the device,
increase the reachability of the thumb on the display (e.g., through tilting [34] and
grip shifts [53, 54]), or are unavoidable due to the limited movement independence
of fingers (e.g., moving one finger also moves other fingers [76]). An important
basis to design BoD input controls that take unintended input into account is the
analysis of supportive micro-movements during common smartphone tasks.

Tilting the device is one type of supportive micro-movement that is used to
increase the thumb’s reachability on the display. Previous work found that users
tilt the device towards their thumb to reach more distant targets (e.g., at the
top left corner) and away from their thumb to reach targets at the bottom right
corner [34, 54]. Eardley et al. [52–54] referred to all movements which increase
the reachability as “grip shifts” and explored them for different device sizes
and tasks. Based on video recordings with manually identified key points and
accelerometer values, they quantified the number of grip shifts during common
smartphone tasks. They found that more grip shifts occurred with increasing
device sizes while the amount of tilt and rotation varied with grip types and phone
sizes. Moreover, they showed that the body posture (e.g., sitting and standing)
affects the device movements, suggesting that device sizes and body postures
need to be considered for exploring supportive micro-movements. While these
findings explain the device movements, no previous work investigated the actual
finger movements which could generate unintended input on the device surface.

The limited independence of finger movements causes another type of
supportive micro-movement. Previous work in biomechanics found that even when
asked to move just one finger, humans usually also produce motion in other
fingers [76]. The limited independence of the fingers is due to biomechanical
interconnections such as connected soft tissues [242] and motor units [207].
Moreover, Trudeau et al. [231] found that the thumb’s motor performance varies
with direction and device size during single-handed smartphone use while
the motor performance is generally greater for two-handed grips [230]. While
Sancho-Bru et al. [205] presented a biomechanical model of the hand for the power
grip [175], an application thereof is not possible for an investigation of supportive
micro-movements as smartphones are not used solely in a power grip.

One chapter of this thesis contributes to the understanding of supportive
micro-movements by studying how fingers on the rear move while interacting
with the front side.

2.2.2 Novel Touch-Based Interaction Methods

Recent touchscreens are designed to register two-dimensional locations of touches.
These locations are provided to the application layer of the operating system to
enable interaction with the user interface. Besides the two-dimensional location
of touches, a wide range of touch properties are available that can be used to
increase the input vocabulary of touch interaction. Well-known examples from
recent operating systems are the long-press that leverages the dwell time and
gestures that are based on subsequent touch locations. While these additions
are beneficial, they require additional execution time. Moreover, the touch input
vocabulary is still limited when compared to other input devices such as hardware
keyboards or computer mice. In the following, we describe related work that
improves touch input using data from touchscreens and the mobile devices themselves.

Extending Touch Interaction on Mobile Touchscreens

Previous work presented a wide range of approaches to extend the touch input
vocabulary on mobile touch-based devices. In the following, we describe two
common approaches that do not require additional sensors beyond a touchscreen.
This includes approaches that are (1) based solely on two-dimensional touch
locations available on all touchscreen technologies, and (2) based on the raw data
of capacitive touchscreens representing low-resolution fingerprints of touches.

Using the Two-Dimensional Location of Touches. Approaches to extend the
touch input vocabulary based on only the two-dimensional location of touch
inputs can readily be deployed on any touch-based mobile device. Since all
touchscreens already provide the two-dimensional location of touches, no
additional information and sensors are required.

Single taps are mostly used for selection-based interaction such as selecting
an action assigned to a button. Gestures play an important role in making user
interfaces more intuitive (e.g., moving objects by dragging them) and in
providing shortcuts for faster access to frequently used functions (e.g.,
launching applications [190], searching [141]). A gesture can be performed by
moving the finger while in contact with the touchscreen. This generates a
trajectory of two-dimensional touch locations that the system then interprets
as a gesture. Previous work in HCI invested considerable effort to improve
gesture-based interfaces, such as through methodologies for gesture design [237,
238, 255, 256], simple gesture recognizers for fast prototyping purposes [5, 235,
257], improving gesture memorability [173, 277], and through design guidelines
for gesture designs [4, 278]. However, gestures have the disadvantage that they
require additional execution time as well as enough screen space for the execution.
Moreover, a comprehensive set of gestures would lead to conflicts (e.g.,
unintended activations) and the accuracy of gesture recognizers would decrease
due to ambiguity errors.

Previous work proposed a wide range of interaction methods to enrich touch
interaction beyond gesture shapes and types. Amongst others, a gesture starting
from the device’s bezel can be distinguished from a gesture starting on the
touchscreen itself. This differentiation was used in previous work to provide
shortcuts to the clipboard [200] and to improve one-handed interaction by offering
reachability features [112].
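
The bezel distinction described above needs only the first touch location of a
gesture. The following minimal Python sketch illustrates the idea; the margin
value and the helper name `classify_gesture_start` are assumptions made for
illustration and are not taken from the cited work.

```python
# Assumed width (in px) of the zone next to the screen border that is treated
# as "started on the bezel". The value is illustrative only.
EDGE_MARGIN_PX = 20


def classify_gesture_start(trajectory, screen_w, screen_h, margin=EDGE_MARGIN_PX):
    """Return 'bezel' if the gesture's first point lies within `margin` px of
    any screen edge, otherwise 'screen'.

    `trajectory` is a list of (x, y) touch locations in pixel coordinates.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one touch point")
    x, y = trajectory[0]
    near_edge = (
        x < margin
        or y < margin
        or x >= screen_w - margin
        or y >= screen_h - margin
    )
    return "bezel" if near_edge else "screen"
```

A swipe that starts at x = 5 on a 1080×1920 px screen would be treated as a
bezel gesture, while the same swipe starting in the screen centre would not.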

Moreover, researchers implemented simple heuristics to use the finger
orientation as an input dimension. Roudaut et al. [202] presented MicroRolls, a
micro-gesture that extends the touch input vocabulary by rolling the finger on
the touchscreen (i.e., changing the pitch and roll angle of the finger). Since
touchscreens translate touch contact areas to two-dimensional locations based on
the area’s centroid [23, 98, 202], a trajectory of two-dimensional locations is
generated through the changing contact area induced by finger rolling.
MicroRolls uses this trajectory to recognize rolling movements with accuracies
of over 95%. However, this interaction technique cannot be used during a drag
action since the
segmentation of the gesture requires down and up events. Thus, Bonnet et al. [23]
presented ThumbRock which improves MicroRolls by additionally using the size
of the contact area as reported by Apple iOS.
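
The centroid effect that these techniques rely on can be illustrated with a
small Python sketch. It assumes the contact area is available as a binary mask
and uses a hypothetical `min_shift` threshold. It is not the recognizer used by
MicroRolls or ThumbRock, only a demonstration of how a changing contact area
moves the reported touch location.

```python
def contact_centroid(mask):
    """Centroid (x, y) of all active cells in a binary contact mask.

    `mask` is a list of rows; a truthy cell means the finger touches there.
    """
    cells = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not cells:
        raise ValueError("mask contains no contact")
    n = len(cells)
    return (sum(x for x, _ in cells) / n, sum(y for _, y in cells) / n)


def roll_direction(mask_before, mask_after, min_shift=0.5):
    """Classify the centroid displacement between two contact masks.

    `min_shift` (in cells) is an assumed threshold below which the change is
    treated as no roll.
    """
    x0, y0 = contact_centroid(mask_before)
    x1, y1 = contact_centroid(mask_after)
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_shift:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"
```

When the contact area grows towards one side, as happens when the finger rolls,
the centroid moves in that direction even though the finger does not slide.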

Using the Raw Data of Capacitive Touchscreens. Nowadays, the majority of
touchscreens incorporated in mobile devices are based on mutual capacitive
sensing. Taking the measurements of all electrodes of the touchscreen, a
two-dimensional image (referred to as capacitive image [73, 99, 136, 156]) can
be retrieved as shown in Section 2.1.2. Previous work predominantly used an
LG Nexus 5 since its touch controller (Synaptics ClearPad 3350) provides a
debugging bridge to access the 8-bit capacitive images with a resolution of
27×15 px at 6.24 ppi. While capacitive images can be used to recognize body
parts for authentication purposes [73, 99], previous work also used the resulting
area for interaction methods. Amongst others, Oakley et al. [182] used the area of
touches on smartwatches to provide shortcuts to pre-defined functions. Similarly,
Boring et al. [24] used the size of the contact area to enable one-handed zooming
and panning.
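
To make the notion of contact area concrete, the following Python sketch
estimates it from a capacitive image with the resolution stated above. The
noise threshold and the row-major layout (27 rows of 15 values) are assumptions
for illustration; the cited systems do not necessarily compute the area this way.

```python
PPI = 6.24                # resolution of the capacitive image stated in the text
CELL_MM = 25.4 / PPI      # edge length of one capacitive cell in millimetres
THRESHOLD = 30            # assumed noise threshold on the 8-bit sensor values


def contact_area_mm2(image, threshold=THRESHOLD):
    """Approximate the contact area of a capacitive image in square millimetres.

    `image` is a list of rows of 8-bit values (0..255). Every cell above the
    threshold counts as being in contact with the finger.
    """
    active = sum(1 for row in image for value in row if value > threshold)
    return active * CELL_MM ** 2
```

At 6.24 ppi a single cell covers roughly 4 mm × 4 mm, so the area changes in
coarse steps. This coarseness is one reason why later work applies machine
learning to the full image instead of relying on a single scalar.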

To extend the touch input performed with fingers, researchers developed
machine learning models that infer additional properties based on the capacitive
images. Amongst others, machine learning models can be used to estimate the
pitch1 and yaw2 angle of a finger touching the display [156, 265]. In contrast
to the approach on the tabletop [244], machine learning was necessary as no
high-resolution contact area is available. Moreover, Gil et al. [63] used basic
machine learning techniques to identify fingers touching the display. However,
they showed that a usable accuracy can only be achieved with exaggerated poses
on smartwatches so that each finger touches at a distinct angle. Recent Huawei
devices incorporate KnuckleSense, an additional input modality that differentiates
between touches made by fingers and knuckles. This technology is based on
FingerSense, a proprietary technology by Qeexo3 of which no technical details
are publicly available.

1 Pitch angle: Angle between the finger and the horizontal touch surface.
2 Yaw angle: Angle between the finger and the vertical axis.
3 http://qeexo.com/fingersense/

Extending Touch Interaction through Additional Sensors

Previous work and smartphone manufacturers used additional built-in sensors
to augment touch input. Amongst others, this includes sensors to measure the
applied force, microphone recordings, inertial measurement units (IMUs), and
pre-touch sensing. Moreover, we give an overview of external sensors that were
used in previous work to extend touch input.

Force and Pressure. Pressure input offers an additional input dimension for
mobile touch interaction. Since interaction can be performed without moving
the finger, this input dimension benefits user interfaces on small displays and
situations in which finger movements are not desirable. The force applied on
the touchscreen of a mobile device was first used by Miyaki and Rekimoto [167] to
extend the touch input vocabulary. Based on force sensitive resistors between the
device and a back cover, they measured the changing pressure levels to prototype
one-handed zooming on mobile devices. Stewart et al. [223] investigated the
characteristics of pressure input on mobile devices and found that a linear mapping
of force to value worked the best for users. Researchers further used the shear
force, the force tangential to the display’s surface, to extend pressure input.
Amongst others, Harrison and Hudson [80] developed a touchscreen prototype that
uses the shear force for interaction while Heo and Lee [90] augmented touch gestures
by sensing normal and tangential forces on a touchscreen. Beyond the touchscreen,
force can also be used for twisting the device as an input technique [68, 69, 119].
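
A linear mapping of force to value, as mentioned above, can be sketched in a
few lines of Python. The maximum force of 4 N and the ten discrete levels are
hypothetical parameters chosen for illustration and are not reported in the
cited studies.

```python
def force_to_level(force_n, max_force_n=4.0, levels=10):
    """Linearly map an applied force (in newtons) to a level in 0..levels-1.

    Forces outside [0, max_force_n] are clamped. Both `max_force_n` and
    `levels` are assumed values for this sketch.
    """
    clamped = min(max(force_n, 0.0), max_force_n)
    level = int(clamped / max_force_n * levels)
    # The maximum force would otherwise map to `levels`, one past the last level.
    return min(level, levels - 1)
```

With these parameters each level spans an equal 0.4 N range, which is what
distinguishes a linear transfer function from, for example, a logarithmic one.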

With the iPhone 6s, Apple introduced the pressure input dimension under the
name Force Touch. Based on force sensors below the touchscreen or a series of
electrodes on the screen curvature (Apple Watch), they used the additional input
dimension to enable users to perform secondary actions such as opening a context
menu or peeking into files. To estimate the force of a touch without additional
sensors, Heo and Lee [91] used the built-in accelerometer and position data of the
touchscreen.

Acoustics. The sound resulting from an object’s impact on the touchscreen
can be used to differentiate between sources of input. By attaching a medical
stethoscope to the back of a smartphone, Harrison et al. [82] showed the feasibility
of differentiating between different parts of the finger (e.g., pad, tip, nail, or
knuckle) as well as objects (e.g., stylus). Lopes et al. [145] used a similar
approach and augmented touch interaction based on a contact microphone to
sense vibrations. With this, they showed that different hand placements on the
touch surface (e.g., tap with a finger tip, knock, slap with the flat hand, and a
punch) can be reliably recognized. Similarly, Paradiso et al. [188] used four
contact piezoelectric pickups at the corners of a window to differentiate between
taps and knocks.

In general, approaches based on acoustic sensing have been shown to reliably
identify the source of touch. However, since microphones are required to
continuously capture the acoustics, these approaches are prone to errors in noisy
situations. Thus, they are not suitable for interaction on mobile devices such as
smartphones and tablets.

Physical Device Movement. A wide range of previous work combined touch
input with the built-in accelerometer of mobile devices. Hinckley et al. [92]
introduced the terminology of touch-enhanced motion techniques which combine
information of a touch and explicit device movements sensed by the IMU. For
example, a touch and a subsequent tilt sensed by the accelerometer can be used
to implement one-handed zooming while holding an item on the touchscreen
followed by shaking the device can be used to offer a shortcut to delete files.
Similar gestures were explored especially for interaction with wall displays using
a mobile phone. Hassan et al. [86] introduced the Chucking gesture in which
users tap and hold an icon on the touchscreen, followed by a toss measured by the
accelerometer to transfer the file to the wall display. To transfer items between
public displays using a mobile phone, Boring et al. [25] proposed a similar gesture
in which users hold an object on the touchscreen and move the mobile device
between displays. Researchers also used the built-in accelerometer to enhance
text entry on mobile devices. This includes the use of the device orientation to
resolve ambiguity on a T9 keyboard [249] and the improvement of one-handed
gestural text input on large mobile devices [273].

In contrast, motion-enhanced touch techniques combine touch input and
the implicit changes of the accelerometer values to infer touch properties. For
example, a soft tap can be differentiated from a hard tap through the impact of
the touch. Going one step further, Seipp and Devlin [213] used touch position
and accelerometer values to develop a classifier that determines whether users are
using the device in a one-handed grip with the thumb or in a two-handed grip with
the index finger. With this, they achieved an accuracy of 82.6%. Similarly, Goel
et al. [66] used the touch input and device rotation to infer the hand posture (i.e.,
left/right thumb, index finger) with an accuracy of 87%. By attaching a wearable
IMU to the user’s wrist, Wilkinson et al. [251] inferred the roll and pitch angle of
the finger, and the force of touches described by the acceleration data.
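
The soft versus hard tap example can be sketched as a simple threshold on the
accelerometer signal around the touch-down event. The threshold value and the
function names below are assumptions for illustration. The classifiers in the
cited work use richer features and trained models.

```python
import math

GRAVITY = 9.81            # m/s^2, magnitude measured when the device is at rest
HARD_TAP_THRESHOLD = 2.0  # m/s^2, assumed threshold separating soft and hard taps


def tap_intensity(samples):
    """Peak deviation of the acceleration magnitude from gravity.

    `samples` is a list of (x, y, z) accelerometer readings in m/s^2 taken in
    a short window around the touch-down event.
    """
    return max(abs(math.sqrt(x * x + y * y + z * z) - GRAVITY) for x, y, z in samples)


def classify_tap(samples, threshold=HARD_TAP_THRESHOLD):
    """Label a tap as 'hard' or 'soft' based on its impact on the device."""
    return "hard" if tap_intensity(samples) >= threshold else "soft"
```

In practice the threshold would have to be calibrated per device and grip,
since the same tap moves a heavy, firmly held device less than a light one.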

Proximity Touch Sensing. Marquardt et al. [150] proposed the continuous
interaction space, which was among the first models that describe the continuity
between hover and on-screen touches. They proposed a number of use cases that
enable users to combine touch and hover gestures anywhere in the space and
naturally move between them. Amongst others, this includes raycasting gestures
to extend reachability, receiving hints through hovering over UI elements [37],
and avoiding occlusion by continuing direct touch actions in the space above.
Spindler et al. [222] further proposed to divide the interaction above the table-
top into multiple layers while Grossman [71] explored hover interaction for 3D
interaction.

Hover information can also be used to predict future touch locations. Xia
et al. [264] developed a prediction model to reduce the touch latency of up to
128 ms. To avoid the fat-finger problem, Yang et al. [270] used a touch prediction
to expand the target as the finger approaches. Similarly, Hinckley et al. [93]
explored hover interaction on mobile devices and proposed to blend in or hide
UI components depending on whether a finger is approaching or withdrawn (e.g.,
play button in a video player appears when the finger is approaching). Since a
finger can also be sensed above the display, Rogers et al. [198] developed a model
for estimating the finger orientation based on sensing the whole finger on and
above a touchscreen.

Previous work presented different approaches to enable proximity touch
sensing. The SmartSkin prototype presented by Rekimoto [194] calculates the
distance between hand and surface by using capacitive sensing and a mesh-shaped
antenna. Annett et al. [3] presented Medusa which is a multi-touch tabletop with
138 proximity sensors to detect users around and above the touchscreen. On the
commercial side, devices such as the Samsung Galaxy S4 and the Sony Xperia
Sola combine mutual capacitance (for multi-touch sensing on the touchscreen),
and self-capacitance (generates a stronger signal but only senses a single finger)
to enable hover interaction1.

Fiducial Markers and Capacitive Coupling. A large body of work coupled
external sensors and devices with touchscreens to extend the touch input vocabulary.
The focus lies especially on identifying the object touching the display, such as
different fingers, users, and items.

A common approach to identify objects on the touchscreen is to use fiducial
markers. These markers assign a unique ID to an object through a uniquely
patterned tag in the form of stickers [108, 195], NFC tags [240, 241], RFID
tags [183], unique shapes [85], and through rigid bodies of conductive areas
attached to objects (“capacitance tags”) [194]. While these approaches are only
suitable for objects due to the attachment of tags, previous work investigated
the use of capacitive coupling (i.e., placing an electrode between object and
the ground to change the electric field measured by the touchscreen) to reliably
identify users [243] and authenticate them with each touch [100]. Similarly,
DiamondTouch [47] identifies users based on an electric connection to the chair
they are sitting on while Harrison et al. [81] used Swept Frequency Capacitive
Sensing (SFCS) which measures the impedance of a user to the environment
across a range of AC frequencies. Using the same technology, Sato et al. [206]
turned conductive objects into touch-sensitive surfaces that can differentiate between
different grips (e.g., touch, pinch, and grasp on a door knob).

1 https://www.theverge.com/2012/3/14/2871193/sony-xperia-sola-floating-touch-hover-event-screen-technology

Active Sensors. To identify different fingers on the display, previous work used
a wide range of different sensors. Approaches that achieved high accuracies
include the use of IR sensors [74, 75] and vibration sensors [152] mounted on
different fingers. Further approaches include electromyography [19], gloves [149],
and RFID tags attached to the fingernail [239]. To avoid instrumenting users with
sensors, previous work also used a combination of cameras attached to a mobile
device and computer vision to identify fingers [245, 284]. For example, Zheng et
al. [284] used the built-in webcam of laptops to identify fingers and hands on the
keyboard. Using depth cameras such as the Microsoft Kinect provides additional
depth information for finger identification. Amongst others, these were used
by Murugappan [172] and Wilson [252] to implement touch sensors. The Leap
Motion1 is a sensor device that uses proprietary algorithms to provide a hand
model with an average accuracy of 0.7 mm [248]. Colley and Häkkilä [38] used a
Leap Motion next to a smartphone to evaluate finger-aware interaction. While
these are all promising approaches, they are not yet integrated into mass-market
devices since wearable sensors limit mobility while sensors attached to
the device (e.g., cameras) increase the device size.

Extending Touch Interaction on Tabletops

Previous work presented a wide range of novel interaction methods based on
images of touches provided by touchscreens. Researchers predominantly focused
on tabletops that provide high-resolution images of touches [8, 56, 62] through
technologies such as infrared cameras below the touch surface or frustrated total
internal reflection [77]. The Microsoft PixelSense is a common example and
provides high-resolution images with a resolution of 960×540 px (24 ppi). This
enabled a wide range of novel interaction methods including the development
of widgets triggered by hand contact postures [154], using the forearm to access
menus [114], using the contact shape to extend touch input [18, 30], and gestures
imitating the use of common physical tools (e.g., whiteboard eraser, eraser, camera,
magnifying glass) to leverage familiarity [83]. The latter was commercialized by
Qeexo as TouchTools2.

Based on a rear-projected multi-touch table with a large fiber optic plate as
the screen, Holz and Baudisch developed a touchscreen that senses fingerprints
for authentication [96]. This is possible due to a diffuse light transmission
while the touchscreen has a specular light reflection. Other approaches for user
identification on tabletops are based on the back of users’ hands captured by a
top-mounted camera [193], their hand geometry [22, 208], their shoes captured
by a camera below the table [196], personal devices [1], tagged gloves [148],
finger orientations [45, 280], IR light pulses [163, 199], and capacitive
coupling [47, 243].

1 https://www.leapmotion.com/
2 http://qeexo.com/touchtools/

Touch
    Front side:   Fingerprint scanner; Secondary screen (j); Hardware buttons
                  (e.g., back, home)
    Back side:    Fingerprint scanner; BoD Touch (a, j) [16, 46]; Heart rate
                  sensor (f) [151]; BoD touchscreen (m)
    Top side:     -
    Bottom side:  -
    Left side:    -
    Right side:   Edged display (b)

Buttons
    Front side:   Hardware keyboard (b); Home/Menu button (c); Back/Recent
                  button (c)
    Back side:    BoD Button (d); Volume button (l)
    Top side:     Power button (e)
    Bottom side:  -
    Left side:    Volume buttons; Bixby button (f)
    Right side:   Power button; Volume buttons; Shutter button (g)

Slide
    Front side:   -
    Back side:    -
    Top side:     -
    Bottom side:  -
    Left side:    Silent switch (e)
    Right side:   -

Pressure
    Front side:   Force Touch [272]
    Back side:    -
    Top side:     -
    Bottom side:  -
    Left and right sides: Side pressure (h) [59, 221]

Scrolling
    Front side:   Trackball (i)
    Back side:    LensGesture [266]
    Top side:     -
    Bottom side:  -
    Left side:    -
    Right side:   Scrolling wheel (b)

Tapping
    Front side:   -
    Back side:    BoD taps [197]
    Top, bottom, left, and right sides: Edge taps [161]

Misc
    Front side:   Front camera; Front speaker; Light sensor; Distance sensor;
                  Notification LED
    Back side:    Back camera; Back speaker; Torchlight; E-ink display (k)
    Top side:     Microphone; Audio port; USB port (g)
    Bottom side:  Microphone; Speaker; USB port; Audio port
    Left side:    -
    Right side:   -

Devices: (a) OPPO N1, (b) RIM BlackBerry 8707h, (c) HTC Tattoo, (d) LG G-Series,
(e) iPhone 5, (f) Samsung Galaxy S8, (g) Nokia Lumia 840, (h) HTC U11,
(i) Nexus One, (j) LG X, (k) YotaPhone 2, (l) Asus Zenfone, (m) Meizu Pro 7.

Table 2.1: Types of interaction controls beyond the touchscreen that are presented
in prior work and in recent or past smartphones. While some are not intended for
interaction initially (e.g., camera), these sensors could still be used for
interaction in the future, e.g. [266].

2.2.3 Interacting with Smartphones Beyond the Touchscreen

Since users are holding the smartphone during the interaction, the touchscreen on
the front is not the only surface that could be used for input. Previous work and
smartphone manufacturers presented a wide range of input mechanisms beyond
the touchscreen on the device surface. While the power and volume buttons are
integral parts of a smartphone nowadays, we describe further input mechanisms
in the following.

On-Device Input Controls

Previous work and manufacturers presented a broad range of input controls for
smartphones of which we provide an overview in Table 2.1. We categorized them
by their location on the device, and by the expected type of input.

Current smartphones such as the iPhone 7 and Samsung Galaxy S8 incorpo-
rate fingerprint sensors below the touchscreen or on the back of the device. These
are mainly used for authentication purposes but can also recognize directional
swipes that act as shortcuts for functions such as switching or launching applica-
tions. Previous work envisioned different functions that can be triggered using
a fingerprint sensor [185]. Due to a small number of devices that support any
form of interaction on the rear, researchers presented different ways to use built-in
sensors for enabling BoD interaction, including the accelerometer [140, 197] to
recognize taps and the back camera to enable swipe gestures [266]. Previous
work also presented a number of smartphone prototypes that enable touch input
on the whole device surface, including the front, back and the edges [127, 132,
168]. This enables a wide range of use cases which includes touch-based authenti-
cation on the rear side to prevent shoulder surfing [46], improving the reachability
during one-handed smartphone interaction [133], 3D object manipulation [10,
214], performing user-defined gesture input [215] and addressing the fat-finger
problem [16]. Recently, Corsten et al. [39] extended BoD touch input with a
pressure modality by attaching two iPhones back-to-back.

Before HTC recently introduced Edge Sense, pressure as an input modality
on the sides of the device had been studied in previous work [59, 95, 221, 253] to
activate pre-defined functions. Legacy devices such as the Nexus One and HTC
Desire S provide mechanical or optical trackballs below the display for selecting
items as this is difficult on small displays due to the fat-finger problem [16].
As screens were getting larger, trackballs became redundant and were removed.
Similarly, legacy BlackBerry devices incorporated a scrolling wheel on the right
side to enable scrolling.

For years, smartphones featured a number of button controls. Amongst others,
this includes a power button, the volume buttons, as well as hardware buttons
such as the back, home and recent buttons on Android devices. As a shortcut
to change the silent state, recent devices such as the iPhone 7 and OnePlus 5
feature a hardware switch to immediately mute or unmute the device. Moreover,
the Samsung Galaxy S8 introduced an additional button on the left side of the
device as a shortcut to the device assistant while other devices incorporate a
dedicated camera button. Since a large number of hardware buttons clutter the
device, previous work used the built-in accelerometer to detect taps on the side of
the device [161].

Back-of-Device Prototyping Approaches

The simplest and most common approach for BoD interaction is to attach two
smartphones back-to-back [39, 46, 214, 261, 262]. However, this approach
increases the device thickness which negatively affects the hand grip and
interaction [133, 226]. This is detrimental for studies that observe the hand behavior
during BoD interaction, and could lead to muscle strain. To avoid altering the
device’s form factor, researchers built custom devices that resemble smartphones [10,
33, 228]. However, these approaches mostly lack the support of an established
operating system so that integrating novel interactions into common applications
becomes difficult. As a middle ground, researchers used small additional sensors
that barely change the device’s form factor. These include 3D-printed back cover
replacements to attach a resistive touch panel [133], and custom flexible PCBs
with 24 [168, 169] and 64 [33] square electrodes. However, neither the panel size
nor the resolution is sufficient to enable precise finger-aware interactions such as
gestures and absolute input on par with state-of-the-art touchscreens.

Beyond capacitive sensing, researchers proposed the use of inaudible sound
signals [174, 203, 246], high-frequency AC signals [283], electric field tomography
[282], conductive ink sensors [67], the smartphone’s camera [263, 266],
and other built-in sensors such as IMUs and microphones [70, 201, 279]. While
these approaches do not increase device thickness substantially, their raw data lack
the detail needed for precise interactions or for inferring the touching finger or
hand part. Flexible PCBs as presented in previous work are a promising approach,
but their resolution is not sufficient. Further, previous work used proprietary
technologies, so other researchers cannot reproduce the prototypes to investigate
interactions on such devices. No previous work presents a reproducible (i.e., built
from commodity hardware) full-touch smartphone prototype.

2.3 Summary

In this chapter, we discussed the background and previous work on mobile
touch interaction. We started this chapter with the history of touch interaction
and background of capacitive touch sensing, which forms the foundation of the
technical parts of this work. Moreover, we reviewed previous work with a focus
on extending mobile touch interaction by hand-and-finger-awareness.

Following the structure of this thesis, we first reviewed previous work on
hand ergonomics for mobile touch interaction. Previous work investigated the
range of the thumb for single-handed touch interaction to inform the design of
touch-based user interfaces. However, no work has done the same for the other
fingers or examined the area in which fingers can move without a grip change. An
understanding thereof is vital to inform the design of fully hand-and-finger-aware
input methods, especially on fully touch sensitive smartphones. We investigate
the areas which can be reached by all fingers without a grip change and their
maximum range by addressing RQ1. In addition to the reachability aspect, the
fingers on the back move unintentionally, among other reasons, to maintain a stable
grip [52, 54], to increase the thumb’s range [34, 54], or as a consequence of the
limited independence between the finger movements [76]. These movements
cause unintended inputs on fully touch sensitive smartphones, which frustrate users
and render all BoD input techniques ineffective. Ideally, BoD input controls need
to be placed so that they are reachable without a grip change but also in a way
which minimizes unintended input. This requires an investigation of supportive
micro-movements, their properties, as well as the areas in which they occur. We
address this with RQ2.

In the second part of the related work, we reviewed different approaches to
extend the touch input vocabulary. A wide range of approaches use different
sensors to infer additional properties of a touch. For example, this includes
the finger orientation, pressure, shear force, size of the touch area, as well as
identifying the finger or part of the hand which performed the touch. However,
the presented approaches have practical disadvantages which affect the usability,
convenience, and mobility. Input techniques which infer additional properties
of a touch (e.g., finger orientation, pressure, or size of touch area) extend the
input vocabulary and its expressiveness. However, they also pose limitations since
specific finger postures may now trigger unwanted actions. In contrast, input
techniques which differentiate between the source of input (e.g., identifying individual
fingers and hand parts) do not interfere with the main finger for interaction
and thus do not hav