Hand-and-Finger-Awareness for Mobile Touch Interaction using Deep Learning

A dissertation approved by the Faculty of Computer Science, Electrical Engineering and Information Technology and the Stuttgart Research Centre for Simulation Technology of the Universität Stuttgart for the award of the degree of Doctor of Natural Sciences (Dr. rer. nat.)

Submitted by Huy Viet Le from Nürtingen

Main referee: Prof. Dr. Niels Henze
Co-referee: Prof. Antti Oulasvirta
Co-examiner: Jun.-Prof. Dr. Michael Sedlmair
Date of the oral examination: 28.03.2019

Institut für Visualisierung und Interaktive Systeme der Universität Stuttgart
2019

Zusammenfassung (Summary)

Mobile devices such as smartphones and tablets have by now replaced desktop computers for a broad spectrum of tasks. Nearly every smartphone has a touch-sensitive display (touchscreen) that combines input and output in a single interface and thereby enables intuitive interaction. With this success, applications that were previously available only for desktop computers are now also available for mobile devices. This shift increased the mobility of computers and allows users to use applications while on the move.

Despite the success of touchscreens, traditional input devices such as keyboard and mouse are still superior because of their input capabilities. A mouse has multiple buttons with which different functions can be activated at the same pointer position. In addition, a keyboard has several modifier keys which multiply the functionality of the other keys. In contrast, the input capabilities of touchscreens are limited to the two-dimensional coordinates of a touch. This poses several challenges which impair usability. Among others, the possibilities for implementing shortcuts are limited, which contradicts Shneiderman's golden rules for interface design.
Moreover, mostly only one finger is used for input, which slows down the interaction. Further challenges, such as the fat-finger problem and the limited reachability on large devices, add to the inconvenience. Novel touch-based interaction methods are needed to extend the input capabilities of touchscreens and to enable input with multiple fingers, as is common with traditional input devices.

This thesis investigates how individual fingers and parts of the hand can be enabled to perform input on a mobile device and how their inputs can be distinguished. This concept is referred to as "hand-and-finger-aware" interaction. By recognizing the hand and fingers, different functions can be assigned to individual fingers and parts of the hand, which extends the input capabilities and solves many challenges of touch interaction. Furthermore, applying the concept of "hand-and-finger-aware" interaction to the whole device surface also makes it possible to use the fingers on the rear for input, which previously only held the device. This addresses further challenges of touch interaction and offers many possibilities for realizing shortcuts.

This dissertation contains the results of twelve studies which focus on the design aspects, the technical feasibility, and the usability of "hand-and-finger-aware" interaction. In a first step, the ergonomics and behavior of the hand are investigated to inspire the development of new interaction techniques. Subsequently, it is explored how well individual fingers and parts of the hand can be recognized using deep learning techniques and the raw data of capacitive sensors. For this purpose, both a single capacitive touchscreen and a device that registers touches on its whole surface are used.
Based on this, we present four studies which deal with bringing shortcuts from computer keyboards to mobile devices in order to improve the usability of text editing on mobile devices. In doing so, we follow the user-centered design process adapted for the application of deep learning.

The core contribution of this dissertation ranges from deeper insights into the interaction with different fingers and parts of the hand, through a technical contribution to the identification of the touch source using deep learning techniques, to approaches for solving the challenges of mobile touch input.

Abstract

Mobile devices such as smartphones and tablets have replaced desktop computers for a wide range of everyday tasks. Virtually every smartphone incorporates a touchscreen which enables an intuitive interaction through a combination of input and output in a single interface. Due to the success of touch input, a wide range of applications which were previously exclusive to desktop computers became available for mobile devices. This transition increased the mobility of computing devices and enables users to access important applications even while on the move.

Despite the success of touchscreens, traditional input devices such as keyboard and mouse are still superior due to their rich input capabilities. For instance, computer mice offer multiple buttons for different functions at the same cursor position while hardware keyboards provide modifier keys which augment the functionality of every other key. In contrast, touch input is limited to the two-dimensional location of touches sensed on the display. The limited input capabilities slow down the interaction and pose a number of challenges which affect the usability. Among others, shortcuts can hardly be provided, which affects experienced users and contradicts Shneiderman’s golden rules for interface design.
Moreover, the use of mostly one finger for input slows down the interaction while further challenges such as the fat-finger problem and limited reachability add additional inconveniences. Although the input capabilities are sufficient for simple applications, more complex everyday tasks which require intensive input, such as text editing, are still not widely adopted. Novel touch-based interaction techniques are needed to extend the touch input capabilities and enable multiple fingers and even parts of the hand to perform input similar to traditional input devices.

This thesis examines how individual fingers and other parts of the hand can be recognized and used for touch input. We refer to this concept as hand-and-finger-awareness for mobile touch interaction. By identifying the source of input, different functions and action modifiers can be assigned to individual fingers and parts of the hand. We show that this concept increases the touch input capabilities and solves a number of touch input challenges. In addition, by applying the concept of hand-and-finger-awareness to input on the whole device surface, previously unused fingers on the back are now able to perform input and augment touches on the front side. This further addresses well-known challenges in touch interaction and provides a wide range of possibilities to realize shortcuts.

We present twelve user studies which focus on the design aspects, technical feasibility, and the usability of hand-and-finger-awareness for mobile touch interaction. In a first step, we investigate the hand ergonomics and behavior during smartphone use to inform the design of novel interaction techniques. Afterward, we examine the feasibility of applying deep learning techniques to identify individual fingers and other hand parts based on the raw data of a single capacitive touchscreen as well as of a fully touch sensitive mobile device.
Based on these findings, we present a series of studies which focus on bringing shortcuts from hardware keyboards to a fully touch sensitive device to improve mobile text editing. Thereby, we follow a user-centered design process adapted for the application of deep learning.

The contribution of this thesis ranges from insights on the use of different fingers and parts of the hand for interaction, through technical contributions for the identification of the touch source using deep learning, to solutions for addressing limitations of mobile touch input.

Acknowledgements

Over the past three years, I had one of the best times of my life working together with a number of amazing colleagues and friends who inspired me a lot. Without their support, this work would never have been possible.

First and foremost, I would like to thank my supervisor Niels Henze who inspired my work and always supported me in the best possible ways to achieve my goals. Without his support, I would never have come this far. I further thank my committee Antti Oulasvirta, Michael Sedlmair, and Stefan Wagner for the great and inspiring discussions. Discussions with Syn Schmitt in the SimTech milestone presentation, and with a number of student peers and mentors in doctoral consortia at international conferences, further shaped my thesis. I would also like to thank Albrecht Schmidt for all his great support which even goes beyond research. Moreover, I thank Andreas Bulling for the opportunity to stay another five months to finalize my thesis.

Before my time as a PhD student, I had the great honor to meet a number of awesome people who introduced me to the world of Human-Computer Interaction research. I thank Alireza Sahami Shirazi for his outstanding supervision during my bachelor’s thesis. His inspiration and recommendations played a huge role in getting me into HCI research.
I further thank Tilman Dingler for his exceptional support and organization which provided me with the opportunity to write my master’s thesis at Lancaster University. During my time in Lancaster, I had a great and memorable time working with Corina Sas, Nigel Davies, and Sarah Clinch. I further thank Mateusz Mikusz who helped me find accommodation and ensured that everything was fine.

I had the great pleasure to work with amazingly helpful and skilled colleagues who shaped my time as a PhD student. I thank my incredible office mates Dominik Weber, Hyunyoung Kim, and Nitesh Goyal for all the inspiring discussions and for bearing with me while I typed on my mechanical keyboard. I am further thankful for all the collaborations which taught me how to write papers, build prototypes, and supervise students. In particular, I thank Sven Mayer for sharing his research experiences and for all the great work together which resulted in a lot of publications. I further thank Patrick Bader for sharing his endless knowledge on hardware prototyping and algorithms. I also thank Francisco Kiss for helping me with his extensive knowledge in electrical engineering and his soldering skills. I am further thankful to Katrin Wolf for inspiring me a lot with her experiences in mobile interaction, and to Lewis Chuang for the valuable collaboration.

A PhD is not only work but also a lot of fun. I thank Jakob Karolus and Thomas Kosch for the great and adventurous road trips through the US. I further thank the rest of the awesome hcilab group in Stuttgart who made every day a really enjoyable day: Alexandra Voit, Bastian Pfleging, Céline Coutrix, Lars Lischke, Mariam Hassib, Matthias Hoppe, Mauro Avila, Miriam Greis, Norman Pohl, Pascal Knierim, Passant El.Agroudy, Paweł W. Woźniak, Rufat Rzayev, Romina Poguntke, Stefan Schneegaß, Thomas Kubitza, Tonja Machulla, Valentin Schwind, and Yomna Abdelrahman.
A special thanks goes to Anja Mebus, Eugenia Komnik and Murielle Naud-Barthelmeß for all their support and the administrative work that keeps the lab running smoothly.

It was also a pleasure to work with awesome student assistants who supported me in conducting studies, recruiting participants, and transcribing interviews. This thesis would not have been possible without the support of Jamie Ullerich, Jonas Vogelsang, Max Weiß, and Henrike Weingärtner - thank you!

Last but not least, I would like to thank my family for their unconditional support - my father Hung Son Le and mother Thi Bich Lien Luu for raising me to be the person I am today, for inspiring me and making it possible for me to get the education I wanted, and for making it possible for me to explore technology. I thank my sister Bich Ngoc Le for being there for me and supporting me in all possible ways. Further, I thank all my friends for their emotional support and the patience they showed me on my way to the PhD.

Thank you!

Table of Contents

1 Introduction
  1.1 Research Questions
  1.2 Methodology
    1.2.1 Limitations of the User-Centered Design Process
    1.2.2 Limitations of Common Deep Learning Processes
    1.2.3 User-Centered Design Process for Deep Learning
  1.3 Research Context
  1.4 Thesis Outline
2 Background and Related Work
  2.1 Background
    2.1.1 History and Development of Touch Interaction
    2.1.2 Capacitive Touch Sensing
  2.2 Related Work
    2.2.1 Hand Ergonomics for Mobile Touch Interaction
    2.2.2 Novel Touch-Based Interaction Methods
    2.2.3 Interacting with Smartphones Beyond the Touchscreen
  2.3 Summary
3 Hand Ergonomics for Mobile Touch Interaction
  3.1 Interaction Beyond the Touchscreen
    3.1.1 Reachability of Input Controls
    3.1.2 Unintended Inputs
  3.2 Study I: Range and Comfortable Area of Fingers
    3.2.1 Study Design
    3.2.2 Apparatus
    3.2.3 Procedure
    3.2.4 Participants
    3.2.5 Data Preprocessing
    3.2.6 Results
    3.2.7 Discussion
  3.3 Study II: Investigating Unintended Inputs
    3.3.1 Study Design
    3.3.2 Apparatus
    3.3.3 Tasks and Procedure
    3.3.4 Participants
    3.3.5 Data Preprocessing
    3.3.6 Results
    3.3.7 Discussion
  3.4 General Discussion
    3.4.1 Summary
    3.4.2 Design Implications
4 Hand-and-Finger-Awareness on Mobile Touchscreens
  4.1 Identifying the Source of Touch
    4.1.1 The Palm as an Additional Input Modality
    4.1.2 Investigating the Feasibility of Finger Identification
  4.2 Input Technique I: Palm as an Additional Input Modality (PalmTouch)
    4.2.1 Data Collection Study
    4.2.2 Model Development
    4.2.3 Evaluation
  4.3 Input Technique II: Finger Identification
    4.3.1 Data Collection Study
    4.3.2 Model Development
    4.3.3 Evaluation
  4.4 General Discussion
    4.4.1 Summary
    4.4.2 Lessons Learned
    4.4.3 Data Sets
5 Hand-and-Finger-Awareness on Full-Touch Mobile Devices
  5.1 InfiniTouch: Finger-Aware Input on Full-Touch Smartphones
    5.1.1 Full-Touch Smartphone Prototype
    5.1.2 Ground Truth Data Collection
    5.1.3 Finger Identification Model
    5.1.4 Validation
    5.1.5 Mobile Implementation and Sample Applications
    5.1.6 Discussion and Limitations
  5.2 Exploring Interaction Methods and Use Cases
    5.2.1 Interviews
    5.2.2 Results
    5.2.3 Discussion
  5.3 General Discussion
    5.3.1 Summary
    5.3.2 Lessons Learned
6 Improving Shortcuts for Text Editing
  6.1 Text Editing on Mobile Devices
    6.1.1 Study Overview
  6.2 Study I: Shortcuts on Hardware Keyboards
    6.2.1 Apparatus
    6.2.2 Procedure and Participants
    6.2.3 Log Analysis: Shortcuts on Hardware Keyboards
    6.2.4 Interviews: Hardware and Touchscreen Keyboards
    6.2.5 Discussion
  6.3 Study II: Gesture Elicitation
    6.3.1 Referents
    6.3.2 Apparatus and Procedure
    6.3.3 Participants
    6.3.4 Results
    6.3.5 Gesture Set for Shortcuts in Text-Heavy Activities
    6.3.6 Discussion
  6.4 Study III: Implementing the Gesture Set on a Full-Touch Smartphone
    6.4.1 Apparatus
    6.4.2 Participants
    6.4.3 Procedure and Study Design
    6.4.4 Modeling
    6.4.5 Mobile Implementation
  6.5 Study IV: Evaluation of Shortcut Gestures
    6.5.1 Study Procedure and Design
    6.5.2 Apparatus
    6.5.3 Participants
    6.5.4 Results
    6.5.5 Discussion
  6.6 General Discussion
    6.6.1 Summary
    6.6.2 Lessons Learned
7 Conclusion and Future Work
  7.1 Summary of Research Contributions
  7.2 Future Work
Bibliography
List of Acronyms

1 Introduction

Over two billion people own a mobile device such as a smartphone or a tablet [285]. With their mobility and increasing processing capability, mobile devices replaced personal computers and laptops for the majority of everyday computing tasks. Millions of downloads on mobile app stores show that applications such as email clients, web browsers, calendars, and even editors for various media have become viable alternatives to their desktop counterparts. While mobile phones started with arrays of hardware buttons and a small display, recent smartphones incorporate a touchscreen that combines input and output in a single interface. This enables users to directly touch elements of the user interface (UI) and interact with them intuitively similar to physical objects.

With touchscreens, smartphones can be designed as compact and self-contained mobile devices which leverage the whole front side for input as well as output.
As a consequence, a wide range of applications previously designed for computers with keyboard and mouse now also offer touch-based UIs. This transition increases the mobility of computing devices and enables users to use their device even while on the move. However, keyboards and mice as input devices are still superior to touch input since they provide more input capabilities. The difference is especially noticeable for complex tasks which require high precision (e.g., placing the caret in a text) and for repetitive actions for which shortcuts are commonly used (e.g., copy and paste). Limited input capabilities slow down the interaction and lead to a lack of shortcuts, which are fundamental for experienced users as described by Shneiderman’s golden rules for interface design [209].

In contrast to touchscreens, a computer mouse offers multiple buttons which enable users to activate different functions at the same cursor position. Similarly, hardware keyboards offer modifier keys (e.g., Ctrl, Alt, and Shift) which add additional dimensions to every other key. Touchscreens, however, translate a touch on the display into a two-dimensional coordinate which is mapped to the UI. While direct manipulation is powerful, the input’s expressiveness is limited to single coordinates despite the sheer amount of additional information that a smartphone could provide about a touch.

With 3D Touch¹, Apple showed that touch input can be purposefully extended by a pressure modality based on a proprietary technology involving an additional sensing layer. While this is the prime commercial example, the touch input vocabulary on commodity smartphones can also be extended without additional sensors beyond the touchscreen. In particular, the raw data of capacitive touchscreens was used for estimating the touch contact size [24], shape [182], and the orientation of a finger on the display [156, 198, 265].
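To make the idea of deriving properties beyond the touch coordinate more concrete, the following minimal sketch computes a touch position, a contact size, and a coarse orientation from a capacitive image. It is an illustration under stated assumptions and not the method of the cited work: the image is assumed to be a small 2D grid of capacitance values, and the function name `touch_properties`, the `threshold` value, and the moment-based orientation estimate are hypothetical choices.

```python
import math

def touch_properties(cap_image, threshold=30):
    """Estimate position, contact size, and orientation of one touch blob.

    cap_image: 2D list of capacitance values (rows x columns); the layout and
    value range are assumptions made for illustration.
    threshold: hypothetical noise floor; cells at or below it are ignored.
    """
    cells = [(r, c, v)
             for r, row in enumerate(cap_image)
             for c, v in enumerate(row)
             if v > threshold]
    if not cells:
        return None  # nothing touches the sensor

    total = sum(v for _, _, v in cells)
    # Intensity-weighted centroid: the usual two-dimensional touch coordinate.
    cx = sum(c * v for _, c, v in cells) / total
    cy = sum(r * v for r, _, v in cells) / total

    # Second central moments describe how the blob is stretched.
    mu20 = sum((c - cx) ** 2 * v for _, c, v in cells) / total
    mu02 = sum((r - cy) ** 2 * v for r, _, v in cells) / total
    mu11 = sum((c - cx) * (r - cy) * v for r, c, v in cells) / total
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)

    return {
        "x": cx,
        "y": cy,
        "contact_cells": len(cells),       # contact size in sensor cells
        "angle_deg": math.degrees(angle),  # major axis relative to the column axis
    }
```

Such hand-crafted features only describe a single blob. They do not reveal which finger or hand part caused it, which is the gap the remainder of this chapter motivates.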
These interaction techniques generally leverage properties beyond touch coordinates to provide additional input dimensions. However, mapping functions to specific finger postures increases the likelihood of unintended activations since a finger is now controlling multiple modalities simultaneously. One solution to lower the likelihood of unintended activations is to identify the touching finger or part of the hand to avoid interference with the main finger for interaction (e.g., the thumb). Previous work [38, 63, 82] identified parts of the finger (e.g., knuckle) or individual fingers to use the touch source as an additional input modality. However, the number of fingers that can touch the display during the prevalent single-handed grip [109, 110, 176, 178] is limited while additional wearable sensors [74, 75, 152] are required for an accurate finger identification.

Differentiating between inputs of multiple fingers and hand parts while enabling them to interact with the device would profoundly extend the touch input capabilities. This would make smartphones more suitable for tasks which require complex inputs and help to solve common touch input limitations such as the fat-finger problem [16, 217], reachability issues [20, 133], and the lack of shortcuts. Without requiring immobile and inconvenient wearable sensors, or a second hand which is not always available, smartphones could become an even more viable and mobile replacement for personal computers and laptops.

¹ https://developer.apple.com/ios/3d-touch/

One step towards this vision was presented by previous work on Back-of-Device (BoD) interaction (e.g., [16, 39, 46, 133, 197, 250, 269]). With the input space extended to the rear, fingers that previously held the device are now able to perform input.
However, previous work treated the touch-sensitive rear as an additional input space but not as an opportunity to enable individual fingers to perform specific input. Generally, only grip patterns were considered [33, 35, 36], while touch-sensitive areas were limited so that only the index finger can perform BoD input [10, 46, 133]. Consequently, the input space was extended but individual fingers and hand parts are still not usable as different input modalities.

Touch inputs from individual hand parts and fingers need to be recognized and differentiated to use them as unique input modalities. In particular, the raw data of capacitive sensors (such as from recent touchscreens) contain enough signal to infer the source of a touch. With deep learning, robust and lightweight models could be developed which identify hand parts and fingers on today’s smartphones. This concept profoundly extends the mobile touch input vocabulary and will be referred to as hand-and-finger-aware interaction.

Before this concept can be used on commodity smartphones, a wide range of challenges need to be addressed. First, designing hand-and-finger-aware interactions with a focus on usability requires an understanding of the behavior and ergonomics of individual fingers while holding smartphones. There is no previous work which analyzes the reachable areas for each finger, nor the areas in which fingers typically move and reside. Second, the technical feasibility of identifying individual hand parts and fingers needs to be investigated. There is no system yet which identifies fingers and hand parts with accuracies usable for realistic everyday scenarios based on the raw data of commodity capacitive touch sensing technologies. Third, we also need to evaluate the concept of hand-and-finger-awareness with potential users to gather feedback. This enables us to improve the concept to a level which is ready for the mass market.
1.1 Research Questions

In this thesis, we explore the concept of hand-and-finger-aware interaction for mobile devices. To inform the design and development of hand-and-finger-aware interaction methods, we present an exploration of six high-level research questions (RQs). The RQs are presented in Table 1.1.

An important basis to design input on the whole device surface is the analysis of finger movements which do not require a grip change. Since a grip change leads to a loss of grip stability and could lead to dropping the device, we need to understand the range which individual fingers can cover and the areas in which they can comfortably move (RQ1). In addition to explicit movements, we further need to understand micro-movements which fingers perform while interacting with the device. An understanding is vital to minimize unintended inputs generated by these movements (RQ2).

We use the raw data of capacitive sensors to identify hand parts and fingers based on deep learning. Before this approach can be leveraged for hand-and-finger-aware interaction, we need to investigate its feasibility and usability. We investigate the identification of hand parts and fingers using the raw data of a single capacitive touchscreen, i.e., on today’s commodity smartphones (RQ3). We further examine the feasibility of identifying individual fingers on fully touch sensitive smartphones (RQ4). This would enable the fingers on the rear to perform input, while the grip can be reconstructed for further interaction techniques.

After understanding the ergonomics and behavior of all fingers while holding and interacting with smartphones, we evaluate hand-and-finger-aware interaction for common use cases. This helps to understand how this concept can be leveraged to further improve mobile interaction. Since touch input on recent mobile devices poses a number of limitations, we investigate how we could address them on a fully touch sensitive smartphone.
This includes an elicitation of the limitations and potential solutions proposed by experienced interaction designers (RQ5). Finally, we focus on text editing as a specific use case which the interaction designers identified as important but still inconvenient due to the limited input capabilities. In particular, we investigate the design and implementation of shortcuts on fully touch sensitive smartphones to improve text editing (RQ6).

Table 1.1: Summary of research questions addressed in this thesis.

No.   Chapter     Research Question

I. Hand Ergonomics for Mobile Touch Interaction
RQ1   Chapter 3   How can we design Back-of-Device input controls to consider the reachability of fingers in a single-handed grip?
RQ2   Chapter 3   How can we design Back-of-Device input controls to minimize unintended inputs?

II. Identifying Fingers and Hand Parts
RQ3   Chapter 4   How can we differentiate between individual fingers or hand parts on a capacitive touchscreen?
RQ4   Chapter 5   How can we estimate the position of individual fingers and identify them on a fully touch sensitive smartphone?

III. Improving Mobile Touch Interaction
RQ5   Chapter 5   Which typical touch input limitations could be solved with a fully touch sensitive smartphone?
RQ6   Chapter 6   How can we design and use shortcuts on a fully touch sensitive smartphone to improve text editing?

1.2 Methodology

Designing, developing, and evaluating novel interaction techniques is one of the major topics in human-computer interaction (HCI). The goal of an interaction technique is to provide users with a way to accomplish tasks based on a combination of hardware and software elements.

1.2.1 Limitations of the User-Centered Design Process

Previous work in HCI presented novel interaction techniques based on the user-centered design (UCD) process [102] as shown in Figure 1.1.
The UCD process outlines four phases throughout an iterative design and development cycle to develop interactive systems with a focus on usability. The process consists of phases for understanding the context of use, specifying the user requirements, and developing a solution (i.e., implementing a working prototype) which is evaluated against the requirements. Each cycle represents an iteration towards a solution which matches the users’ context and satisfies all of the relevant needs (e.g., increasing the usability to a level which satisfies relevant users).

The UCD process focuses on the concept of the solution itself, assuming that specified user requirements can be unambiguously translated into a working prototype. Indeed, previous work commonly identified the need and requirements of an interaction technique and prototyped them using hand-crafted algorithms which range from simple value comparisons [152], thresholding [24, 74], and transfer functions [39] through computer vision techniques [93, 96] to kinematic models [23, 202].

With the advent of deep learning, complex relationships and patterns (e.g., in sensor data) can be learned from large amounts of data. Due to the increased availability of computing power and open-source frameworks (e.g., TensorFlow¹, Keras², PyTorch³), deep learning became a powerful tool for HCI researchers to develop solutions which are robust, lightweight enough to run on mobile devices, and do not even require domain knowledge (e.g., for a particular sensor and its noise). In addition, major parts of the prototypes can be reused even in market-ready versions of the system by reusing the data for model development or retraining the model for similar sensors. Prominent examples include object recognition models for image data which even outperform humans [87, 88, 218].

Despite the powerful modeling capabilities, deep learning produces black box models which can hardly be understood by humans.
Due to the lack of knowledge about a deep learning model’s internal workings, the model needs to be trained, tested, and validated with potential users within multiple iterations until it achieves the desired result. In contrast, the UCD process describes the design of a solution in a single step without involving potential users, an evaluation of its usability in a subsequent step, and a full refinement in a further iteration. Due to the huge effort required for developing a deep learning model (i.e., gathering a data set and multiple iterations of model development), the UCD process needs to be refined in order to incorporate the iterative development and testing of a model, as well as an evaluation of the model’s usability within the whole interactive system. In particular, the step of designing solutions needs to incorporate the modeling cycle of a deep learning process and connect it to the usability aspects of the UCD process.

¹TensorFlow: https://www.tensorflow.org/
²Keras: https://keras.io/
³PyTorch: https://pytorch.org/

1.2.2 Limitations of Common Deep Learning Processes

A typical process for developing and evaluating deep learning models consists of four phases: gathering a representative data set (e.g., through a data collection study or using already existing ones), preparing the data (e.g., exploring, cleaning, and normalizing), training and testing the model, as well as validating its generalizability on previously unseen data. Thereby, training and testing are often repeated in multiple iterations to find the most suitable hyperparameters that lead to the lowest model error on the test set based on trial-and-error and grid search [101] approaches. A final model validation with previously unseen data then assesses whether the chosen hyperparameters were overfitting to the test set.
For this process, the deep learning community often uses a training-test-validation split [42] (i.e., a training and a test set for model development, and a validation set for a one-time validation of the model) to develop a model and validate its performance. However, software metrics alone (i.e., accuracies and error rates that describe how well the model generalizes to unseen data) do not describe the usability of a system, which is the main focus of the UCD process. Beyond software metrics, factors such as the effect of inference errors on the usability (i.e., how good is the perceived usability for a given use case and how impactful are errors?), the model stability (i.e., how noisy are the estimations over time for no or only small variations?), and the usefulness of the investigated system should be considered. As systems are used by a wide range of users and in different scenarios, the validation also needs to assess whether the model can generalize beyond the (specific and/or abstract) tasks used in a data collection study. Moreover, while previous work considered accuracies above 80% to be sufficient [113], sufficiency depends on the use case (i.e., whether the action’s consequence is recoverable and how much the consequence affects the user), which can only be evaluated in studies through user feedback.

In summary, a typical process for deep learning describes the iterative nature of developing and evaluating black box models, but does not consider the usability of the model and thus of the final system. To apply deep learning techniques in HCI, we need to refine and combine the UCD process with typical deep learning processes to consider both the iterative development and evaluation of models, as well as their usability within the final system.
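The training-test-validation split mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_by_participant`, the tuple layout of a sample, the 70:20:10 default ratio, and the choice to keep all samples of one participant in the same subset are made up for this sketch and are not prescribed by the process described here.

```python
import random
from collections import defaultdict


def split_by_participant(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split samples into (training, test, validation) subsets.

    `samples` is a list of (participant_id, features, label) tuples; this
    layout is an assumption of the sketch. Whole participants are assigned
    to a single subset, so highly similar samples from one person never
    appear on both sides of a split. Following the terminology above, the
    test set is used during model development and the validation set is
    reserved for a one-time check.
    """
    by_participant = defaultdict(list)
    for sample in samples:
        by_participant[sample[0]].append(sample)

    # Shuffle participant IDs reproducibly instead of shuffling samples.
    participants = sorted(by_participant)
    random.Random(seed).shuffle(participants)

    n_train = round(len(participants) * ratios[0])
    n_test = round(len(participants) * ratios[1])
    groups = (
        participants[:n_train],
        participants[n_train:n_train + n_test],
        participants[n_train + n_test:],  # remainder forms the validation set
    )
    return tuple(
        [s for p in group for s in by_participant[p]] for group in groups
    )


if __name__ == "__main__":
    # Hypothetical data: 10 participants with 5 samples each.
    data = [(p, [p, i], i % 2) for p in range(10) for i in range(5)]
    train, test, val = split_by_participant(data)
    print(len(train), len(test), len(val))
```

Splitting on participant IDs rather than on individual samples is a design choice of this sketch; with very few participants the rounding can leave the validation subset empty, so the ratios would need to be adapted in that case.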
Figure 1.1: The user-centered design process as described in ISO 9241-210 [102]. The cycle comprises four phases (understand and specify the context of use, specify user requirements, design solutions, and evaluate against requirements), with design improvement iterations until the solution meets the requirements.

1.2.3 User-Centered Design Process for Deep Learning

We present the user-centered design process for deep learning (UCDDL) which combines the UCD process with the steps required for deep learning and is depicted in Figure 1.2. The UCDDL consists of five phases, of which the first two are identical to the traditional UCD process and focus on understanding users as well as specifying requirements. The next three phases focus on developing a prototype based on deep learning and evaluating the system based on the factors described above. In the following, we describe the UCDDL which we apply throughout this thesis.

1. Understand and specify the context of use. This phase is about identifying the users who will use the system, their tasks, and the conditions under which they will use it (e.g., technical and ergonomic constraints). This step could consist of user studies to understand the context of use, or build on findings from previous work.

2. Specify user requirements. Based on the identified context, application scenarios and prototype requirements need to be specified. The solution is then developed based on, and evaluated against, these requirements.

3. Collect data based on user requirements. Training a deep learning model requires a representative and large enough data set as the ground truth. Gathering this data set in the context of a user study involves the design and development of an apparatus which runs mockup tasks to cover all expected interactions.
Instructing potential users to perform certain tasks even enables the apparatus to automatically label each collected sample. This assumes that the experimenter carefully observes whether participants actually perform the requested input correctly (e.g., when instructing participants to touch with a certain finger, it can be assumed that the captured data samples represent the instructed finger). The user study needs to be conducted with a representative set of potential users who cover all relevant factors in order to collect a sufficient amount of data for model training. The data set is the foundation of the developed system and needs to be refined (i.e., extended with more variance by adding users and tasks to cover the specified requirements) in case the final system does not generalize to new users and tasks which were specified in the requirements. In this case, another data collection study needs to be conducted, and the resulting new data set needs to be combined with the already existing data set. In addition, the data collected in the evaluation phase (see Phase 5) could also be used to extend the existing data set.

Figure 1.2: Adapted user-centered design process for deep learning in the context of interactive systems in HCI. The phases are: understand and specify the context of use; specify user requirements; collect data based on requirements; model development; and model validation and design evaluation. The last three phases form the design of a solution using deep learning, with a model improvement iteration and a design improvement iteration until the solution meets the requirements.

4. Model development. Based on the data set, this phase applies deep learning to develop a model which is used by the system. Prior to the actual model training, the data set often needs to be cleaned (e.g., removing empty or potentially erroneous samples for which the label correctness cannot be ensured) or augmented in case producing the desired amount of data is not feasible (e.g.,
adding altered samples such as by rotating the input or adding artificial sensor noise). Further, we first explore the data set with techniques such as visual inspection, descriptive and inferential statistics (e.g., finding correlations), as well as applying basic machine learning models such as linear regression and SVMs using simple feature extraction. This step provides an overview of the data set and helps to choose the optimal model and hyperparameters in later steps. In case only very few samples could be collected (e.g., due to a high effort for collecting or labeling), these basic models represent a viable solution.

After data preprocessing and exploration, the data set needs to be split into a training and a test set to avoid the same samples being “seen” during training and testing. Since the same user could generate highly similar data, the data set should further be split by participants (instead of by samples as commonly applied). Previous work commonly used a ratio of 80%:20% for a training-test split, and 70%:20%:10% for a training-test-validation split. While the deep learning community commonly uses a training-test-validation split to detect overfitting to the test set due to hyperparameter tuning, the UCDDL process replaces the validation set with a user study in the next phase. This has two advantages: First, the full data set can be used to train the model and test it based on the test set. Second, the user study in the next phase can gather a validation set with new participants, which is usually larger than 10% of the data set. More importantly, the model’s usability (and also the accuracy) can be evaluated in a realistic scenario based on feedback from potential users. This is not possible with a training-test-validation split which focuses only on the modeling aspect. The goal of the training process is to achieve the highest accuracy on the test set. The model is then deployed in the respective system (e.g.,
a mobile device in this thesis) for the evaluation in the next phase.

5. Model Validation and Design Evaluation. This phase evaluates the system as a whole with participants who did not participate in the data collection study (Phase 3). The evaluation focuses on three aspects: (1) a model validation to achieve the same results as the commonly used training-test-validation approach (combined with the training and test of the previous phase), (2) evaluating the model usability (and optionally also the model error) in a realistic but controlled scenario to focus on individual aspects, and (3) evaluating the system within a common use case (as specified in Phase 2) to assess the usefulness of the system and the perceived usability of the model in an uncontrolled scenario.

The model validation replaces the validation set and is based on tasks similar to those used in the data collection study. In particular, data is collected with the same tasks which, at the same time, can also be used to introduce participants to the system. This prepares them for the usability evaluation within realistic scenarios, which consists of a set of tasks that resemble a realistic use case. This set of tasks is designed to be controlled enough to enable a focus on individual aspects of the system (e.g., recognition accuracy and usability of certain classes of the model). For instance, a set of tasks could be designed in a pre-defined order so that model predictions can be compared with the order to determine the accuracy. To focus on the perceived usability, tasks could also be designed to expect only one type of input (i.e., one class). This enables evaluating false positives for a certain class while collecting qualitative feedback from the participants about the used class. More complex outputs, such as regression, could employ additional sensors such as high-precision motion capture systems as ground truth.
For the usability evaluation of the full system, participants use the prototype to solve tasks in a fully functional environment (e.g., an application designed for a certain use case, or even well-known applications). This step is less controlled and focuses on the system’s usability and usefulness. It results in qualitative feedback and quantitative measures such as the task completion time or success rate.

In summary, the evaluation in the UCDDL covers the model validation as well as the usability aspect as described in the UCD process.

1.3 Research Context

The research leading to this thesis was carried out over the course of three years (2016–2018) in the Socio-Cognitive Systems group at the Institute for Visualization and Interactive Systems. It was additionally part of a project funded in the Cluster of Excellence in Simulation Technology (SimTech) at the University of Stuttgart. The presented research was inspired by collaborations, publications, and discussions with many experts from within and outside the field of HCI.

Cluster of Excellence in Simulation Technology
SimTech is an interdisciplinary research association with more than 200 scientists from virtually all faculties of the University of Stuttgart. A major part of the research was conducted in the project network “PN7 - Reflexion and Contextualisation”¹. The research presented in this thesis underwent an examination in the form of a mid-term presentation accompanied by Prof. Dr. Syn Schmitt from the Institute of Sports and Exercise Science. Moreover, intermediate research results were presented at the annual SimTech Status Seminar.

University of Stuttgart
The research presented in this thesis was inspired by collaborations with colleagues from the University of Stuttgart. With the scientific expertise and technical knowledge from Patrick Bader, Thomas Kosch, and Sven Mayer we published six publications which are all in the scope of this thesis [123–125, 130, 132, 136].
Moreover, the collaborations resulted in further publications on relevant topics beyond the scope of this thesis [117, 128, 133, 155, 156, 158–160] and in tutorials on “Machine Learning for HCI” organized at national as well as international conferences [126, 134, 157]. Amongst others, online magazines and communities such as Arduino², hackster.io³, and open-electronics.org⁴ reported on our prototypes presented in this work.

The research was further inspired by discussions with a broad range of student peers and senior researchers at the doctoral consortium at the International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI 2016) [122] and the ACM CHI Conference on Human Factors in Computing Systems (CHI 2018) [121]. In addition, collaborations with Patrick Bader, Passant El.Agroudy, Tilman Dingler, Valentin Schwind, Alexandra Voit, and Dominik Weber resulted in publications beyond the scope of this thesis [11, 12, 48, 89, 131, 137, 247].

¹http://www.simtech.uni-stuttgart.de/en/research/networks/7/
²http://blog.arduino.cc/2018/10/19/infinitouch-interact-with-both-sides-of-your-smartphone/
³http://blog.hackster.io/dual-sided-smartphone-interaction-with-infinitouch-6362c4181fa2
⁴http://www.open-electronics.org/infinitouch-is-the-first-fully-touch-sensitive-smartphone/

External Collaborations
Further research
beyond the scope of this thesis was conducted with external collaborators. This includes Katrin Wolf from the Hamburg University of Applied Sciences [129], Lewis Chuang from the Max Planck Institute for Biological Cybernetics [160], Sarah Clinch, Nigel Davies, and Corina Sas from Lancaster University [131], as well as Agon Bexheti, Marc Langheinrich, and Evangelos Niforatos from the Università della Svizzera italiana [48].

1.4 Thesis Outline

This thesis consists of seven chapters, the bibliography, and the appendix. We present the results and evaluations of 12 empirical studies, an extensive review of related work, as well as a discussion and summary of the findings in the conclusion chapter. We structure the work as follows:

Chapter 1 - Introduction motivates the research in this thesis and gives an overview of the research questions and the author’s contributions. We further present the user-centered design process for deep learning which we follow throughout this thesis.

Chapter 2 - Background provides an overview of the history of touch interaction, an explanation of capacitive touch sensing, as well as an extensive review of touch-based interaction techniques on mobile devices and beyond.

Chapter 3 - Hand Ergonomics for Mobile Touch Interaction describes the results of two studies investigating the behavior and ergonomic constraints of fingers while holding a mobile device.

Chapter 4 - Hand-and-Finger-Awareness on Mobile Touchscreens presents two models that use the raw data of capacitive touchscreens to recognize the source of touch, and their evaluations within realistic use cases.

Chapter 5 - Hand-and-Finger-Awareness on Full-Touch Mobile Devices develops a smartphone prototype with touch sensing on the whole device surface and shows how fingers can be identified. Further, we discuss how full-touch smartphones can solve recent touch input limitations.
Chapter 6 - Improving Shortcuts for Text Editing applies the findings from the previous chapters and presents four studies which cover all steps: a study to understand shortcut use on keyboards, a gesture elicitation study, a data collection study to train a gesture recognizer using deep learning, and finally an evaluation study.

Chapter 7 - Conclusion and Future Work discusses the findings from the previous chapters, summarizes them, and provides directions for further research.

2 Background and Related Work

While touchscreens enable intuitive interactions, keyboards and mice as input devices are still superior to touch input as they provide more input capabilities by enabling the use of multiple fingers. In this thesis, we explore novel touch-based interaction techniques which differentiate between individual fingers and hand parts to solve limitations of recent mobile touch interaction. To understand the technologies used in this thesis, this chapter provides an introduction to touch-based interaction as well as its history and technical background. We further review previous work in the domain of extending touch interaction and present recent challenges of mobile touch interaction which we address in this thesis.

2.1 Background

Touchscreens are ubiquitous in our modern world. According to Statista [285], over 2.5 billion people own a smartphone with a touchscreen as the main interface. People use smartphones for tasks which were previously exclusive to stationary computers and in a wide range of scenarios such as while sitting, walking, encumbered, or even during other tasks. The combination of input and output in a single interface enables intuitive interaction through direct touch. Moreover, touchscreens enable manufacturers to build compact and robust devices which use nearly the whole front surface for input and output.

Figure 2.1: The first touchscreen as developed by E.A. Johnson. Image taken from [106].
2.1.1 History and Development of Touch Interaction

The first finger-based touchscreen was invented in 1965 by E.A. Johnson [105] who described a workable mechanism for developing a touchscreen. As with most consumer devices nowadays, the presented prototype used capacitive sensing. The inventor envisioned the invention being used for air traffic control, such as facilitating selections of call signs, flights, and executive actions [106, 184]. Figure 2.1 shows the display configuration for the touch interface. Five years later, in 1970, Samuel Hurst and his research group at the University of Kentucky developed the first resistive touchscreen. In contrast to capacitive sensing methods as invented by E.A. Johnson, resistive touchscreens were more durable back then, inexpensive to produce, and their operation is not restricted to conductive objects such as human skin or conductive pens. Nowadays, resistive touch sensing can be found mostly in public areas such as restaurants, factories, and hospitals. In 1972, the first widely deployed touchscreen based on infrared light was developed [55] and deployed in schools throughout the United States. This technology relied on fingers interrupting light beams that ran parallel to the display surface.

In 1982, Nimish Mehta [162] developed the first multi-touch device which used a frosted-glass panel with a camera behind it so that it could detect actions which are recognizable through black spots showing up on the screen. Gestures similar to today’s pinch-to-zoom or manipulation through dragging were first presented in a system by Krueger et al. [116]. Although the system was vision-based and thus not suitable for touch interaction, many of the presented gestures could be readily ported to a two-dimensional space for touchscreens. One year later, the first commercial PC with a touchscreen (Hewlett Packard HP-150¹) was released.
The touchscreen was based on infrared sensing but was not well received at that time as graphical user interfaces were not widely used. In 1984, Bob Boie presented the first transparent multi-touch screen which used a transparent capacitive array of touch sensors on top of a CRT screen. Similarly, Lee et al. [138] developed a touch tablet in 1985 that can sense an arbitrary number of simultaneous touch inputs based on capacitive sensing. Using the compression of the overlaying insulator, the tablet is further capable of sensing the touch pressure. Recent iPhones incorporate this input modality under the name Force Touch.

In 1993, the Simon Personal Communicator from IBM and BellSouth (see Figure 2.2) was released, which was the first mobile phone with a touchscreen. Its resistive touchscreen enabled features such as e-mail clients, a calendar, an address book, a calculator, and a pen-based sketchpad. In the same year, Apple Computer released the MessagePad 100, a personal digital assistant (PDA) that can be controlled with a stylus but has no call functionality. The success of PDAs continued with the Palm Pilot by Palm Computing as the handwriting recognition worked better for the users. However, in contrast to smartphones nowadays, all these devices require the use of a stylus.

Figure 2.2: Simon Personal Communicator, the first smartphone with a touchscreen by IBM and BellSouth. Image taken from arstechnica.

In 1999, FingerWorks, Inc. released consumer products such as the TouchStream and the iGesture Pad that can be operated with finger inputs and gestures. The company was eventually acquired by Apple Inc. to contribute to the development of the iPhone’s touchscreen and Apple’s Multi-Touch trackpad.

¹http://www.hp.com/hpinfo/abouthp/histnfacts/museum/personalsystems/0031/
Based on the work by Jun Rekimoto [194], Sony introduced a first flat input surface in 2002 that provides two-dimensional images of the changes in the electric field. This technology is known as mutual capacitive sensing, and the electric field changes represent low-resolution shapes of conductive objects touching the sensor. In contrast to camera-based approaches, all elements are integrated into a flat touch panel which enables the integration into mobile devices. Touchscreens incorporated in smartphones nowadays are based on this technology.

In the subsequent years, new touch-based technologies were introduced but these are not employed on smartphones due to space constraints. For example, Jeff Han introduced multitouch sensing through frustrated total internal reflection (FTIR) which is based on infrared (IR) LEDs and an IR camera below the touch surface to sense touch input. This enables building high-resolution touchscreens and is less expensive than other technologies. In 2008, the Microsoft Surface 1.0, a table-based touchscreen, was released that integrated a PC and five near-infrared cameras to sense fingers and objects placed on the display. Three years later, the second version of the Microsoft Surface (now called Microsoft PixelSense) was released that is based on Samsung’s SUR40 technology. This technology represents a 40-inch interactive touch display in which pixels can also sense objects above it. This enables building a less bulky tabletop without cameras below the display and generates a 960×540 px touch image that can be used for object tracking.

2.1.2 Capacitive Touch Sensing

Since the invention of the first touchscreen, a wide range of further touch sensing technologies have been presented.
While many of these approaches provide a higher touch sensing resolution and expressiveness compared to the earlier invented capacitive and resistive touchscreens, they are less suitable for mobile devices due to their immobile setup. Amongst others, these technologies include frustrated total internal reflection [77], surface acoustic waves [142], camera-based touch sensing (e.g., RGB [225], depth [252]), infrared touch sensing [2], and inductive touch sensing [43].

Due to their compact size, robustness, and responsiveness, capacitive touchscreens are widely used in mobile devices nowadays. In particular, mobile devices use projected capacitive touchscreens which sense touches with a higher resolution than surface capacitance, which is often used on larger surfaces with electrodes at the four corners. Figure 2.3 sketches the functional principle of a mutual capacitive touchscreen. Mutual capacitance is one of the two types of the projected capacitance principle and is commonly used in recent mobile devices [15]. The touch sensor itself consists of three layers: an electrode pattern layer in the middle, which is responsible for the actual touch sensing, and two protective layers. The touch sensor with all of its layers is transparent and placed on top of the display unit such as a liquid crystal display (LCD). The electrode pattern layer is connected to a touch controller and consists of conductive wires made out of indium tin oxide (ITO), which is transparent and sketched on the bottom left of Figure 2.3. The controller measures the change of coupling capacitance between two orthogonal electrodes, i.e., intersections of row and column pairs [50]. These measurements result in a low-resolution finger imprint which is shown on the bottom right of Figure 2.3 and referred to as a capacitive image [73, 99, 136, 156].
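To make the notion of a capacitive image more concrete, the following minimal sketch treats it as a small 2D grid of sensor values and reduces it to a single position with a weighted centroid. Both the function name `touch_centroid` and the centroid itself are assumptions made for illustration: touch controllers run vendor-specific firmware, and the text does not state which algorithm they use. The default electrode pitch of 4.1 mm is likewise only an example value.

```python
def touch_centroid(image, pitch_mm=4.1, threshold=0):
    """Reduce a capacitive image to one (x, y) position in millimetres.

    `image` is a list of rows holding non-negative sensor values, one
    value per electrode intersection. Values at or below `threshold` are
    ignored as noise. Returns None if no value exceeds the threshold.
    The weighted centroid is a stand-in chosen for this sketch, not the
    method of any particular touch controller.
    """
    total = 0
    sum_x = 0.0
    sum_y = 0.0
    for row_idx, row in enumerate(image):
        for col_idx, value in enumerate(row):
            if value > threshold:
                total += value
                sum_x += value * col_idx
                sum_y += value * row_idx
    if total == 0:
        return None
    # Adding 0.5 places index 0 at the centre of the first electrode cell.
    return ((sum_x / total + 0.5) * pitch_mm,
            (sum_y / total + 0.5) * pitch_mm)


if __name__ == "__main__":
    # Hypothetical, symmetric 3x3 imprint with its peak in the middle cell.
    imprint = [
        [0, 10, 0],
        [10, 100, 10],
        [0, 10, 0],
    ]
    print(touch_centroid(imprint, pitch_mm=1.0))  # (1.5, 1.5)
```

A single coordinate of this kind discards the shape and intensity distribution of the imprint, which is exactly the information that approaches working on the full capacitive image can exploit.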
Figure 2.3: Components of a mutual capacitive touchscreen (protective cover, electrode pattern layer with electrode layers X and Y, glass substrate, and electrical field) and the resulting capacitive image representing the touch of a finger. Figure adapted and extended based on http://www.eizo.com/library/basics/basic_understanding_of_touch_panel/.

Capacitive touchscreens of commodity smartphones comprise around 400 to 600 electrodes (e.g., 15×27 electrodes with each being 4.1×4.1 mm on an LG Nexus 5). The touch controller translates the measurements into a 2D coordinate which is then provided to the operating system (indicated as a red dot in the figure). Touch interaction on recent mobile devices is based solely on the 2D coordinate of a touch (i.e., the red dot), while the remaining information about a touch is omitted. In this thesis, we present a number of approaches which use the capacitive images of commodity mutual capacitive touchscreens in mobile devices to infer the source of a touch such as different fingers and hand parts.

2.2 Related Work

Related work presented a wide range of novel interaction techniques to extend the touch input vocabulary on mobile devices. Following the structure of this thesis, we first describe the ergonomics and physical limitations of the hand for interaction with mobile devices. Secondly, we describe interaction methods that improve and extend the interaction with a touchscreen (on the front side) on mobile devices.
Lastly, we go one step further and review related work that presents novel interaction methods based on touch input beyond the front touchscreen (e.g., the back and edges of a device).

2.2.1 Hand Ergonomics for Mobile Touch Interaction

In contrast to stationary input devices such as a hardware keyboard and mouse, users usually hold and interact with mobile devices simultaneously. This poses a wide range of challenges. When using a smartphone in the prevalent single-handed grip [54, 109, 110, 176], the same hand is used for holding and interacting with the device. This limits the fingers’ range and generates unintended inputs due to the continuous contact with the device. In the following, we review previous work on ergonomics of the hand when holding a smartphone and supportive finger movements which users perform during interaction.

Placement, Movement, and Range of Fingers
To inform the design of novel interaction methods on mobile devices, an understanding of finger placement, movement, and their ranges is vital. A wide range of heuristics have been proposed by designers over the years¹,²,³,⁴,⁵. Previous work further investigated the range of the thumb to inform the design of mobile user interfaces [20]. Since BoD and edge input became more relevant in recent years, all other fingers need to be investigated to inform the design of fully hand-and-finger-aware interaction methods. While previous work showed where fingers are typically placed when holding a smartphone [276], there is no work that studied the areas reachable by all fingers on mobile devices. This thesis contributes to this research area by studying finger placements, ranges, and reachable areas of all fingers on mobile devices.

An important basis to inform the placement of on-screen interaction elements and on-device input controls is the analysis of areas on the device that can be reached by the fingers.
Bergstrom-Lehtovirta and Oulasvirta [20] modeled the thumb’s range on smartphones to inform the placement of user interface elements for one-handed interaction. To predict the thumb’s range, the model mainly involves the user’s hand size and the position of the index finger which is assumed to be straight (adducted). For the predicted range of the thumb, they introduced the term functional area which is adapted from earlier work in kinesiology and biomechanics. In these fields, possible postures and movements of the hand are called functional space [118]. Thumb behavior was further investigated by Trudeau et al. [231] who modeled the motor performance in different flexion states. Park et al. [189] described the impact of touch key sizes on the thumb’s touch accuracy while Xiong et al. [268] found that the thumb develops fatigue rapidly when tapping on smaller targets.

Besides the thumb, previous work investigated the index finger during smartphone interaction. Yoo et al. [276] conducted a qualitative study to determine the comfortable zone of the index finger on the back of the device. This was done without moving the finger and by asking users during the study. From a biomechanical perspective, Lee et al.
[139] investigated the practicality of different strokes for BoD interaction. Similarly, prior work found that using the index finger for target selection on the BoD leads to a lower error rate than using the thumb for direct touch [143, 256]. Wobbrock et al. [256] showed that both the thumb on the front and the index finger on the BoD perform similarly well in a Fitts’ law task. Wolf et al. [260] found that BoD gestures are performed significantly differently than front gestures. Corsten et al. [40, 41] used BoD landmarks and showed that the rear position of the index finger could be accurately transferred to the thumb by pinching both fingers.

¹https://www.uxmatters.com/mt/archives/2013/02/how-do-users-really-hold-mobile-devices.php
²http://blog.usabilla.com/designing-thumbs-thumb-zone/
³http://scotthurff.com/posts/facebook-paper-gestures
⁴https://www.smashingmagazine.com/2016/09/the-thumb-zone-designing-for-mobile-users/
⁵https://medium.com/@konsav/-55aba8ed3859

Since different grips can be used as an input modality [254], a wide range of prior work sought an understanding of how users hold the phone while using it. Eardley et al. [53, 54] explored hand grip changes during smartphone interaction to propose use cases for adaptive user interfaces. They showed that the device size and target distance affect how much users tilt and rotate the device to reach targets on the touchscreen.
Moreover, they investigated the effect of body posture (e.g., while standing, sitting, and lying down) on the hand grip, and showed that most grip movements were done while lying down followed by sitting and finally standing [52].

Previous work in biomechanics looked into different properties of the hand. Napier et al. [175] investigated two movement patterns for grasping objects which they call precision grip and power grip. People holding objects with the power grip use their partly flexed fingers and the palm to apply pressure on an object. Sancho-Bru et al. [205] developed a 3D biomechanical hand model for power grips and used it to simulate grasps on a cylinder. However, as smartphones are not necessarily held in a power grip, this model cannot be applied to smartphone interaction. Kuo et al. [118] investigated the functional workspace of the thumb by tracking unconstrained motion. This is the space on the hand which is reachable by the thumb. Brook et al. [26] introduced a biomechanical model of index finger dynamics which enables the simulation of pinch and rotation movements. As holding a smartphone and interacting with the touchscreen introduces additional constraints to all fingers, these results cannot be applied to model the hand grip and ergonomics.

Supportive Finger Movements

Although users intend to move only the thumb to perform single-handed input on a front touchscreen, they unconsciously perform a wide range of further “supportive” movements. These movements maintain the balance and grip on the device, increase the reachability of the thumb on the display (e.g., through tilting [34] and grip shifts [53, 54]), or are unavoidable due to the limited movement independence of fingers (e.g., moving one finger also moves other fingers [76]). An important basis to design BoD input controls that take unintended input into account is the analysis of supportive micro-movements during common smartphone tasks.
Tilting the device is one type of supportive micro-movement which is used to increase the thumb’s reachability on the display. Previous work found that users tilt the device towards their thumb to reach more distant targets (e.g., at the top left corner) and away from their thumb to reach targets at the bottom right corner [34, 54]. Eardley et al. [52–54] referred to all movements which increase the reachability as “grip shifts” and explored them for different device sizes and tasks. Based on video recordings with manually identified key points and accelerometer values, they quantified the number of grip shifts during common smartphone tasks. They found that more grip shifts occurred with increasing device sizes while the amount of tilt and rotation varied with grip types and phone sizes. Moreover, they showed that the body posture (e.g., sitting and standing) affects the device movements, suggesting that device sizes and body postures need to be considered for exploring supportive micro-movements. While these findings explain the device movements, no previous work investigated the actual finger movements which could generate unintended input on the device surface.

The limited independence of finger movements causes another type of supportive micro-movement. Previous work in biomechanics found that even when asked to move just one finger, humans usually also produce motion in other fingers [76]. The limited independence of the fingers is due to biomechanical interconnections such as connected soft tissues [242] and motor units [207]. Moreover, Trudeau et al. [231] found that the thumb’s motor performance varies by the direction and device size during single-handed smartphone use while the motor performance is generally greater for two-handed grips [230].
While Sancho-Bru [205] presented a biomechanical model of the hand for the power grip [175], an application thereof is not possible for an investigation of supportive micro-movements as smartphones are not used solely in a power grip. One chapter of this thesis contributes to the understanding of supportive micro-movements by studying how fingers on the rear move while interacting with the front side.

2.2.2 Novel Touch-Based Interaction Methods

Recent touchscreens are designed to register two-dimensional locations of touches. These locations are provided to the application layer of the operating system to enable interaction with the user interface. Besides the two-dimensional location of touches, a wide range of touch properties is available that can be used to increase the input vocabulary of touch interaction. Well-known examples from recent operating systems are the long-press that leverages the dwell time and gestures that are based on subsequent touch locations. While these additions are beneficial, they require additional execution time. Moreover, the touch input vocabulary is still limited when compared to other input devices such as hardware keyboards or computer mice. In the following, we describe related work that improves touch input using data from touchscreens and their mobile device.

Extending Touch Interaction on Mobile Touchscreens

Previous work presented a wide range of approaches to extend the touch input vocabulary on mobile touch-based devices. In the following, we describe two common approaches that do not require additional sensors beyond a touchscreen. This includes approaches that are (1) based solely on two-dimensional touch locations available on all touchscreen technologies, and (2) based on the raw data of capacitive touchscreens representing low-resolution fingerprints of touches.
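To make the relation between these two kinds of data concrete, the following minimal sketch reduces a low-resolution capacitive image to a single two-dimensional touch location by taking the intensity-weighted centroid of the contact area, which is how touchscreens are reported to derive touch locations [23, 98, 202]. It is an illustration under stated assumptions and not the firmware logic of any touch controller: the function name `touch_centroid`, the noise threshold of 30, and the synthetic 27×15 image are hypothetical choices made only for this example.

```python
import numpy as np


def touch_centroid(cap_image, threshold=30):
    """Return the (x, y) intensity-weighted centroid of a capacitive image.

    cap_image: 2D array of sensor values (rows = y, columns = x).
    threshold: hypothetical noise floor; values below it are ignored.
    Returns None if no value reaches the threshold (no touch).
    """
    img = np.asarray(cap_image, dtype=float)
    mask = img >= threshold
    if not mask.any():
        return None
    # Keep only values that belong to the contact area.
    weights = np.where(mask, img, 0.0)
    total = weights.sum()
    # Row (y) and column (x) index grids with the same shape as the image.
    ys, xs = np.indices(img.shape)
    x = (xs * weights).sum() / total
    y = (ys * weights).sum() / total
    return float(x), float(y)


# Synthetic 8-bit image with one square blob (assumed data, not a recording).
image = np.zeros((27, 15), dtype=np.uint8)
image[10:13, 6:9] = 200
print(touch_centroid(image))  # (7.0, 11.0)
```

The sketch shows why approaches of type (1) discard information: the whole contact area collapses into one coordinate pair, whereas approaches of type (2) operate on the full image.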
Using the Two-Dimensional Location of Touches

Approaches to extend the touch input vocabulary based on only the two-dimensional location of touch inputs can readily be deployed on any touch-based mobile device. Since all touchscreens already provide the two-dimensional location of touches, no additional information and sensors are required.

Single taps are mostly used for selection-based interaction such as selecting an action assigned to a button. Gestures play an important role in making user interfaces more intuitive (e.g., moving objects by dragging them) and in providing shortcuts for faster access to frequently used functions (e.g., launching applications [190], searching [141]). A gesture can be performed by moving the finger while in contact with the touchscreen. This generates a trajectory of two-dimensional locations of touches that are then interpreted as gestures by the system. Previous work in HCI invested considerable effort to improve gesture-based interfaces, such as through methodologies for gesture design [237, 238, 255, 256], simple gesture recognizers for fast prototyping purposes [5, 235, 257], improving gesture memorability [173, 277], and through design guidelines for gesture designs [4, 278]. However, gestures have the disadvantage that they require additional execution time as well as enough screen space for the execution. Moreover, a comprehensive set of gestures would lead to conflicts (e.g., unintended activations) and the accuracy of gesture recognizers would decrease due to ambiguity errors.

Previous work proposed a wide range of interaction methods to enrich touch interaction beyond gesture shapes and types. Amongst others, a gesture starting from the device’s bezel can be distinguished from a gesture starting on the touchscreen itself.
This differentiation was used in previous work to provide shortcuts to the clipboard [200] and to improve one-handed interaction by offering reachability features [112].

Moreover, researchers implemented simple heuristics to use the finger orientation as an input dimension. Roudaut et al. [202] presented MicroRolls, a micro-gesture that extends the touch input vocabulary by rolling the finger on the touchscreen (i.e., changing the pitch and roll angle of the finger). Since touchscreens translate touch contact areas to two-dimensional locations based on the area’s centroid [23, 98, 202], a trajectory of two-dimensional locations is generated through the changing contact area induced by finger rolling. MicroRolls uses this trajectory to recognize rolling movements with accuracies of over 95%. However, this interaction technique cannot be used during a drag action since the segmentation of the gesture requires down and up events. Thus, Bonnet et al. [23] presented ThumbRock which improves MicroRolls by additionally using the size of the contact area as reported by Apple iOS.

Using the Raw Data of Capacitive Touchscreens

Nowadays, the majority of touchscreens incorporated in mobile devices are based on mutual capacitive sensing. Taking the measurements of all electrodes of the touchscreen, a two-dimensional image (referred to as capacitive image [73, 99, 136, 156]) can be retrieved as shown in Section 2.1.2. Previous work predominantly used an LG Nexus 5 since its touch controller (Synaptics ClearPad 3350) provides a debugging bridge to access the 8-bit capacitive images with a resolution of 27×15 px at 6.24 ppi. While capacitive images can be used to recognize body parts for authentication purposes [73, 99], previous work also used the resulting area for interaction methods. Amongst others, Oakley et al. [182] used the area of touches on smartwatches to provide shortcuts to pre-defined functions. Similarly, Boring et al.
[24] used the size of the contact area to enable one-handed zooming and panning.

To extend the touch input performed with fingers, researchers developed machine learning models that infer additional properties based on the capacitive images. Amongst others, machine learning models can be used to estimate the pitch¹ and yaw² angles of a finger touching the display [156, 265]. In contrast to the approach on the tabletop [244], machine learning was necessary as no high-resolution contact area is available. Moreover, Gil et al. [63] used basic machine learning techniques to identify fingers touching the display. However, they showed that a usable accuracy can only be achieved with exaggerated poses on smartwatches so that each finger touched with a distinct angle. Recent Huawei devices incorporate KnuckleSense, an additional input modality that differentiates between touches made by fingers and knuckles. This technology is based on FingerSense, a proprietary technology by Qeexo³ of which no technical details are publicly available.

¹ Pitch angle: Angle between the finger and the horizontal touch surface.
² Yaw angle: Angle between the finger and the vertical axis.
³ http://qeexo.com/fingersense/

Extending Touch Interaction through Additional Sensors

Previous work and smartphone manufacturers used additional built-in sensors to augment touch input. Amongst others, this includes sensors to measure the applied force, microphone recordings, inertial measurement units (IMUs), and pre-touch sensing. Moreover, we give an overview of external sensors that were used in previous work to extend touch input.

Force and Pressure

Pressure input offers an additional input dimension for mobile touch interaction. Since interaction can be performed without moving the finger, this input dimension benefits user interfaces on small displays and situations in which finger movements are not desirable.
Miyaki and Rekimoto [167] were the first to use the force applied on the touchscreen of a mobile device to extend the touch input vocabulary. Based on force sensitive resistors between the device and a back cover, they measured the changing pressure levels to prototype one-handed zooming on mobile devices. Stewart et al. [223] investigated the characteristics of pressure input on mobile devices and found that a linear mapping of force to value worked the best for users. Researchers further used the shear force, the force tangential to the display’s surface, to extend pressure input. Amongst others, Harrison and Hudson [80] developed a touchscreen prototype that uses the shear force for interaction while Heo and Lee [90] augmented touch gestures by sensing normal and tangential forces on a touchscreen. Beyond the touchscreen, force can also be used for twisting the device as an input technique [68, 69, 119].

With the iPhone 6s, Apple introduced the pressure input dimension under the name Force Touch. Based on force sensors below the touchscreen or a series of electrodes on the screen curvature (Apple Watch), they used the additional input dimension to enable users to perform secondary actions such as opening a context menu or peeking into files. To estimate the force of a touch without additional sensors, Heo and Lee [91] used the built-in accelerometer and position data of the touchscreen.

Acoustics

The sound resulting from an object’s impact on the touchscreen can be used to differentiate between sources of input. By attaching a medical stethoscope to the back of a smartphone, Harrison et al. [82] showed the feasibility of differentiating between different parts of the finger (e.g., pad, tip, nail, or knuckle) as well as objects (e.g., stylus). Lopes et al. [145] used a similar approach and augmented touch interaction based on a contact microphone to sense vibrations.
With this, they showed that different hand placements on the touch surface (e.g., tap with a finger tip, knock, slap with the flat hand, and a punch) can be reliably recognized. Similarly, Paradiso et al. [188] used four contact piezoelectric pickups at the corners of a window to differentiate between taps and knocks. In general, approaches based on acoustic sensing have been shown to reliably identify the source of touch. However, since microphones are required to continuously capture the acoustics, these approaches are prone to errors in noisy situations. Thus, they are not suitable for interaction on mobile devices such as smartphones and tablets.

Physical Device Movement

A wide range of previous work combined touch input with the built-in accelerometer of mobile devices. Hinckley et al. [92] introduced the terminology of touch-enhanced motion techniques which combine information of a touch and explicit device movements sensed by the IMU. For example, a touch and a subsequent tilt sensed by the accelerometer can be used to implement one-handed zooming, while holding an item on the touchscreen followed by shaking the device can be used to offer a shortcut to delete files. Similar gestures were explored especially for interaction with wall displays using a mobile phone. Hassan et al. [86] introduced the Chucking gesture in which users tap and hold an icon on the touchscreen, followed by a toss measured by the accelerometer to transfer the file to the wall display. To transfer items between public displays using a mobile phone, Boring et al. [25] proposed a similar gesture in which users hold an object on the touchscreen and move the mobile device between displays. Researchers also used the built-in accelerometer to enhance text entry on mobile devices. This includes the use of the device orientation to resolve ambiguity on a T9 keyboard [249] and the improvement of one-handed gestural text input on large mobile devices [273].
In contrast, motion-enhanced touch techniques combine touch input and the implicit changes of the accelerometer values to infer touch properties. For example, a soft tap can be differentiated from a hard tap through the impact of the touch. Going one step further, Seipp and Devlin [213] used touch position and accelerometer values to develop a classifier that determines whether users are using the device in a one-handed grip with the thumb or in a two-handed grip with the index finger. With this, they achieved an accuracy of 82.6%. Similarly, Goel et al. [66] used the touch input and device rotation to infer the hand posture (i.e., left/right thumb, index finger) with an accuracy of 87%. By attaching a wearable IMU to the user’s wrist, Wilkinson et al. [251] inferred the roll and pitch angle of the finger, and the force of touches described by the acceleration data.

Proximity Touch Sensing

Marquardt et al. [150] proposed the continuous interaction space, which was among the first models that describe the continuity between hover and on-screen touches. They proposed a number of use cases that enable users to combine touch and hover gestures anywhere in the space and naturally move between them. Amongst others, this includes raycasting gestures to extend reachability, receiving hints through hovering over UI elements [37], and avoiding occlusion by continuing direct touch actions in the space above. Spindler et al. [222] further proposed to divide the interaction above the tabletop into multiple layers while Grossman [71] explored hover interaction for 3D interaction. Hover information can also be used to predict future touch locations. Xia et al. [264] developed a prediction model to reduce the touch latency of up to 128 ms. To avoid the fat-finger problem, Yang et al. [270] used a touch prediction to expand the target as the finger approaches. Similarly, Hinckley et al.
[93] explored hover interaction on mobile devices and proposed to blend in or hide UI components depending on whether a finger is approaching or withdrawn (e.g., play button in a video player appears when the finger is approaching). Since a finger can also be sensed above the display, Rogers et al. [198] developed a model for estimating the finger orientation based on sensing the whole finger on and above a touchscreen.

Previous work presented different approaches to enable proximity touch sensing. The SmartSkin prototype presented by Rekimoto [194] calculates the distance between hand and surface by using capacitive sensing and a mesh-shaped antenna. Annett et al. [3] presented Medusa which is a multi-touch tabletop with 138 proximity sensors to detect users around and above the touchscreen. On the commercial side, devices such as the Samsung Galaxy S4 and the Sony Xperia Sola combine mutual capacitance (for multi-touch sensing on the touchscreen), and self-capacitance (generates a stronger signal but only senses a single finger) to enable hover interaction¹.

Fiducial Markers and Capacitive Coupling

A large body of work coupled external sensors and devices with touchscreens to extend the touch input vocabulary. The focus lies especially on identifying the object touching the display, such as different fingers, users, and items. A common approach to identify objects on the touchscreen is to use fiducial markers. These markers assign a unique ID to an object through a uniquely patterned tag in the form of stickers [108, 195], NFC tags [240, 241], RFID tags [183], unique shapes [85], and through rigid bodies of conductive areas attached to objects (“capacitance tags”) [194].
While these approaches are only suitable for objects due to the attachment of tags, previous work investigated the use of capacitive coupling (i.e., placing an electrode between object and the ground to change the electric field measured by the touchscreen) to reliably identify users [243] and authenticate them with each touch [100]. Similarly, DiamondTouch [47] identifies users based on an electric connection to the chair they are sitting on while Harrison et al. [81] used Swept Frequency Capacitive Sensing (SFCS) which measures the impedance of a user to the environment across a range of AC frequencies. Using the same technology, Sato et al. [206] turned conductive objects into touch-sensitive surfaces that can differentiate between different grips (e.g., touch, pinch, and grasp on a door knob).

¹ https://www.theverge.com/2012/3/14/2871193/sony-xperia-sola-floating-touch-hover-event-screen-technology

Active Sensors

To identify different fingers on the display, previous work used a wide range of different sensors. Approaches that achieved high accuracies include the use of IR sensors [74, 75] and vibration sensors [152] mounted on different fingers. Further approaches include electromyography [19], gloves [149] and RFID tags attached to the fingernail [239]. To avoid instrumenting users with sensors, previous work also used a combination of cameras attached to a mobile device and computer vision to identify fingers [245, 284]. For example, Zheng et al. [284] used the built-in webcam of laptops to identify fingers and hands on the keyboard. Using depth cameras such as the Microsoft Kinect provides additional depth information for finger identification.
Amongst others, these were used by Murugappan [172] and Wilson [252] to implement touch sensors. The Leap Motion¹ is a sensor device that uses proprietary algorithms to provide a hand model with an average accuracy of 0.7 mm [248]. Colley and Häkkilä [38] used a Leap Motion next to a smartphone to evaluate finger-aware interaction. While these are all promising approaches, they are not yet integrated into mass-market devices since wearable sensors limit the mobility while sensors attached to the device (e.g., cameras) increase the device size.

Extending Touch Interaction on Tabletops

Previous work presented a wide range of novel interaction methods based on images of touches provided by touchscreens. Researchers predominantly focused on tabletops that provide high-resolution images of touches [8, 56, 62] through technologies such as infrared cameras below the touch surface or frustrated total internal reflection [77]. The Microsoft PixelSense is a common example and provides high-resolution images with a resolution of 960×540 px (24 ppi). This enabled a wide range of novel interaction methods including the development of widgets triggered by hand contact postures [154], using the forearm to access menus [114], using the contact shape to extend touch input [18, 30], and gestures imitating the use of common physical tools (e.g., whiteboard eraser, eraser, camera, magnifying glass) to leverage familiarity [83]. The latter was commercialized by Qeexo as TouchTools².

Based on a rear-projected multi-touch table with a large fiber optic plate as the screen, Holz and Baudisch developed a touchscreen that senses fingerprints for authentication [96]. This is possible due to a diffuse light transmission while the touchscreen has a specular light reflection. Other approaches for user
Other approaches for user 1https://www.leapmotion.com/ 2http://qeexo.com/touchtools/ 46 2 | Background and Related Work http://https://www.leapmotion.com/ http://qeexo.com/touchtools/ PPPPType Position Front side Back side Top side Bottom side Left side Right side Touch Fingerprint scanner Secondary screen j Hardware buttons (e.g., back, home) Fingerprint scanner BoD Touch a, j [16, 46] Heart rate sensor f [151] BoD touchscreen m - - - Edged display b Buttons Hardware keyboard b Home/Menu button c Back/Recent button c BoD Button d Volume button l Power button e - Volume buttons Bixby button f Power button Volume buttons Shutter button g Slide - - - - Silent switch e - Pressure Force Touch [272] - - - Side pressure h [59, 221] Scrolling Trackball i LensGesture [266] - - - Scrolling wheel b Tapping - BoD taps [197] Edge taps [161] Misc Front camera Front speaker Light sensor Distance sensor Notification LED Back camera Back speaker Torchlight E-ink display k Microphone Audio port USB port g Microphone Speaker USB port Audio port - - a OPPO N1, b RIM BlackBerry 8707h, c HTC Tattoo, d LG G-Series, e iPhone 5, f Samsung Galaxy S8, g Nokia Lumia 840, h HTC U11, i Nexus One, j LG X, k YotaPhone 2, l Asus Zenfone, m Meizu Pro 7. Table 2.1: Types of interaction controls beyond the touchscreen that are presented in prior work and in recent or past smartphones. While some are not intended for interaction initially (e.g., camera), these sensors could still be used for interaction in the future, e.g. [266]. identification on tabletops are based on users’ back of the hand captured by a top-mounted camera [193], by their hand geometry [22, 208], their shoes based on a camera below the table [196], through personal devices [1], tagged gloves [148], finger orientations [45, 280], IR light pulses [163, 199], and through capacitive coupling [47, 243]. 
2.2.3 Interacting with Smartphones Beyond the Touchscreen

Since users are holding the smartphone during the interaction, the touchscreen on the front is not the only surface that could be used for input. Previous work and smartphone manufacturers presented a wide range of input mechanisms beyond the touchscreen on the device surface. While the power and volume buttons are integral parts of a smartphone nowadays, we describe further input mechanisms in the following.

On-Device Input Controls

Previous work and manufacturers presented a broad range of input controls for smartphones of which we provide an overview in Table 2.1. We categorized them by their location on the device, and by the expected type of input.

Current smartphones such as the iPhone 7 and Samsung Galaxy S8 incorporate fingerprint sensors below the touchscreen or on the back of the device. These are mainly used for authentication purposes but can also recognize directional swipes that act as shortcuts for functions such as switching or launching applications. Previous work envisioned different functions that can be triggered using a fingerprint sensor [185]. Because only a small number of devices support any form of interaction on the rear, researchers presented different ways to use built-in sensors for enabling BoD interaction, including the accelerometer [140, 197] to recognize taps and the back camera to enable swipe gestures [266]. Previous work also presented a number of smartphone prototypes that enable touch input on the whole device surface, including the front, back and the edges [127, 132, 168]. This enables a wide range of use cases which includes touch-based authentication on the rear side to prevent shoulder surfing [46], improving the reachability during one-handed smartphone interaction [133], 3D object manipulation [10, 214], performing user-defined gesture input [215] and addressing the fat-finger problem [16]. Recently, Corsten et al.
[39] extended BoD touch input with a pressure modality by attaching two iPhones back-to-back. Before HTC recently introduced Edge Sense, pressure as an input modality on the sides of the device has been studied in previous work [59, 95, 221, 253] to activate pre-defined functions. Legacy devices such as the Nexus One and HTC Desire S provide mechanical or optical trackballs below the display for selecting items, as this is difficult on small displays due to the fat-finger problem [16]. As screens were getting larger, trackballs became redundant and were removed. Similarly, legacy BlackBerry devices incorporated a scrolling wheel on the right side to enable scrolling.

For years, smartphones featured a number of button controls. Amongst others, this includes a power button, the volume buttons, as well as hardware buttons such as the back, home and recent buttons on Android devices. As a shortcut to change the silent state, recent devices such as the iPhone 7 and OnePlus 5 feature a hardware switch to immediately mute or unmute the device. Moreover, the Samsung Galaxy S8 introduced an additional button on the left side of the device as a shortcut to the device assistant while other devices incorporate a dedicated camera button. Since a large number of hardware buttons clutter the device, previous work used the built-in accelerometer to detect taps on the side of the device [161].

Back-of-Device Prototyping Approaches

The simplest and most common approach for BoD interaction is to attach two smartphones back-to-back [39, 46, 214, 261, 262]. However, this approach increases the device thickness which negatively affects the hand grip and interaction [133, 226]. This is detrimental for studies that observe the hand behavior during BoD interaction, and could lead to muscle strain. To avoid altering the device’s form factor, researchers built custom devices that resemble smartphones [10, 33, 228].
However, these approaches mostly lack the support of an established operating system so that integrating novel interactions into common applications becomes difficult. As a middle ground, researchers use small additional sensors that barely change the device’s form factor. These include 3D-printed back cover replacements to attach a resistive touch panel [133], and custom flexible PCBs with 24 [168, 169] and 64 [33] square electrodes. However, neither the panel size nor the resolution is sufficient to enable precise finger-aware interactions such as gestures and absolute input on par with state-of-the-art touchscreens.

Beyond capacitive sensing, researchers proposed the use of inaudible sound signals [174, 203, 246], high-frequency AC signals [283], electric field tomography [282], conductive ink sensors [67], the smartphone’s camera [263, 266], and other built-in sensors such as IMUs and microphones [70, 201, 279]. While these approaches do not increase device thickness substantially, their raw data lack details for precise interactions or inferring the touching finger or hand part.

While using flexible PCBs as presented in previous work is a promising approach, the resolution is not sufficient. Further, previous work used proprietary technologies so that other researchers cannot reproduce the prototype to investigate interactions on such devices. There is no previous work that presents a reproducible (i.e., uses commodity hardware) full-touch smartphone prototype.

2.3 Summary

In this chapter, we discussed the background and previous work on mobile touch interaction. We started this chapter with the history of touch interaction and background of capacitive touch sensing, which forms the foundation of the technical parts of this work. Moreover, we reviewed previous work with a focus on extending mobile touch interaction by hand-and-finger-awareness.
Following the structure of this thesis, we first reviewed previous work on hand ergonomics for mobile touch interaction. Previous work investigated the range of the thumb for single-handed touch interaction to inform the design of touch-based user interfaces. However, no previous work has done the same for the other fingers or determined the area in which fingers can move without a grip change. An understanding thereof is vital to inform the design of fully hand-and-finger-aware input methods, especially on fully touch sensitive smartphones. We investigate the areas which can be reached by all fingers without a grip change and their maximum range by addressing RQ1.

In addition to the reachability aspect, the fingers on the back move unintentionally, amongst others, to maintain a stable grip [52, 54], increase the thumb’s range [34, 54], or as a consequence of the limited independence between the finger movements [76]. These movements cause unintended inputs on fully touch sensitive smartphones which frustrate users and render all BoD input techniques ineffective. Ideally, BoD input controls need to be placed so that they are reachable without a grip change but also in a way which minimizes unintended input. This requires an investigation of supportive micro-movements, their properties, as well as the areas in which they occur. We address this with RQ2.

In the second part of the related work, we reviewed different approaches to extend the touch input vocabulary. A wide range of approaches use different sensors to infer additional properties of a touch. For example, this includes the finger orientation, pressure, shear force, size of the touch area, as well as identifying the finger or part of the hand which performed the touch. However, the presented approaches have practical disadvantages which affect the usability, convenience, and mobility.
Input techniques which infer additional properties of a touch (e.g., finger orientation, pressure, or size of touch area) extend the input vocabulary and its expressiveness. However, they also pose limitations since specific finger postures may now trigger unwanted actions. In contrast, input techniques which differentiate between the source of input (e.g., identifying individual fingers and hand parts) do not interfere with the main finger for interaction and thus do not hav