Institute for Visualization and Interactive Systems University of Stuttgart Universitätsstraße 38 D–70569 Stuttgart Bachelorarbeit Nr. 351 Thermal Imaging for Interactive Public Displays Alexander Frank Course of Study: Informatik Examiner: Prof. Dr. Albrecht Schmidt Supervisor: Yomna Abdelrahman, M.Sc. Stefan Schneegaß, M.Sc. Commenced: June 20, 2016 Completed: December 20, 2016 CR-Classification: I.4.1, I.4.3, I.4.6, I.4.8 Kurzfassung Wenn man heutzutage in der Öffentlichkeit unterwegs ist, beispielsweise in Fußgänger- zonen, Bahnhöfen oder Einkaufszentren, begegnet man vielerorts Public Displays. Sie versorgen Passanten mit potentiell relevanten Informationen hinsichtlich ihrer aktuellen Lage und Bedürfnisse, beispielsweise Sehenswürdigkeiten oder Bahnfahrplänen. Jedoch hat der Passant nicht immer die Möglichkeit, den dargestellten Informationsgehalt den eigenen Interessen anzupassen. Diese nicht interaktiven Displays aktualisieren ihre dargestellten Inhalte, falls überhaupt, als Reaktion auf interne Signale, über die der Passant keine Kontrolle hat. Aktualisierungen erfolgen beispielsweise in festgelegten Zeitintervallen oder bei Ankunft beziehungsweise Abfahrt eines Zuges. Interaktive Public Displays, andererseits, bieten Passanten die Möglichkeit, die dargestellten Inhalte aktiv zu manipulieren, um ihren aktuellen Interessen gerecht zu werden. Es lassen sich etwa zusätzliche Informationen über Sehenswürdigkeiten oder Angebote eines Einkaufszen- trums anzeigen. Falls ein Public Display nicht von Natur aus Interaktion, beispielsweise per Berührung oder Gestensteuerung, unterstützt, erfordert die nachträgliche Imple- mentierung solcher Interaktionstechniken entweder komplett neue Hardware, die das alte Display ersetzt, oder zusätzliche Hardware, wie etwa Kameras. Das Ergänzen durch neue Hardware gestaltet sich jedoch herausfordernd, wenn gleichzeitig Ansprüche an öffentliche Interfaces beachtet werden sollen. In dieser Arbeit werden Möglichkeiten, sonst nicht-interaktive Displays interaktiv zu gestalten und hierbei die Privatsphäre des Nutzers zu schützen, diskutiert und ein Prototyp auf Basis von Thermographie wird entwickelt. Abstract Nowadays when moving in public spaces such as pedestrian areas, train stations or shopping centers, people can oftentimes encounter public displays. They provide passersby with information potentially relevant to their current situation and needs, e.g. places of interest or train schedules. However, not always does a passerby find a way to tailor the information content displayed to his or her current needs. These non- interactive displays update their displayed contents, if at all, based on internal signals the passerby has no control over. Updates may occur in set intervals of time or upon a train’s arrival and departure, for example. Interactive public displays on the other hand offer passersby the option to actively manipulate the displayed contents in a way they see fit to their current needs. They can, for example, gather more information on a certain place of interest or a shopping center’s offerings. If a public display’s hardware does not inherently support interaction, for example via touch or gestures, retrospectively 3 implementing such interaction techniques requires either all new hardware replacing the old display, or additional hardware such as cameras. Adding new hardware however proves challenging when simultaneously trying to meet the requirements posed upon public interfaces. 
In this work, possible ways to enable interaction for otherwise non- interactive public displays, while keeping privacy concerns in mind, will be discussed and a prototype using thermal vision is developed. 4 Contents 1 Introduction 11 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2 Aim of this Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2 Interaction in Public Space & Public Displays 13 2.1 Public Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Implications on this Work . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3 Thermal Vision 19 3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 Thermal Imaging for Touch Based Interaction . . . . . . . . . . . . . . . 19 3.3 Thermal Imaging for Gesture Based Interaction . . . . . . . . . . . . . . 20 4 Related Work 23 4.1 Thermal imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2 Public Displays & Public Space . . . . . . . . . . . . . . . . . . . . . . . 24 5 Hardware & Software 25 5.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6 Touch Input 29 6.1 Initial Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 6.2 Acquisition of the Thermal Image . . . . . . . . . . . . . . . . . . . . . . 30 6.3 Detection of Heat Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 6.4 Detection of the Screen Area . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.5 Calculating the Point of Touch’s Relative Position on the Screen . . . . . 44 7 Gesture Input 47 7.1 Initial Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2 Based on the Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . 48 7.3 Based on the camera’s raw output data . . . . . . . . . . . . . . . . . . . 48 7.4 Prefinal Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 7.5 Further considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5 8 User Study 51 8.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 8.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 8.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 9 Conclusion & Outlook 57 9.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 A Additional material 61 A.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Bibliography 67 6 List of Figures 3.1 Two different visualizations of the same scene gained using a thermal camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 Field of view and reflection using an RGB camera [SAH+14] . . . . . . 21 3.3 Field of view and reflection using a thermal camera [SAH+14] . . . . . 21 5.1 The thermal camera used in this project . . . . . . . . . . . . . . . . . . 25 5.2 The Raspberry Pi used to process the camera’s output data . . . . . . . . 26 6.1 Impact of varying temperature intervals on the visualization of the scene 31 6.2 Thermal Image of a display featuring a (barely visible) heat trace (inside the white circle) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
32 6.3 Impact of the camera’s internal temperature on the output with radio- metric mode enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.4 Impact of the camera’s internal temperature on the output with radio- metric mode disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6.5 Filtering the heat traces (and some noise) . . . . . . . . . . . . . . . . . 37 6.6 Detection of the screen area based on its heat emission . . . . . . . . . . 39 6.7 Intermediate results of the different processing steps applied to the ther- mal image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.8 Only parts of the screen may lie within the camera’s field of view . . . . 44 6.9 Transformation of the touch point’s coordinates from the camera’s global grid to the screen’s local grid . . . . . . . . . . . . . . . . . . . . . . . . 45 7.1 Motion in the thermal reflection visualized . . . . . . . . . . . . . . . . . 49 7.2 Relative movement of the pixel cluster’s northern extreme point indicates gesture input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 8.1 The prototype gear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 8.2 The setup used during the study . . . . . . . . . . . . . . . . . . . . . . . 53 8.3 A user interacting with the display using the prototype . . . . . . . . . . 54 7 List of Tables 8.1 Measured results for each circle . . . . . . . . . . . . . . . . . . . . . . . 55 9 1 Introduction 1.1 Motivation In many places, public displays found their way into public space. They all share their purpose of displaying potentially valuable information to passersby, with the exact content varying with the location of installation. In pedestrian areas they offer information about a city’s places of interest, in train stations they show the schedule of arriving and departing trains and in shopping centers they offer information on the layout and offerings, to name a few possible applications. Although they share a common purpose, they vary not only in shape and size, from TV-sized rectangular displays to displays the size of advertising pillars, but also in their possible ways of interacting with passersby. Some displays may serve as billboards displaying advertisements and loop through a series of predetermined adverts, allowing for no interaction with a passerby. Others may update their displayed content based on certain environmental factors in their surroundings, e.g. a public display in a train station may receive an update upon arrival and departure of trains or to inform passengers about disruptions in the railroad traffic. Information displayed this way is more dynamic and time-sensitive than bare advertisements, however, passersby still have no way of interacting with the display to manipulate its information content. Interactive public displays on the other hand offer such possibilities in different ways: Users may touch the display to select items from a menu, get more detailed information on them or enter a search term into a search engine. No keyboard and/or mouse are required and passersby may start interaction freely on their own accord. Another common way of interaction is gesture input: A passerby positions themselves in front of the display and performs gestures in mid-air using their hands or other body parts. The display in turn responds by updating its contents appropriately, e.g. navigating through a gallery of images based on the passerby’s hand motion to the right or left. 
One might try to enable such interaction techniques for otherwise non-interactive displays as well, however this is in general not possible without additional and/or completely new hardware and software. An ordinary TV screen, for example, that might serve as a public display and does not inherently support input via touch, will not do so without the necessary upgrades. The hardware either has to be replaced with a completely new system which does support touch input, or additional hardware and software have to 11 1 Introduction be installed, e.g. cameras monitoring the screen’s surface to register touch and their respective drivers. 1.2 Aim of this Work This work discusses how to enable interaction for otherwise non-interactive public displays while using affordable additional hardware and keeping concerns unique to interaction in public space in mind. Structure The rest of this work is organized as follows: Chapter 2 – Interaction in Public Space & Public Displays: Aspects and characteris- tics of public displays and interaction in public space are discussed, including their implications on the hardware used. Chapter 3 – Thermal Vision deals with the basics of thermal vision and its advantages over RGB vision regarding this work. Chapter 4 – Related Work: A selection of other publications dealing with partial as- pects of this work. Chapter 5 – Hardware & Software lists the hardware and software used in this project. Chapter 6 – Touch Input summarizes the prototype development for touch input, in- cluding problems encountered. Chapter 7 – Gesture Input summarizes the prototype development for gesture input, including problems encountered. Chapter 8 – User Study: A summary on the user study conducted to test the prototype’s viabililty and its findings. Chapter 9 – Conclusion & Outlook concludes this work by summarizing it and provid- ing a possible outlook on future work. 12 2 Interaction in Public Space & Public Displays As already mentioned in this work’s introduction, public displays can be encountered in different shapes and sizes, however they might not differ much from non-public displays such as one’s TV- or PC-display in the way they work. The way they can be interacted with theoretically does not differ much as well, but in practice, significant differences become apparent. To understand these differences and apply the resulting insights to an interactive interface, the environmental factors surrounding a public display have to taken into account. 2.1 Public Displays The most basic difference between an in-home display and a public display is likely the latter’s location of installation and the resulting consequences on the requirements it has to meet. Factors such as robustness to different kinds of weather and potential exposure to vandalism aside, the display’s location heavily impacts the way passersby choose to interact or not interact with it. For further understanding, let us take a quick look at how touch and gesture interfaces are implemented into public displays and how the way of implementation affects user interaction: 2.1.1 Touch Input in Public Space Touch input may be inherently supported by a public display’s hardware using different technologies. No further distinction between different touchscreen technologies will be made at this point, as the fundamental way of interaction does not change and differences might not even be perceivable to the user. 
Notable differences however are for example input precision, usability of a touch pen and support for multi-touch gestures such as swiping gestures on the screen. To a user, the basics of touch interaction do not change with technology (aside from use 13 2 Interaction in Public Space & Public Displays of a touch pen or multi-touch gestures): The user touches an area on the screen, hereby performing an action which would correlate to a mouse click in a desktop application using keyboard and mouse. The mouse and its cursor are replaced by the user’s finger and a keyboard can be replaced by an on-screen keyboard, if necessary. The result is an easy and intuitive interaction technique that many users might already know from their own handheld devices such as smartphone or tablet PC. If the hardware does not support touch input, additional hardware is necessary which can register the user’s touch. A possible solution is the installation of one or several cameras in the display’s proximity, for example inside the frame encasing the display. These cameras can be used to monitor the display’s surface and notify the underlying system upon registering a passerby’s fingertip touching the display. Interaction then works the same way as with other technologies although possibly less accurate and fast. Although touch is widely accepted as a means of input for personal devices, proven by the widespread of smartphones and tablet PCs, there are factors unique to public spaces that can potentially reduce a person’s willingness to interact with public displays using the same technology. A first such factor, which will not be further discussed in detail as it is occurs inde- pendently of implementation, is hygiene. People might be uncomfortable touching a display which hundreds, maybe thousands of strangers touched before them. But, as mentioned, this issue is not solvable by means of using different implementations and can furthermore be applied to all kinds of public objects, such as door handles and grab handles in buses. Another problem factor, which also does not directly tie to the means of implementation but public touch displays in general, is the display’s location in combination with the required input. If a touch display requires sensitive data as input, e.g. a user’s bank information, the display’s location and the design of its touch interface, including the positioning of buttons and input fields is of utmost importance. When processing such highly sensitive data, the interface should be designed in a way which prohibits third persons from gaining insight into said data by merely standing near an interacting user and watching the display. The third and last concern discussed at this point regarding touch displays, comes into play when gathering touch input data using cameras. Users noticing a camera installed near a display they are supposed to interact with might hesitate, as the mere presence of a camera might induce a feeling of being watched or even spied upon, thus resulting in a feeling of the user’s privacy being breached. The severity of this concern strongly relates to the size and positioning, thus the visibility of the cameras used. If the user cannot notice the cameras, they are less likely to feel watched. Furthermore, as the most widespread devices using touch input, namely smartphones and tablet PCs, do not rely on cameras to register the input, users might not even consider the possibility of cameras being nearby. 
14 2.1 Public Displays 2.1.2 Gesture Input in Public Space For a public display to register gesture input the user’s gestures and/or motion needs to be tracked, therefore a camera is necessary. The camera needs a clear view of the user interacting with the display and is thus installed near the display, e.g. within the frame encasing it. Upon performing certain gestures and hand motions being performed by the user, the display selects items, shifts views etc. Depending on the gestures used, interaction with the public display shapes more or less intuitive. Navigating trough a series of images may intuitively be achieved by moving one’s hand to the right or left, thus "shoving aside" the current view. Selection of an item however might not be as intuitive. Does the item need to be "grabbed", "pushed" or "pulled"? User’s might therefore get frustrated if the display does not react in a way they anticipated. This issued however can easily be circumvented by familiarizing the user with the display’s input modalities, for example by featuring a short manual on the side of the screen or on it or even a short demonstrative walkthrough. At the same time this might reduce a passerby’s willingness to interact even more if they do not want to make an effort familiarizing with the display. Another problem factor results directly from the fact that interacting is based on a passerby’s movement and gestures. While movements and gestures are greatly appre- ciated as a form of interaction with home entertainment systems, judging from the popularity of systems such as Nintendo’s Wii consoles 1 and Microsoft’s Kinect 2, the concept is not easily converted into public space. Whereas a user might have no problem performing gestures to interact with their system of choice at home, they might feel embarrassed performing the same or similar gestures in public, thus in front of a crowd of strangers. Additionally, interacting with a public display bears the risk of failure, either due to faulty interaction on behalf of the user or missing reaction from the display, being poten- tially visible to everybody in the user’s proximity, further nurturing the user’s feeling of embarrassment. Not even considering the potential for failure, a passerby might simply not want to become the center of attention by suddenly stopping in front of a display and performing gesture usually unnatural to their behaviour. On the other hand, this "unnatural" behaviour might encourage other passersby to engage in interaction with the display out of curiosity. The third and last concern links to the need of cameras for gesture recognition. As with cameras used for touch input, the use of cameras for gesture detection rises concerns regarding the user’s privacy. Due to the camera being installed in public space and thus "always being there", passersby might feel being watched even when not inter- 1https://www.nintendo.de/Wii/Wii-94559.html 2https://developer.microsoft.com/de-de/windows/kinect 15 2 Interaction in Public Space & Public Displays acting with the display and just passing by. A passerby does not know when or what or who the camera is currently watching. Again, passersby have to potentially fear a breach of their privacy as the identity and intentions of the people having installed the camera cannot be fully assured at all times. A third party might have hacked into the system and use it to spy on passersby. With such potential thoughts in mind, passersby might actively avoid public displays as soon as they notice a camera. 
Additionally, unlike touch displays, the use of a camera for gesture input is intuitively obvious, as someone or something has to watch the user to pick up on their gestures and movements. 2.2 Implications on this Work After having considered the environmental factors and requirements unique to inter- action in public space, approaches for enabling interactivity for non-interactive public displays can be considered. As we do not want to upgrade the display’s hardware itself, interaction has to be enabled using solely additional hardware and software. With gesture recognition inherently rely- ing on input filmed by a camera and touch recognition being realizable using cameras as well, the use of a camera as additional hardware seems obvious. However, due to the privacy concerns regarding the use of cameras in public space which were just discussed, refinements to the initial approach are necessary. Since the installation of a camera in the display’s vicinity is not an option, another way of using a camera needs to be found. An appropriate way would be to - instead of using a stationary camera - use a camera a passerby can carry with them, thus giving them control over when and where to use it. This would require the used camera to ideally be as small and light as possible as to not obstruct the user during interaction with the display and their other everyday activities. This leads to the idea of using a wearable device, a personal device directly integrated into the wearer’s clothes or accessories. With cameras the size of a fingernail being available nowadays, using such a small camera provides an ideal solution to our problem. The camera can for example be installed within a pendant the users wears around their neck or within the frame of a pair glasses akin to the Google Glass 3, thereby minimizing obstruction of the wearer’s daily routines. With the hardware to use being decided upon, a rough idea of how interaction is sup- posed to take place can be realized, which leads to the conclusion that an ordinary RGB camera does not pose an ideal solution due to several reasons. 3https://www.google.com/glass/start/ 16 2.2 Implications on this Work 2.2.1 Limitations of RGB cameras Although touch interaction can theoretically be enabled by tracking a user’s fingertips, gesture interaction proves to be cumbersome regarding the ease of use for the user. Naturally, the gestures used for interaction have to be performed within the camera’s field of view. Depending on the position of the camera on the user’s body, e.g. inside a pendant around their neck or inside the frame of their glasses, the user’s available space for interaction is greatly limited. All the gestures and movements have to be performed within the space in front of the users upper body or head respectively. Performing a gesture next to one’s body is not possible with a wearable RGB camera. Additionally, depending on the camera’s field of view, gestures might have to be per- formed a minimum distance away from the camera, further limiting the user’s space for interaction by not only requiring their hands to stay in front of their body but also their arms being stretched out to the front. This issue is solved by using a thermal camera instead of an RGB camera. As to how the use of thermal vision solves the issue of restricted space of interaction during gesture input, please see the next chapter. 
17 3 Thermal Vision 3.1 Basics Thermal imaging creates images based upon the longwave infrared radiation emanated by objects in a given scene. Every object, animal and person emanates infrared waves (assuming their temperature lies above absolute zero), the warmer the object the higher the amount of radiation. Therefore, conclusions on an object’s temperature can be drawn from its amount of radiation. The thermal images produced this way give information on the differences in temperature between different objects. Furthermore, thermal imaging does not rely in illumination, it works in light environments as well as in dark environments. This might for example be used for detection of creatures such as animals and humans and their movement in a scene, as their body temperature usually surpasses the temperature of the environment, thus making them well visible in the thermal image. A real life application using thermal imaging this way is for example the usage for surveillance cameras, as this technique works independently of the monitored scene’s illumination. Other applications include firefighters being able to detect survivors through smoke during fires, detection of heat leaks in buildings and medical applications. Different ways of visualizing the radiation data are available, as shown in 3.1. The attributes unique to thermal imaging that make it an attractive solution for interaction with a public display are discussed in the following respective subsections. 3.2 Thermal Imaging for Touch Based Interaction As just explained, every human emanates a certain amount of infrared radiation based on their body temperature. When a person touches an object with different temperature than themselves, part of their skin temperature is transferred to object, thus influencing the object’s temperature at the point of contact. Upon breaking up contact, part of the person’s skin temperature temporarily lingers on at the point of contact. In a thermal image, the contrast between the temperature change at the point of contact and the rest of the object’s surface becomes visible. Over time the temperature at the point of contact returns to the normal temperature before the contact. Applying this concept to 19 3 Thermal Vision (a) Ironblack Scale (b) Rainbow Scale Figure 3.1: Two different visualizations of the same scene gained using a thermal camera a display, it is possible to detect heat traces in areas of the display a user just touched, thus enabling detection of touch input. 3.3 Thermal Imaging for Gesture Based Interaction At the end of the previous chapter the negative implications on the comfort of use when using RGB cameras for gesture detection were explained. When using a thermal camera instead, this issue is circumvented due to a phenomenon referred to as thermal reflection: Not only do different objects and materials emanate different amount of radiation, some materials, such as glass, instead completely reflect radiation. Therefore, whereas these materials have no special properties when viewed through an RGB camera (3.2), they behave towards thermal cameras as mirrors behave towards RGB cameras. If a user with a wearable thermal camera stands in front of a display and films the display, the area of the display in the thermal image does not show the contents displayed on it but rather the reflection of the user. A visualization of this effect is shown in 3.3. 
The thermal camera's field of view thereby includes not only the area in front of its lens, but also the areas beside and behind it, reflected in the surface of the display. These areas added to the field of view serve as additional space for gestures and movement, thus eliminating an RGB camera's shortcomings in that regard.

Figure 3.2: Field of view and reflection using an RGB camera [SAH+14]
Figure 3.3: Field of view and reflection using a thermal camera [SAH+14]

4 Related Work

As this work merges aspects of thermal imaging, interaction with public displays, and interaction in public space in general, a large body of related research can be found. With the technology used in thermal imaging and in interactive public displays becoming more affordable in recent years, the amount of research conducted in the respective fields has increased as well. Likewise, the increase in technologies supporting interaction in public space was accompanied by an appropriate amount of research regarding the implications of these technologies on users' privacy, as well as other social factors surrounding them.

4.1 Thermal imaging

Use cases in military, medical, industrial and many other fields aside, thermal imaging also has a considerable impact on human-computer interaction. Much of the attention within this field is focused on using thermal cameras to enable means of interaction in otherwise non-interactive systems: In [LCG+11], Larson et al. present their project HeatWave. HeatWave uses thermal cameras to transform an arbitrary surface, such as a table top, into an interactive multi-user surface supporting input via touch and gestures performed on the surface, by tracking the residual heat traces left by the users' fingers. A similar project called Dante vision is presented in [SLP12]: combining thermal cameras and depth cameras, such as the Microsoft Kinect, creates a projection system which supports touch and in-air gestures as input; multiple users may perform different tasks at the same time. Palovuori and Rakkolainen use thermal imaging to create a system for touch-based interaction with an immaterial fog screen [PR15]. In [SAH+14], Sahami Shirazi et al. exploit the thermal reflectivity of certain surface materials to create an interactive system using the reflection of a user's hand gestures as input. Other works in this field of research include the usage of thermal cameras for face recognition [KK05] and expression recognition [SWZ15].

4.2 Public Displays & Public Space

Research on public displays ranges from the proposal of general guidelines [ASS+12] to the investigation of novel interaction techniques [Sch15] [KFF+09] to assessments of potential privacy concerns [OKB15] [TGC06]. In [Sch15], Schneegass proposes the usage of wearable devices for interaction with public displays, an approach similar to this work's aim. In [ASS+12], Alt et al. present a set of guidelines for evaluating public displays, based on extensive literature research as well as their own experiences regarding the subject. Several works, such as [TGC06] and [EBBF04], deal with ways of protecting a user's data in public space, e.g. from the eyes of third parties.

5 Hardware & Software

5.1 Hardware

The thermal camera used to gather data is a FLIR Lepton® longwave infrared (LWIR) sensor, including a socket and breakout board to connect it to the processing hardware.
The Lepton® distinguishes itself by its compact size of 10.6 x 11.7 x 5.9 mm (including the socket), rendering it smaller than a 1 cent piece and thus a convenient choice for integration into mobile systems such as wearable devices. Furthermore, it features a resolution of 80x60 pixels and an effective frame rate of 8.6 Hz. It measures temperatures ranging from -10°C up to 65°C and operates with a thermal sensitivity of 0.05 Kelvin. The captured wavelengths lie within the spectral range of 8 µm to 14 µm. The lens provides a 51° horizontal and a 63.5° diagonal field of view. 5.1a demonstrates the compact size of the camera module itself, 5.1b shows the camera including socket and breakout board.

Figure 5.1: The thermal camera used in this project
Figure 5.2: The Raspberry Pi used to process the camera's output data

Image processing functions and the actual touch/gesture recognition algorithms run on a Raspberry Pi® 3 Model B (initially an older model, the Raspberry Pi® 1 Model B, was used; due to performance issues a change to current hardware was made). It features a quad-core 1.2 GHz processor, enabling real-time processing without performance issues. A picture of the Raspberry Pi® used in this work is shown in 5.2. Communication between the Lepton® and the Raspberry Pi® takes place via the Raspberry Pi®'s SPI and I2C ports, streaming video data to the computer and sending commands to the camera, respectively.

Initially, the prototype was supposed to be tested on one of the Institute for Visualization and Interactive Systems's public displays; however, during development this plan was changed, resulting in the prototype becoming independent of the display used. More on this subject in chapter 6. More information on the hardware used in this work can be found on the respective websites (http://www.flir.com/cores/lepton/, https://www.raspberrypi.org/).

5.2 Software

The programming language used in this work is C++, with the additional Qt and OpenCV libraries for image capturing and processing, respectively. The code used in this work to read the Lepton®'s data and output it as a thermal image is openly available in a GitHub repository (https://github.com/groupgets/LeptonModule). Although Qt is not required to process the Lepton®'s data, it is used in this work, as otherwise the image capturing code would have had to be rewritten, increasing the probability of errors. Furthermore, using the Qt and OpenCV libraries in tandem only requires a simple conversion between the libraries' respective image formats. OpenCV is used to further process the Lepton®'s output image. OpenCV distinguishes itself by a large spectrum of available image processing functions, requiring little implementation effort, as many operations amount to single lines of code. For further information on Qt, OpenCV and their functions, see their respective documentations (http://doc.qt.io/qt-4.8/, http://opencv.org/documentation/opencv-3-1.html).
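The conversion between the two image formats mentioned above is not part of the repository's code; the following is only an illustrative sketch of the idea, assuming the capture code exposes its visualization as a QImage in RGB format. The function names are mine, not the repository's:

```cpp
#include <opencv2/opencv.hpp>
#include <QImage>

// Convert a QImage produced by the Lepton capture code into a cv::Mat
// so that OpenCV's image processing functions can be applied to it.
cv::Mat qimageToMat(const QImage &img) {
    // Ensure a known pixel layout (8-bit RGB without alpha).
    QImage rgb = img.convertToFormat(QImage::Format_RGB888);
    // Wrap the QImage buffer without copying it.
    cv::Mat view(rgb.height(), rgb.width(), CV_8UC3,
                 const_cast<uchar *>(rgb.constBits()),
                 static_cast<size_t>(rgb.bytesPerLine()));
    cv::Mat bgr;
    // OpenCV expects BGR channel order; cvtColor also produces an
    // independent copy, so the QImage may go out of scope afterwards.
    cv::cvtColor(view, bgr, cv::COLOR_RGB2BGR);
    return bgr;
}

// Convert back for display in a Qt widget.
QImage matToQImage(const cv::Mat &bgr) {
    cv::Mat rgb;
    cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
    return QImage(rgb.data, rgb.cols, rgb.rows,
                  static_cast<int>(rgb.step), QImage::Format_RGB888).copy();
}
```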
6 Touch Input

This chapter reports on the work conducted to implement a prototype detecting touch input on public displays using thermal vision, the problems encountered and the solutions to said problems.

6.1 Initial Approach

Before starting to work on the implementation, careful consideration of which steps are necessary to enable meaningful touch-based interaction is required. The two leading questions are: Which area(s) in the thermal image correlate to the leftover heat of a user's touch? Which area(s) on the public display did the user touch? Although the two questions sound similar, there is an important difference between them: Naturally, registering touch input requires detecting the leftover heat traces of a user's touch. However, solely pinpointing the position of the relevant pixels in the thermal image does not suffice. As the thermal image is likely to either picture not only the surface the user is interacting with but also its surroundings, and/or to picture only parts of that surface, it is necessary to transform the pixels' coordinates in the global grid of the camera's field of view into coordinates in the screen's local grid. To do this, the screen's relative position has to be taken into account as well. Consideration of these questions results in the following four essential steps:

1. Get the thermal image from the camera
2. Identify the position of the point of touch (on the display)
3. Locate the screen
4. Compute the position of the point of touch in relation to the screen area

The following subsections go into more detail about the different iterations of code that were created during the implementation and the problems associated with them. Furthermore, different approaches to solving these problems and their feasibility are discussed, including the solutions that were finally deemed appropriate.

6.2 Acquisition of the Thermal Image

As already mentioned in the Hardware & Software chapter, the piece of code used to read the thermal camera's output data and convert it into a colorized image is available in the respective repository. No substantial changes to the code's functionality were made. As the way the thermal camera outputs its pixel data and the way this data is assigned a visualization matter for the rest of this chapter, a short explanation of the code's functions follows: Each frame consists of a one-dimensional array storing 4920 14-bit values — one value for each of the 80x60 pixels, plus 120 header values that can be ignored. For each frame, the maximum and minimum value are determined and used to compute a scale needed for the colorization. A colormap containing a set number of RGB color values is then used to color the pixels in the visualization based on their respective value and the just computed scale.
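To make this per-frame rescaling concrete, the following is a simplified sketch of how such a colorization could look. It is not the repository's actual code; the names and the colormap handling are illustrative only:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Map one frame of raw 14-bit sensor values onto a fixed colormap by
// rescaling the frame's own minimum..maximum range to the colormap size.
std::vector<Rgb> colorizeFrame(const std::vector<uint16_t> &values,
                               const std::vector<Rgb> &colormap) {
    if (values.empty() || colormap.empty()) return {};
    auto minmax = std::minmax_element(values.begin(), values.end());
    uint16_t lo = *minmax.first, hi = *minmax.second;
    float scale = (hi > lo)
        ? static_cast<float>(colormap.size() - 1) / (hi - lo)
        : 0.0f;
    std::vector<Rgb> pixels;
    pixels.reserve(values.size());
    for (uint16_t v : values) {
        // Identical raw values can land on different colors in the next
        // frame, because lo and hi are recomputed for every frame.
        size_t idx = static_cast<size_t>((v - lo) * scale);
        pixels.push_back(colormap[idx]);
    }
    return pixels;
}
```

Because the minimum and maximum are recomputed for every frame, the mapping from raw value to color is only stable as long as the observed temperature range is stable — which leads directly to the insights below.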
The way the camera's data is processed leads to several early insights and assumptions:

6.2.1 Relation temperature-value-color

Based on the given code, I observed no inherent relation between temperature and the assigned 14-bit value/color, aside from the relations maximum temperature -> maximum value and minimum temperature -> minimum value. As each frame uses the same colormap and the relation 14-bit value -> color is reestablished with every new frame based on the maximum and minimum values, objects that do not experience a change in measurable temperature can be assigned vastly different colors in different frames due to changing environmental temperatures. An example for this is shown in 6.1: two scenes observed within seconds of each other. Both scenes feature a tray of ice cubes put on the floor; the second scene, however, also introduces a person's fingertip. In both scenes, the minimum temperatures and therefore the minimum values are located with the ice cubes. The maximum temperatures and values, however, shift from the floor in the first scene to the fingertip in the second scene. Although the change in the floor's temperature between the two scenes is likely negligible, its colorization changes vastly. This example is applicable to any change in the observed temperature range between scenes, although to a lesser extent if the changes are not as significant as in the given example.

Figure 6.1: Impact of varying temperature intervals on the visualization of the scene — (a) low maximum temperature, (b) high maximum temperature

6.2.2 Relation color-color

As implied by the aforementioned insight, reestablishing the value-color relation for each frame has a (for this work) negative impact on the color-color relation between two different frames. This might prove problematic for heat trace detection based on pure image processing. These insights are complemented by further observations that only became evident during the implementation phase.

6.3 Detection of Heat Traces

Deciding whether a pixel in the thermal image correlates to a heat trace left by a user or not required several iterations until a satisfying result was achieved. This was partly due to technological limitations, as explained in the following subsections.

6.3.1 Based on the Visualization

Visualization-based processes for detecting heat traces on a display proved unreliable very early, the major problem factor being the display itself, i.e. its own heat radiation. As seen in 6.2, not only does the display's inherent heat radiation vary significantly depending on the partial area of the screen, it also reaches peak values surpassing the values correlating to a user's heat traces. As a result, associating a set range of colors with a heat trace is not possible. Furthermore, in many cases the contrast between heat trace and display proves insufficient for a clear distinction, rendering detection based on the shape of a heat trace unreliable as well.

Figure 6.2: Thermal image of a display featuring a (barely visible) heat trace (inside the white circle)

Neither adjustments to the used colormap, different filtering and thresholding methods, nor enhancement of contrast led to a meaningful increase in the reliability of the touch detection algorithm. With heat trace detection based on shape and color unfeasible on its own, a new approach using different data had to be found: the camera's raw output data.

6.3.2 Based on the Camera's Raw Data

Work with the camera's raw data happened in two distinct phases. In the first phase, the thermal camera was assumed to be stationary, ignoring the movement inherent to real-life application. Using the insights gained from these stationary approaches, I then started work on dynamic approaches, which considered the camera's movement in their algorithms.

Stationary Approaches

While implementing a heat trace detection algorithm based solely on differences in numerical values instead of differences in shape and color, three inherently different approaches were tested, with varying results. In this initial phase, camera movement was still ignored.

First, a simple, static approach was tested. The used algorithm checks whether the value of a pixel lies within a set interval of values deemed to match the heat trace of a touch point. If this test returns a positive result, the pixel is assigned a color distinctly different from the used colormap, making it easily recognizable in the visualization. The appropriate intervals were derived from measurements of the display, a person's fingertips and touch points on the screen. The goal was to mark the pixels where a user touched the display while ignoring the display and the user's hand as reliably as possible.
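As an illustration, this first test amounts to little more than a range check. The interval bounds below are placeholders standing in for the manually measured values, not values taken from the prototype:

```cpp
#include <cstdint>

// Placeholder bounds derived from manual measurements; in practice they
// drifted with the camera's internal temperature and had to be re-tuned.
constexpr uint16_t kTraceMin = 8300;
constexpr uint16_t kTraceMax = 8450;

// First, static approach: a pixel is marked as a potential heat trace if
// its raw value falls inside a fixed interval.
inline bool isHeatTrace(uint16_t rawValue) {
    return rawValue >= kTraceMin && rawValue <= kTraceMax;
}
```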
Initial tests proved more successful than the image-processing-based approach. Heat traces were distinctly highlighted, while only the edges of a user's hand and small areas of the display were marked as well. However, this algorithm's accuracy rapidly degraded over time, as the values obtained from the camera steadily increased and the intervals set in the code no longer matched the data gathered from the camera. Continuous recalibration and adjustment of the interval boundaries was necessary to regain accuracy, rendering this approach non-viable.

Second, an initial calibration phase was used to gain a better estimate of the range of values corresponding to the display. The camera would be statically mounted in front of the display and compute a running mean of the read values for several seconds. Afterwards, each pixel's value would be compared to the computed mean and the pixel would be highlighted if its value was significantly higher than the mean. In this case, "significantly higher" describes a difference in values high enough to not be deemed part of the display but low enough to not be linked to the radiation of a person's hand. Although more complex than its predecessor, this approach yielded no significant improvements, as it was still "too static" and the slow drift of values lowered its accuracy.

The third approach moved away from the concept of comparing pixel values to a static value, instead using relative changes in a pixel's value as an indicator of touch. For each pixel, the algorithm would compare its current value to its value in the previous frame; a significant change would highlight the pixel. However, this algorithm was prone to errors resulting from camera movement, thus proving inappropriate for this work.

Dynamic Approaches

At this point, the camera's movement was included in the algorithm's design, as it proved too difficult to ignore without deviating from the actual goal of this work. The first approach in this phase used local differences in pixel values to detect heat traces, the method also used in the final approach. With this approach, each pixel's value is compared to the values of its (up to) eight neighbouring pixels. If the difference exceeds a set threshold but is low enough to not be linked to the user's hand, the pixel fulfills the first of two requirements to be deemed a heat trace. The second condition requires that only marginal changes to the pixel's value occurred between the previous and the current frame, as a significant change implies one of the following two scenarios: 1) an object with considerable heat emission, e.g. a user's hand, becomes visible or invisible, or 2) due to camera movement, the object seen at the pixel's location changes. Combining these two conditions should theoretically leave only the following areas applicable for marking: 1) heat traces on the display, as they only slowly cool down, 2) some minor spots around the outlines of a user's hand during slow movement, and 3) other areas with distinctly varying temperatures.
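A sketch of this two-condition test, operating on the raw 16-bit value matrices of the current and previous frame, might look as follows. The concrete thresholds are illustrative assumptions, not the values used in the prototype:

```cpp
#include <algorithm>
#include <cstdlib>
#include <opencv2/opencv.hpp>

// Mark pixels that (1) stand out from their (up to) eight neighbours by more
// than traceMin but less than handMin, and (2) changed only marginally since
// the previous frame. All thresholds are illustrative placeholders.
cv::Mat markHeatTraces(const cv::Mat &curr, const cv::Mat &prev,
                       int traceMin = 40, int handMin = 200,
                       int frameMax = 15) {
    CV_Assert(curr.type() == CV_16UC1 && prev.size() == curr.size());
    cv::Mat mask = cv::Mat::zeros(curr.size(), CV_8UC1);
    for (int y = 0; y < curr.rows; ++y) {
        for (int x = 0; x < curr.cols; ++x) {
            int v = curr.at<ushort>(y, x);
            // Condition 2: temporal stability between consecutive frames.
            if (std::abs(v - static_cast<int>(prev.at<ushort>(y, x))) > frameMax)
                continue;
            // Condition 1: local contrast against the neighbourhood.
            int maxDiff = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int ny = y + dy, nx = x + dx;
                    if ((dx == 0 && dy == 0) || ny < 0 || nx < 0 ||
                        ny >= curr.rows || nx >= curr.cols) continue;
                    maxDiff = std::max(maxDiff,
                                       v - static_cast<int>(curr.at<ushort>(ny, nx)));
                }
            if (maxDiff > traceMin && maxDiff < handMin)
                mask.at<uint8_t>(y, x) = 255;
        }
    }
    return mask;
}
```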
The third kind of area is the issue with this approach: such areas do not only lie outside the area of the display, but also within it. Most of the areas marked by this approach are areas on the display that do not correlate to heat traces — once again, the screen's inherent heat radiation behaved too similarly to a heat trace. As this phenomenon is directly linked to the values measured by the thermal camera, I conducted some further research on this topic.

6.3.3 Variance in output data and its cause

As the Lepton® outputs data as 14-bit values, one might expect outputs ranging from 0 to 16383, with temperatures near the camera's minimum and maximum operating temperature being assigned values close to 0 and 16383, respectively. However, this is not the case, as became apparent during work with the camera. Output values encountered during implementation ranged from the upper 7000s to the lower 9000s, while tests measuring temperatures close to the camera's minimum and maximum operating range resulted in values in the lower 7000s and lower 10000s, respectively. Furthermore, output values varied even while focusing on one single point in a scene. While this phenomenon was initially attributed to a natural warming-up process of the monitored displays, this reasoning could not be applied to the change in output values for other objects, including a person's skin.

Further investigation revealed the cause of this variance in output: According to the FLIR Lepton® Data Sheet (https://cdn.sparkfun.com/datasheets/Sensors/Infrared/FLIR_Lepton_Data_Brief.pdf), output values rely heavily on whether the camera's so-called radiometry mode is disabled or enabled. 6.3 shows the theoretical output of a Lepton® module with the radiometry mode enabled: output values directly correlate to a certain temperature, and the camera's internal temperature does not affect them.

Figure 6.3: Impact of the camera's internal temperature on the output with radiometric mode enabled

With the radiometry mode disabled, however, output values depend on the thermal camera's internal temperature, as shown in 6.4. Only one output value is "fixed" to a temperature: if an object in a scene has the same temperature as the thermal camera, it is assigned the value 8192. The other values are assigned linearly.

Figure 6.4: Impact of the camera's internal temperature on the output with radiometric mode disabled

This revelation renders work with the radiometric mode enabled more desirable, as the measured values would no longer fluctuate as much as they did so far. However, this is not an option, as the radiometry mode is reserved to original equipment manufacturers. To achieve a similar result using the given equipment, additional experiments would have been required: I would have needed to measure the output value for an object with precisely known temperature and use linear interpolation to approximate the values corresponding to other temperatures. This procedure would have required repetition for different states of the camera's internal temperature, finally rendering it too time-consuming and, more importantly, too unreliable, as an algorithm based on these measurements would be inherently prone to errors resulting from inaccuracies in the measurements. Therefore, I abandoned the idea of using any static values in my algorithms, relying solely on relative differences in pixel values.
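For completeness, the calibration that was considered and rejected can be written down as a simple linear interpolation. With radiometry disabled, the camera's own temperature T_cam corresponds to the raw value 8192 (as stated in the data sheet); a single reference object of known temperature T_ref with measured value v_ref would then fix the slope:

T(v) ≈ T_cam + (v − 8192) · (T_ref − T_cam) / (v_ref − 8192)

Since T_cam itself drifts and every reference measurement carries error, such an estimate degrades quickly, which is why this route was not pursued.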
Even relying solely on relative differences, however, was not enough: judging from the results of the last approach, the display's heat radiation remained a constant source of errors in the heat trace detection.

6.3.4 Using an Intermediate Layer for Trace Detection

As detecting heat traces directly on the display's surface proved unreliable due to a lack of contrast both color-wise and value-wise, a solution compensating for the display's heat emission needed to be found. In this case, a slab of plexiglass mounted a few centimeters in front of the actual display as an intermediate surface for touch interaction proved to be a solid solution. Not only does the plexiglass not obstruct the user's view of the display, it also does not heat up as fast as the display. The last point is especially important, as with a cooler surface the contrast between heat trace and surface is much more easily detected using the thermal camera. Although an approach relying solely on image processing is still not feasible, because the display's heat radiation from behind the plexiglass interferes with the colorization, the difference in raw values is much clearer and thus appropriate as an indicator of heat traces. The algorithm used in the final approach does not differ from the previous algorithm, but it is far more reliable when using plexiglass as a surface as opposed to a display. The following subsections explain the steps performed by the algorithm.

Modification of the thermal image

To facilitate further processing, I applied a few modifications to the thermal image (using a grayscale image). As described in an earlier section, I highlight pixels whose values differ significantly from the values of their neighbouring pixels and whose values did not change significantly from the previous frame. This time the algorithm marks only heat traces, the area surrounding the plexiglass and a few scattered pixels inside the screen area which are results of thermal reflection. To make for a strong distinction, the pixels are colored red. The result is shown in 6.5a.

Figure 6.5: Filtering the heat traces (and some noise) — (a) modified thermal image, (b) thresholded image

Filtering out the heat traces

After blurring the image to remove noise from thermal reflection, I convert the image from RGB colorspace to HSV colorspace. Using the HSV colorspace, I am able to filter out all areas within a certain color range, in this case "red". The result of this operation is seen in 6.5b. At this point the algorithm stops; its intermediate result is used further upon successful detection of the screen area.
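The following is a sketch of this blur-and-filter step in OpenCV, assuming the marked pixels were painted in pure red; the HSV thresholds are illustrative and would need tuning:

```cpp
#include <opencv2/opencv.hpp>

// Isolate the red markers placed on suspected heat traces: blur away isolated
// noise from thermal reflections, then keep only strongly red pixels via HSV.
cv::Mat filterMarkedTraces(const cv::Mat &markedBgr) {
    cv::Mat blurred, hsv, lowerRed, upperRed, traces;
    cv::GaussianBlur(markedBgr, blurred, cv::Size(5, 5), 0);
    cv::cvtColor(blurred, hsv, cv::COLOR_BGR2HSV);
    // Red wraps around the hue axis, so two ranges are combined.
    cv::inRange(hsv, cv::Scalar(0, 120, 70),   cv::Scalar(10, 255, 255), lowerRed);
    cv::inRange(hsv, cv::Scalar(170, 120, 70), cv::Scalar(180, 255, 255), upperRed);
    cv::bitwise_or(lowerRed, upperRed, traces);
    return traces;  // binary mask of candidate touch points
}
```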
6.4 Detection of the Screen Area

As explained at the beginning of this chapter, knowing the screen's position is necessary to enable reliable touch input. The used algorithm needs to decide whether the objects and surfaces in the thermal image are part of the display or not. Assuming a distinction between screen and non-screen areas has been made, another issue needs to be kept in mind: if the camera moves too close to the screen, not all of the screen area is visible — the edges of the display lie beyond the edges of the thermal image.

6.4.1 Based on the Visualization

In a first approach, different colormaps were used and compared to see whether any of them led to a significant difference in colorization between screen and non-screen areas, potentially enabling a distinction based on color values. However, as the colorization using the multi-color versions was subject to scaling based on the maximum and minimum values in the thermal image, no such clear distinction was possible. The only colormap that provided the desired distinction was a grayscale colormap. Using a grayscale, the screen area can be detected based on the intensity of the gray colors in the image. As already mentioned, the used display is itself a source of significant heat radiation, resulting in higher intensity values than most of its surroundings in the thermal image, see 6.6a. Using this information, Otsu thresholding and additional colorization are applied to the thermal image, resulting in 6.6b. The area of the screen is recognized very precisely; however, other sources of heat are picked up as well — in the given scene, these are the edge of an adjacent display and a hot cup. These smaller hot areas have to be labeled as non-screen areas, which is achieved by relying on two assumptions: 1) The screen area is the largest highlighted area in the scene. 2) The screen's height and width are known. The first assumption is made because, during interaction, the user should focus their gaze on the display in front of them, so most of the scene in the thermal image consists of the screen area. The second assumption is made to ensure that no other large heat source is accidentally interpreted as the screen.

These two assumptions are implemented as follows: After thresholding and filtering, contour detection is applied to the scene and only the largest contour is accepted as a potential screen area. Next, a convex hull is put around said contour and the hull's rotated bounding box is created. The result is shown in 6.6c: the area recognized as screen is encased in a green bounding box. If the bounding box's proportions match the screen's proportions, e.g. 16:10 (while leaving room for a margin of error of a few percent), it is declared the screen area. This approach proved to reliably detect and highlight the screen area on a frame-by-frame basis, therefore also providing robustness against camera movement.

Figure 6.6: Detection of the screen area based on its heat emission — (a) thermal image, (b) thresholded and colored image, (c) screen area detected

However, this initial approach does not account for the possible scenario of only parts of the screen being visible. In these cases, the created bounding box would not match the screen's proportions and therefore not be usable. An ideal solution to this problem would require information on the camera's distance to the screen, which was not available due to the lack of a second camera needed for a sense of depth. Therefore, an approximation had to be made that does not rely on exact distance values. An initial idea looked like this: The algorithm would output a visualization featuring the bounding box encasing the screen area. If the bounding box's proportions match the screen's proportions, it would change color. During this phase, the current bounding box could be "locked in" via manual input.
From this point on, the bounding box's proportions and size would not change any further, and any relative movement of the screen in the scene would be applied to it. This approach had the advantage that it reliably keeps track of the bounding box's — and therefore the screen's — coordinates even beyond the boundaries of the thermal image. The disadvantage linked to this method is the underlying assumption that, once the screen has been locked in, no further changes in distance between screen and camera occur. This assumption seems inappropriate for real-world use cases, justifying the need for a more refined algorithm.

6.4.2 Based on the Camera's Raw Data

Algorithms using the camera's raw numerical data offered no clear advantages over the previous method using intensity values in a grayscale image during this phase of implementation and were therefore not considered any further.

6.4.3 Final Approach

The change from the display itself to a slab of plexiglass as the surface for interaction had an immense impact on the requirements for the screen detection algorithm. As the plexiglass blocks off most of the screen's heat radiation — the basis of the previous algorithm — a new approach is necessary. Although the previous algorithm still highlighted parts of the plexiglass as potential screen areas, the result was far from acceptable. Adjustments to the algorithm, e.g. to the thresholding and filtering operations, did not result in any improvements.

For a new algorithm, the screen's and the plexiglass' respective properties have to be taken into account and compared: Whereas the screen possesses a relatively high and fluctuating heat emission over its area, the plexiglass heats up much more slowly and evenly. In a thermal image this becomes evident by heat traces of touch points being much more distinct on the plexiglass than on the screen. However, with parts of the screen's heat emission still visible in the thermal image, the colorization of the heat traces still does not prove to be a reliable criterion for detection; a detection algorithm based on the camera's raw data seemed to be required. As with previous approaches using the camera's raw data, approaches based on comparing pixel values to other set values yielded no satisfying results.

The final algorithm used to detect heat traces intuitively bears a strong resemblance to an algorithm needed to detect the plexiglass. Whereas the heat trace detection looks for local maxima in the thermal image, the algorithm detecting the plexiglass is concerned with detecting an area of largely similar values. As already described in a previous section and seen in 6.5a, the heat trace detection algorithm initially highlights not only heat traces, but also the area surrounding the plexiglass. Apparently, the difference in values between the plexiglass and its immediate surroundings is significant enough to be detected. Therefore, the inversion of the initial heat trace marking algorithm is used as the basis for the screen detection algorithm.
In the following, I will expand on the steps performed by the final screen detection algorithm, from receiving a thermal image as input to outputting a monitored screen's approximate coordinates:

Modification of the thermal image

Before applying any actual image processing functions to the thermal image using the GitRepository's code, I first make some modifications to the image, simplifying the following steps. The thermal camera's initial output image, as seen in Figure 6.7a, is modified in a way similar to the procedure described in the dynamic approach for detecting heat traces; this time, however, the pixels that would be marked in that step are left unmarked, whereas the previously unmarked pixels are now marked. Marked pixels are colored white, unmarked pixels are colored black, maximizing the contrast in intensity between the relevant areas in the scene. The result of this modification is shown in Figure 6.7b. Although this initial step highlights the area of the screen, it also highlights areas scattered around it, potentially leading to false positives (non-screen areas interpreted as part of the screen). These areas need to be removed from the image.

Clearing up the image

First, a Gaussian blur filter is applied. As shown in Figure 6.7c, this results in the smaller non-relevant areas around the screen area being colored in grayish tones, whereas the screen area itself remains mostly white, therefore having a much higher intensity than most of the scene. Next, erosion followed by dilation takes place. This removes most of the noise surrounding the screen from the scene, as shown in Figure 6.7d. Lastly, the scene is thresholded using Otsu thresholding, leaving the screen area and some smaller parts of the surroundings that can safely be ignored. This step's final result is depicted in Figure 6.7e. Closer inspection of the initial thermal image and the processed image shows that the screen area is not picked up with complete accuracy. This stems mainly from the erosion process, which was necessary for a usable result in my tests. A potential fix to this issue could be found in a future refinement of these processing steps.

Figure 6.7: Intermediate results of the different processing steps applied to the thermal image ((a) initial thermal image; (b) differences in values highlighted; (c) after blurring; (d) after erosion & dilation; (e) after thresholding)

Assumption of the screen's position

Using the thresholded image from the previous step, I first run a contour detection algorithm on the image, segregating potential screen areas from the rest of the scene. As the user is assumed to focus their gaze on the display during interaction, only the contour spanning the largest area is processed further. Before creating the bounding box around it, I first create a convex hull around the contour to compensate for any indentations resulting from previous processing steps. Lastly, the minimum rotated bounding box is created, encasing the potential screen area. Whether the created bounding box actually corresponds to a display is gauged by the checks conducted in the next step.

Calculation of the screen's coordinates

The checks and modifications performed in this step rely on the assumption that the screen's proportions are known to the algorithm. If the rotated bounding box's proportions match the screen's proportions (within a preset margin of error), the bounding box is accepted and its vertices' coordinates are used to approximate the screen's area. The preprocessing chain described above is condensed in the sketch below.
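The following is a minimal sketch of the clean-up chain (blurring, erosion followed by dilation, Otsu thresholding), again in Python with OpenCV as an illustrative assumption; the kernel sizes and iteration counts are placeholders, not the values actually used in this work.

```python
import cv2
import numpy as np

def screen_candidate_mask(inverted_mask):
    """Clean up the inverted heat-trace mask before contour detection.

    `inverted_mask` is a binary image in which pixels that did NOT qualify
    as heat-trace candidates are white and all others are black, so the
    evenly tempered plexiglass surface appears as one large bright region.
    """
    # Blurring turns scattered false positives into grayish speckles while
    # the large screen area stays close to white
    blurred = cv2.GaussianBlur(inverted_mask, (9, 9), 0)

    # Erosion followed by dilation removes most of the remaining noise
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(blurred, kernel, iterations=2)
    opened = cv2.dilate(eroded, kernel, iterations=2)

    # Otsu thresholding leaves the screen area plus a few ignorable specks
    _, binary = cv2.threshold(opened, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The resulting binary image is then fed into the same largest-contour, convex-hull and rotated-bounding-box logic sketched in Section 6.4.1, followed by the proportion check and the case handling described next.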
This approach, however, relies on the bounding box's vertices lying within the camera's field of view, which is not always the case. Therefore, several different cases have to be considered and appropriate approximations have to be conducted:

Case 1: Four vertices

This is the default case, shown in Figure 6.8a. All four vertices lie within the camera's field of view. No further approximations are necessary.

Case 2: Three vertices

The case shown in Figure 6.8b requires no further consideration, as the algorithm creating the bounding box automatically calculates the outlying vertex's coordinates correctly.

Case 3: Two vertices

This case is composed of the two subcases depicted in Figures 6.8c and 6.8d: either two adjacent or two opposing vertices lie outside the field of view. In the first subcase, one edge of the screen lies completely within the field of view. Based on the bounding box's orientation, this edge is assumed to match the screen's width or height. Using the knowledge of the screen's proportions, the coordinates of the two outlying vertices can be computed. The other subcase, although unlikely to occur during real-life application, as it requires a considerable tilt of the camera, and thus the user, to one side, is accounted for by computing the outlying vertices' coordinates using the screen's proportions and the length of the diagonal within the field of view.

Case 4: One vertex

The case shown in Figure 6.8e cannot be solved without precise knowledge of the distance between camera and screen. As there is an infinite number of rectangles matching the screen's proportions, the outlying vertices' coordinates cannot be computed using a single vertex as input. Therefore, the approximation is made that the distance to the screen did not change between the last and the current frame. Thus, no adjustment to the screen's relative size is made and the last detected screen area is merely shifted, based on the visible vertex's position and the bounding box's orientation.

Case 5: No vertices

In the case shown in Figure 6.8f, the screen's edges lie outside the camera's field of view. As with the previous case, the screen's position cannot be calculated precisely. However, this time no approximation can be made either, as it would require knowledge of the distance between screen and camera. Furthermore, a distinction between screen and non-screen areas is not possible. Using a potentially detected thermal reflection of the user's hand does not suffice either, as thermal reflectivity is not restricted to displays exclusively. For the listed reasons, this case is not accounted for by the algorithm.

Other cases

There are several additional cases, more precisely subcases of cases 4 and 5, that will not be discussed further at this point. These subcases resemble the second subcase of case 3 and involve the visibility of at least two opposing edges. They are edge cases and are handled similarly to case 2. Although the algorithm accounts for all of these cases, only the first two cases, and to some degree the third case as well, can be deemed accurate. A sketch of the geometric reconstruction used in the first subcase of case 3 is given below.
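The sketch below illustrates how the two missing vertices could be reconstructed from one fully visible edge and the known screen proportions. It is an illustrative assumption written in Python/NumPy; the function name, parameters and the way the offset direction is chosen are hypothetical, not taken from the actual implementation.

```python
import numpy as np

SCREEN_RATIO = 16 / 10   # width : height of the monitored display (assumed known)

def complete_from_edge(p0, p1, edge_is_width, away):
    """Reconstruct the two off-image vertices of the display (case 3, subcase 1).

    p0, p1        -- the two visible, adjacent vertices (np.array([x, y]))
    edge_is_width -- True if the visible edge is assumed to be the screen's
                     width (decided from the bounding box's orientation)
    away          -- +1 or -1, choosing on which side of the visible edge
                     the rest of the screen lies (outside the field of view)
    """
    edge = p1 - p0
    length = np.linalg.norm(edge)
    # the length of the missing edge follows from the known proportions
    other = length / SCREEN_RATIO if edge_is_width else length * SCREEN_RATIO
    # unit vector perpendicular to the visible edge
    normal = np.array([-edge[1], edge[0]]) / length
    offset = away * other * normal
    p2, p3 = p1 + offset, p0 + offset
    return np.array([p0, p1, p2, p3])   # four vertices of the approximated screen

# hypothetical usage: the visible edge spans the image, the screen continues upward
corners = complete_from_edge(np.array([10.0, 55.0]), np.array([70.0, 52.0]),
                             edge_is_width=True, away=-1)
```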
Now that the screen's position has been approximated, the coordinates of its vertices can be used in the final steps of the heat trace detection algorithm.

Figure 6.8: Only parts of the screen may lie within the camera's field of view

6.5 Calculating the Point of Touch's Relative Position on the Screen

So far, I have calculated the coordinates of the screen's vertices and created a thresholded image containing the sought heat trace as well as some noise. I run a contour detection algorithm on the thresholded image, ignoring all contours whose area is too large to be deemed a heat trace resulting from a fingertip. For each leftover contour, I check whether its center lies within the screen area using barycentric coordinates. If it does, it is recognized as a heat trace and its coordinates are transformed into the screen's local grid, exemplified in Figure 6.9. These coordinates are stored and used in the following frames so as to ignore recent heat traces which no longer signify input. Heat traces within the proximity of the saved pixel coordinates are only recognized as new input if the values assigned to them by the thermal camera show an increase. A sketch of the point-in-screen test and the coordinate transformation is given below.

Figure 6.9: Transformation of the touch point's coordinates from the camera's global grid to the screen's local grid ((a) camera's global grid; (b) screen's local grid)
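As an illustration of this step, the following sketch combines a barycentric point-in-triangle test with one possible way of mapping the point into the screen's local grid, namely a perspective transform onto the unit square. It is written in Python with OpenCV/NumPy as an assumption; the vertex ordering and the use of a perspective transform are choices made for the sketch, not necessarily the exact method used in the prototype.

```python
import cv2
import numpy as np

def in_triangle(p, a, b, c):
    """Barycentric test: does point p lie inside triangle (a, b, c)?"""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    if denom == 0:
        return False
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return v >= 0 and w >= 0 and (v + w) <= 1

def touch_in_local_grid(point, screen):
    """Map a touch point from camera coordinates to the screen's local grid.

    `screen` holds the four detected screen vertices (camera coordinates),
    assumed to be ordered top-left, top-right, bottom-right, bottom-left.
    Returns coordinates in [0, 1] x [0, 1] if the point lies on the screen,
    otherwise None.
    """
    a, b, c, d = [np.asarray(v, dtype=np.float32) for v in screen]
    p = np.asarray(point, dtype=np.float32)
    # the quadrilateral is split into the triangles (a, b, c) and (a, c, d)
    if not (in_triangle(p, a, b, c) or in_triangle(p, a, c, d)):
        return None
    src = np.array([a, b, c, d], dtype=np.float32)
    dst = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    mapped = cv2.perspectiveTransform(p.reshape(1, 1, 2), M)
    return tuple(mapped[0, 0])
```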
7 Gesture Input

Due to time constraints becoming apparent during the development of the prototype using touch input, development of a prototype using gesture input was halted in favor of completing a working prototype for touch input. Therefore, the prototype using gesture input did not reach the phase of being tested in a user study. Its validity is thereby questionable and is not vouched for in this work. Furthermore, work on the gesture interface started in the late phases of development of the touch interface, so insights gained there facilitated the implementation of certain partial features needed for gesture input. Other features could not be finished in time for a user study, and potential ways to implement them will be discussed at this point instead.

7.1 Initial Approach

Initially, the kind of gesture used for input had to be decided on. Possible gestures include gestures using one or multiple fingers, e.g. pinching thumb and index finger or pointing using the index finger, as well as gestures using the whole hand, using the relative movement of the hand as input, e.g. swiping gestures to the left/right and up/down. The second option was chosen as it promised a simpler implementation and a potentially higher degree of intuitiveness. The naive approach of detecting movement gestures using the user's hands looked like this:

1. Get the thermal image from the camera.
2. Pinpoint the position of the user's hand in the current frame.
3. Pinpoint the position of the user's hand in the previous frame.
4. Compute the relative movement of the hand using these coordinates.

This line of implementation would rely almost completely on the detection of the user's hand and would not need to take the display's exact position into account, as opposed to the methods used for touch detection.

7.2 Based on the Visualization

During work on the touch-based approaches, thermal reflection, which forms the basis of the gesture-based approaches, was barely noticeable. This implied that hand detection on the basis of the visualization would not be feasible. Further testing confirmed this suspicion. Several factors impeded the detection of hands in the thermal image: Contrast between a user and the background was small, rendering detection based on contrast unreliable. Furthermore, due to the thermal camera's relatively low resolution of 80x60 pixels, making out a hand's outline proved harder the further away it was from the display. For these reasons, I proceeded as with the touch-based approaches and further investigated the thermal camera's raw output data.

7.3 Based on the camera's raw output data

Although there was a difference in output values between the background and a hand, it was not significant enough to make for a useful criterion for hand detection. Furthermore, these differences in values were only noticeable during movement of the hand. While the hand was still, a user's reflection basically blended in with the background. As the basis for my initial approach, namely the distinction between hand and background in the thermal reflection, did not prove feasible, I searched for another approach.

7.4 Prefinal Approach

As mentioned in the previous section, the distinction between a hand and the background was only clear while the hand was in motion. This led me to the idea of making the gesture input rely solely on the hand's actual movement, without consideration of its shape. However, there was a problem with this approach as well: Although movement of the hand in the thermal reflection can be detected on the basis of changes in a pixel's assigned value, as described in the Touch Input chapter, this method of detection picks up any kind of movement within the scene, including head movements of the user and movement of the screen relative to the camera. To account for this, I made the following assumptions: 1) the user is positioned at an optimum distance to the display and 2) movement of the upper body is reduced to a minimum. An optimum distance refers to a distance that maximizes the area of the display within the camera's field of view. With the equipment used in this work, this corresponded to a distance of about half a meter or less. Under these assumptions, I constructed a primitive algorithm for gesture detection using hand movement.

Figure 7.1: Motion in the thermal reflection visualized

7.4.1 Modifying the thermal image

Using the method described earlier to detect fluctuations in pixel values, I transformed the thermal image into a visualization of movement. Pixels that significantly changed values between frames were marked, resulting in an image as depicted in Figure 7.1.

7.4.2 Detecting the hand movement

In this step, I ignored the middle area of the image, as I assumed this to be the area where a user's upper body and head are located, whose movement is irrelevant to the gesture detection. Focusing on the outer areas on the right and left of the image allowed an easy detection of movement from a user's hands and arms. Within the specified areas, highlighted clusters of pixels (the movement) are compared between frames. Their northernmost point serves as a point of orientation: changes in this point's coordinates directly correlate to a respective movement of a user's hand. So as not to mix up actual input and idle movements, I set a minimum threshold for the coordinate change necessary to count as movement intended as input. The way this algorithm works is depicted in Figure 7.2; a sketch of the cluster-tracking step is also given below. No further refinements to this algorithm were made during the work's allotted time frame, rendering its actual viability in real-life scenarios questionable.
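A minimal sketch of this step follows, again in Python with NumPy as an illustrative assumption; the strip width, the threshold value and the function names are placeholders and not taken from the actual implementation.

```python
import numpy as np

MIN_SHIFT = 4          # minimum coordinate change (in pixels) counted as input
SIDE_FRACTION = 0.3    # width of the left/right strips that are evaluated

def topmost_marked_point(motion_mask, columns):
    """Northernmost marked pixel inside the given column range, or None."""
    ys, xs = np.nonzero(motion_mask[:, columns])
    if ys.size == 0:
        return None
    i = np.argmin(ys)                       # smallest row index = topmost point
    return int(xs[i]) + columns.start, int(ys[i])

def classify_motion(prev_mask, curr_mask):
    """Compare the motion masks of two frames and derive a swipe direction.

    Both masks are binary images in which pixels whose raw value changed
    significantly between frames are marked; the middle of the image (the
    user's head and torso) is ignored.
    """
    h, w = curr_mask.shape
    side = int(w * SIDE_FRACTION)
    for columns in (slice(0, side), slice(w - side, w)):   # left and right strip
        prev_pt = topmost_marked_point(prev_mask, columns)
        curr_pt = topmost_marked_point(curr_mask, columns)
        if prev_pt is None or curr_pt is None:
            continue
        dx, dy = curr_pt[0] - prev_pt[0], curr_pt[1] - prev_pt[1]
        if max(abs(dx), abs(dy)) < MIN_SHIFT:
            continue                                       # idle movement, not input
        if abs(dx) >= abs(dy):
            return "swipe_right" if dx > 0 else "swipe_left"
        return "swipe_down" if dy > 0 else "swipe_up"
    return None
```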
Figure 7.2: Relative movement of the pixel cluster's northern extreme point indicates gesture input

7.5 Further considerations

The approaches discussed so far do not deal with the issue of how to mark the beginning and end of an interaction between user and display. The beginning of a touch interaction is marked by the appearance of a heat trace on the display. Its end is open: no further touch input simply leads to no further reaction from the display. With gesture input based on movement, this is not as simple. Movement in the user's surroundings, e.g. by other passersby, has to be taken into account, as it does not constitute input but rather noise that needs to be filtered. Otherwise, an interaction might start against the will of the current user. Considering this, further refinement of the prefinal approach is necessary, possibly including the detection of hands using more advanced algorithms and hardware.

8 User Study

To evaluate the accuracy of the touch-based prototype, I conducted a user study in a controlled lab setting. Five participants between the ages of 22 and 28 (4 male, 1 female) took part in the study. In the following chapter I will discuss its setup, procedure and findings.

8.1 Setup

I constructed the prototype gear by attaching the thermal camera to a spectacle frame. A long pair of cables connects the camera and the Raspberry Pi. The Pi itself is fixed inside a metal case modified for being worn by a user. The complete gear is shown in Figure 8.1. The display the participants interacted with is a 48x27 cm LC display with a resolution of 1360x768 pixels, which was modified by fixing a slab of plexiglass a few centimeters in front of it. The slab covers the entirety of the screen and measures 50x30 cm. In consequence, the outer areas of the plexiglass can be interacted with but should not trigger a response from the system. When mapping from the camera grid to the screen grid, this no-response area has to be taken into account. The described setup was installed at approximately eye level and is shown in Figure 8.2.

Figure 8.1: The prototype gear

8.2 Procedure

The study was divided into two parts, a practical part and a theoretical part. During the first part, the participants were asked to perform a series of simple touch-based operations. The procedure was designed as follows: As soon as a participant signaled that they were ready, the tasks commenced. The screen would display a plain light blue background with a single red circle on it, which the participant was supposed to touch. After confirmation of a successful touch event at the position of the circle, the circle would disappear and a circle in another location on the screen would appear. This scenario was repeated a set number of times. Three different types of circles varying in size were displayed: small, medium and large circles. Their diameters (dependent on the display used in the study) were as follows: small: 50 pixels, medium: 80 pixels, large: 110 pixels. Furthermore, each circle's center was marked by a small white cross. Participants were asked to touch the circles as close to their centers as possible. A touch event was deemed successful if the center of the detected area of contact lay within the circle's edge. Each kind of circle was displayed the same number of times.
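As a rough illustration of the physical target sizes (a sketch calculation, assuming square pixels and that the 1360 pixels span the full 48 cm width): 480 mm / 1360 px ≈ 0.35 mm per pixel, so the small, medium and large circles measure approximately 1.8 cm, 2.8 cm and 3.9 cm in diameter on the physical screen.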
For each circle, the following information was documented: its distance to the previous circle (preset, measured from outer edge to outer edge), the number of attempts until a successful touch event occurred, and the distance between the circle's center and the point of touch. The sequence of circles did not change between participants, and at every point in time during the study the participants had the option of aborting the experiment. Figure 8.3 shows one of the participants during interaction with the experimental setup using the prototype gear. During the second part of the study, the participants were asked to fill out a questionnaire. The first part of the questions asked about the participants' prior experiences with public displays and touch displays. The second part contained questions regarding the setup used in the study, including the option of naming possible improvements to the prototype. Answering the questions was voluntary and could be aborted at any time. For an overview of the questions asked, please refer to the additional material in Appendix A.

Figure 8.2: The setup used during the study

8.3 Evaluation

Table 8.1 shows the measured data for each distinct circle. The size of the circle seemed to have no direct correlation to the algorithm's accuracy, nor to the number of attempts needed for a successful touch interaction. This, however, seems strange, as the accuracy of the input should directly correlate to its success rate regarding accepted input. This leads to two possible conclusions: either there are faults in the way the algorithm calculates the position of heat traces, or the measurements taken during the study are faulty. Despite that, interaction seemed feasible, even though it is far from perfect as of now.

Evaluation of the questionnaire led to the following results: Although initially only three out of five participants could relate to the term "public display", in retrospect all participants declared they had prior experiences with public displays. It became evident that the slight confusion regarding the term stemmed from not considering non-interactive displays to be public displays. Locations in which the participants had encountered public displays encompassed the expected candidates, such as schools/universities, pedestrian areas, public transit, train stations, airports and shopping centers. 60% of the participants had engaged in interaction with the displays as well. In these cases, the interaction modalities were restricted to touch- and gesture-based input. All participants had prior experiences with touch displays, and only in one case were the experiences restricted to personal devices. Other touch devices included public displays and navigational devices, although the latter could be considered personal devices as well.

Figure 8.3: A user interacting with the display using the prototype

In the second part of the questionnaire, comfort of use, accuracy and response time regarding the introduced prototype were rated on a five-point Likert scale. With average scores of 3.8, 3.8 and 2.8 respectively, all three aspects appear to be of average satisfaction to the participants, requiring further improvements, especially regarding accuracy and response time. (By accident, the scale printed on the questionnaire was inverted; the scores stated here are based on a scale inverted relative to the one found in the appendix.)
Table 8.1: Measured results for each circle

#   Size of   Distance to last   Inaccuracy in pixels         # of attempts
    circle    circle in pixels   highest  lowest  average     highest  lowest  average
 1  large           —              22       5       12           1       1       1
 2  medium        1041             29       7       20           2       1       1
 3  small          310             23      11       17           1       1       1
 4  large          732             24      15       19           1       1       1
 5  small          891             17       3       13           3       1       2
 6  large          882             14       7       11           1       1       1
 7  medium         282             15       4        8           1       1       1
 8  small          473             16       8       12           1       1       1
 9  small          152             12       4        7           2       2       2
10  small          507             15       6        9           2       1       1
11  large          703             17       7       12           2       1       1
12  medium         297             13       5        8           1       1       1
13  medium         533             20      11       14           2       1       1
14  small          545             17       7       11           3       2       2
15  large          754             18       5       13           2       1       1
16  large          825             16      15       15           1       1       1
17  large          288             19      11       13           1       1       1
18  small          337             12       3        9           2       1       2
19  medium          86             14       7       11           1       1       1
20  medium        1065             17       9       14           2       1       1
21  medium         484             16       7       12           1       1       1

Three participants were not sure whether they wanted to use a comparable device again in the future, one was positive and one was negative in that regard. Aspects named as being in need of improvement were: reduction of "cable spaghetti", improving portability by replacing the Raspberry Pi with a smaller device fulfilling its functions, and improving response time.

9 Conclusion & Outlook

This work proposes the use of wearable thermal cameras as a means to enable interaction with otherwise non-interactive public displays. As stationary cameras installed in public spaces encroach on passersby's privacy and ordinary RGB cameras do not provide the flexibility regarding possible input modalities that thermal cameras do, this approach was deemed promising to work on. A prototype based on touch interaction, using heat traces left on a display by a user's fingertips, was implemented. Its feasibility was tested in a user study, which came to the conclusion that, although the prototype fulfills the minimum requirements for acceptable touch interaction, further improvements are necessary for an optimal user experience. During the work on this project, several limitations of the proposed approach, as well as starting points for future work, became evident.

9.1 Limitations

There are several factors which limit the described prototype's usefulness in real-life applications: First off, the prototype's reliance on the detection of heat traces to assess touch input restricts it in several ways which do not apply to conventional touch displays. These restrictions largely stem from the fact that, to register a heat trace, a clear line of sight between camera lens and display is required. Inherently, this leads to a higher response time, as input is not registered upon touch itself but rather upon the point of contact's lingering heat becoming visible in the thermal image. This requires the user to move their fingers out of the way, an action which in itself represents an additional latency factor during interaction. Furthermore, depending on the specific use case, a user's hand obstructing the camera's field of view is a natural and frequent occurrence. One just has to envision interaction with an on-screen keyboard: finger movements from lower rows of keys to upper ones naturally lead to the obstruction of lower keys, depending on the input sequence, potentially leading to inputs not being registered. To counteract this, a user would need to adapt their usual way of typing, further decreasing the speed of interaction.
Another potential solution to this issue might be found by designing a display's interface specifically with interaction using a wearable device in mind. Right now, this approach's feasibility cannot be judged reliably, as the necessary data is currently non-existent. Additionally, obstructions may occur in other ways as well. Assuming the wearable device consists of a pair of glasses featuring a camera, as used in this work, a user's own hair, for example, is another potential obstruction. Under windy weather conditions, a user's hair might obstruct the camera's lens, rendering the device non-functional. However, this issue would be mostly restricted to outdoor public displays and can be counteracted by a user's own awareness regarding their wearable device.

The other main factor directly relating to the prototype's accuracy is its ability to precisely register the display's position. Accuracy is limited by the software and hardware used. The software, more precisely the implementation of the algorithms used to detect the screen, has to be adapted to the available hardware. A camera with a higher resolution might detect a screen more reliably than one with a lower resolution. Further work with higher-resolution cameras is necessary to judge the hardware's impact on the device's accuracy. Next, in this work, the display lying in its entirety within the camera's field of view is considered the default case, providing another restriction to the possible use cases. Using a camera with a 51° horizontal and 63.5° diagonal field of view and a 21" display resulted in a minimum distance between screen and camera of about half a meter. This requirement negatively impacts the prototype's feasibility in several aspects. First off, the user's freedom in choosing their preferred distance during interaction is greatly reduced. This might also limit the range of input data suitable for a public display: at close range, a user's own body may serve as a physical layer of protection against a third party's insights into the user's private data by blocking the view of people behind them. If they need to step further away from the display to accurately input their data, this is no longer an option. Looked at from another angle, this requirement also hinders support for a wide spectrum of public displays. Depending on the camera's field of view, it might be simply impossible to interact with larger displays, as the camera would not be able to capture the entire screen while staying within the range required for touch input. To enable support for large displays, refinements to the software and hardware are necessary. Currently, I cannot specify meaningful limitations regarding a gesture-based system, as this would require a fully functional prototype and an evaluation of its viability.

9.2 Outlook

The results of this work provide numerous starting points for future work. As the implementation of a prototype for gesture-based interaction could not be completed during the allotted time frame, future work may pick up at that point and finish a functional prototype. It may then be tested regarding the feasibility of motion-based input versus finger gestures and compared to conventional systems incorporating gesture input. Other starting points are based on the limitations and shortcomings of the finished prototype listed in the previous section. Future work may focus on how to maximize accuracy as well as on how to minimize response time.
Another improvement to this work’s prototype lies with enabling support for multi-touch gestures, e.g. swipe gestures known from smartphones and the like. Probably one of the most important and challenging starting points for future work lies with improving aspects of screen detection and tracking. As there is currently no set standard for public displays regarding their size and shape, enabling support for touch based interaction for a wide spectrum of them using a single wearable device would be a considerable improvement over the current prototype. This would require an emphasis on enabling detection of a public display not relying on monitoring its edges at all times. Only partially visible areas of a display have to be recognized as such and need to be mapped to the display’s overall size, position and orientation. It is likely that approaches trying to accomplish exactly this will need to evaluate a camera’s distance and angle towards a display, requiring hardware in addition to a single thermal camera. As new technologies become affordable to everyday users, comparing their feasibility to that of the proposed approaches becomes necessary as well. All in all, work on the implementation of a prototype for enabling interaction with otherwise non-interactive public displays illustrated numerous limitations. While being obstacles on the one hand, they also provide valuable opportunities for future work. 59 A Additional material A.1 Questionnaire A.1.1 English questionnaire A.1.2 German questionnaire 61 A Additional material Bachelor Thesis „ Thermal Imaging for Public Displays“ User Study Conductor: Alexander Frank 2787130 Questionnaire Personal Data: Last name, first name: _________________________________________ Age: _________________________________________ Sex: [ ] Male [ ]Female General Questions: 1) Did you have any experience with public displays prior to this study? [ ]Yes [ ] No [ ]What is a public display? If answer 3 was chosen, skip questions 2-4 and inform the study's conductor after having filled out the questionnaire. 2) Where did you encounter the public display(s)? ____________________________________________________________ 3) Was/were the display(s) interactive? [ ]Yes [ ]No [ ]Some of them. [ ]I don't know. If answers 2 or 4 were chosen, skip the next question. 4) How did interaction take place? ____________________________________________________________ 5) Do you have any previous experiences with touch displays? [ ]Yes [ ]No 6) Is this experience restricted to personal devices (smartphone, tablet, etc)? [ ]Yes [ ]No If Yes was chosen, skip the next question 7) Name as many touch devices you had experience with in the past as you want. ____________________________________________________________ ____________________________________________________________ 62 A.1 Questionnaire Questions regarding the study's setup: 8) How would you rate the comfort of the wearable device? 1 2 3 4 5 very good alright very poor 9) Could you imagine using such a device again in the future? If not, why? [ ]Yes [ ]Not sure. [ ]No: ________________________________________________________ ________________________________________________________ 10) How would you rate the prototype's response time? 1 2 3 4 5 very fast alright very slow 11) How would you rate the prototype's accuracy? 1 2 3 4 5 very good alright very poor 12) Are there any aspects of the prototype you would like to see improved? If yes, which? [ ]No. 
[ ]Yes: ____________________________ 13) Please note any further remarks you might have concerning the prototype. ____________________________________________________________ ____________________________________________________________ ____________________________________________________________ Thank you very much for your participation in this user study! 63 A Additional material Bachelorarbeit „ Thermal Imaging for Public Displays“ Benutzerstudie Leiter: Alexander Frank 2787130 Fragebogen Persönliche Daten: Nachname, Vorname: _________________________________________ Alter: _________________________________________ Geschlecht: [ ] Männlich [ ]Weiblich Allgemeine Fragen: 1) Hatten Sie vor dieser Studie bereits Erfahrungen mit Public Displays gemacht? [ ]Ja [ ]Nein [ ]Was ist ein Public Display? Falls Sie Antwort 3 gewählt haben, überspringen Sie Frage 2- 4 und benachrichtigen Sie bitte den Leiter der Studie, nachdem Sie diesen Fragebogen ausgefüllt haben. 2) Wo begegneten Sie dem/den Public Display(s)? ____________________________________________________________ 3) War(en) das/die Display(s) interaktiv? [ ]Ja [ ]Nein [ ]Manche. [ ]Ich weiß nicht. Falls Antwort 2 oder 4 gewählt wurde, überspringen Sie die nächste Frage. 4) Wie gestaltete sich die Interaktion? ____________________________________________________________ 5) Haben Sie vorangehende Erfahrungen mit Touch Displays? [ ]Ja [ ]Nein 6) Sind diese Erfahrungen auf Privatgeräte beschränkt (Smartphone, Tablet)? [ ]Ja [ ]Nein Falls Ja gewählt wurde, überspringen Sie die nächste Frage. 7) Nennen Sie so viele Geräte mit Touch Eingabe, mit denen Sie in der Vergangenheit Erfahrungen gemacht haben, wie Sie möchten. ____________________________________________________________ ____________________________________________________________ 64 A.1 Questionnaire Fragen bezüglich dem Aufbau aus der Studie: 8) Wie würden Sie den Komfort des tragbaren Geräts bewerten? 1 2 3 4 5 sehr gut ok sehr schlecht 9) Könnten Sie sich vorstellen, ein solches Gerät in Zukunft erneut zu verwenden? Falls nein, wieso? [ ]Ja [ ]Nicht sicher. [ ]Nein: ________________________________________________________ ________________________________________________________ 10) Wie würden Sie die Antwortzeit des Prototypen bewerten? 1 2 3 4 5 sehr schnell ok sehr langsam 11) Wie würden Sie die Genauigkeit des Prototypen bewerten? 1 2 3 4 5 sehr gut ok sehr schlecht 12) Gibt es Aspekte des Prototypen, die Sie gerne verbessert sehen würden? Falls ja, welche? [ ]Nein. [ ]Ja: ____________________________ 13) Bitte notieren Sie weitere Anmerkungen, die Sie bezüglich des Prototypen haben. ____________________________________________________________ ____________________________________________________________ ____________________________________________________________ Vielen Dank für ihre Teilnahme an dieser Benutzerstudie! 65 Bibliography [ASS+12] F. Alt, S. Schneegaß, A. Schmidt, J. Müller, N. Memarovic. “How to evaluate public displays.” In: Proceedings of the 2012 International Symposium on Pervasive Displays. ACM. 2012, p. 17 (cit. on p. 24). [EBBF04] M. Eaddy, G. Blasko, J. Babcock, S. Feiner. “My own private kiosk: privacy- preserving public displays.” In: Proc. Eighth Int. Symp. Wearable Computers. Vol. 1. Oct. 2004, pp. 132–135. DOI: 10.1109/ISWC.2004.32 (cit. on p. 24). [KFF+09] N. Kaviani, M. Finke, S. Fels, R. Lea, H. Wang. 
“What Goes Where?: Designing Interactive Large Public Display Applications for Mobile Device Interaction.” In: Proceedings of the First International Conference on Internet Multimedia Computing and Service. ICIMCS ’09. Kunming, Yunnan, China: ACM, 2009, pp. 129–138. ISBN: 978-1-60558-840-7. DOI: 10.1145/1734605.1734637. URL: http://doi.acm.org/10.1145/1734605.1734637 (cit. on p. 24).
[KK05] O.-K. Kwon, S. G. Kong. “Multiscale fusion of visual and thermal images for robust face recognition.” In: Proc. IEEE Int. Conf. Computational Intelligence for Homeland Security and Personal Safety CIHSPS 2005. Mar. 2005, pp. 112–116. DOI: 10.1109/CIHSPS.2005.1500623 (cit. on p. 23).
[LCG+11] E. Larson, G. Cohn, S. Gupta, X. Ren, B. Harrison, D. Fox, S. Patel. “HeatWave: Thermal Imaging for Surface User Interaction.” In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’11. Vancouver, BC, Canada: ACM, 2011, pp. 2565–2574. ISBN: 978-1-4503-0228-9. DOI: 10.1145/1978942.1979317. URL: http://doi.acm.org/10.1145/1978942.1979317 (cit. on p. 23).
[OKB15] M. Ostkamp, C. Kray, G. Bauer. “Towards a Privacy Threat Model for Public Displays.” In: Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. EICS ’15. Duisburg, Germany: ACM, 2015, pp. 286–291. ISBN: 978-1-4503-3646-8. DOI: 10.1145/2774225.2775072. URL: http://doi.acm.org/10.1145/2774225.2775072 (cit. on p. 24).
[PR15] K. Palovuori, I. Rakkolainen. “The Heat is on: Thermal Input for Immaterial Interaction.” In: Proceedings of the 19th International Academic Mindtrek Conference. AcademicMindTrek ’15. Tampere, Finland: ACM, 2015, pp. 152–154. ISBN: 978-1-4503-3948-3. DOI: 10.1145/2818187.2818272. URL: http://doi.acm.org/10.1145/2818187.2818272 (cit. on p. 23).
[SAH+14] A. Sahami Shirazi, Y. Abdelrahman, N. Henze, S. Schneegass, M. Khalilbeigi, A. Schmidt. “Exploiting Thermal Reflection for Interactive Systems.” In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’14. Toronto, Ontario, Canada: ACM, 2014, pp. 3483–3492. ISBN: 978-1-4503-2473-1. DOI: 10.1145/2556288.2557208. URL: http://doi.acm.org/10.1145/2556288.2557208 (cit. on pp. 21, 23).
[Sch15] S. Schneegass. “There is More to Interaction with Public Displays Than Kinect: Using Wearables to Interact with Public Displays.” In: Proceedings of the 4th International Symposium on Pervasive Displays. PerDis ’15. Saarbruecken, Germany: ACM, 2015, pp. 243–244. ISBN: 978-1-4503-3608-6. DOI: 10.1145/2757710.2776803. URL: http://doi.acm.org/10.1145/2757710.2776803 (cit. on p. 24).
[SLP12] E. N. Saba, E. C. Larson, S. N. Patel. “Dante vision: In-air and touch gesture sensing for natural surface interaction with combined depth and thermal cameras.” In: Proc. IEEE Int. Conf. Emerging Signal Processing Applications. Jan. 2012, pp. 167–170. DOI: 10.1109/ESPA.2012.6152472 (cit. on p. 23).
[SWZ15] X. Shi, S. Wang, Y. Zhu. “Expression Recognition from Visible Images with the Help of Thermal Images.” In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ICMR ’15. Shanghai, China: ACM, 2015, pp. 563–566. ISBN: 978-1-4503-3274-3. DOI: 10.1145/2671188.2749355. URL: http://doi.acm.org/10.1145/2671188.2749355 (cit. on p. 23).
[TGC06] P. Tarasewich, J. Gong, R. Conlan. “Protecting Private Data in Public.” In: CHI ’06 Extended Abstracts on Human Factors in Computing Systems. CHI EA ’06. Montréal, Québec, Canada: ACM, 2006, pp. 1409–1414. ISBN: 1-59593-298-4.
DOI: 10.1145/1125451.1125711. URL: http://doi.acm.org/10.1145/1125451.1125711 (cit. on p. 24).

All links were last followed on December 19, 2016.

Declaration

I hereby declare that the work presented in this thesis is entirely my own and that I did not use any other sources and references than the listed ones. I have marked all direct or indirect statements from other sources contained therein as quotations. Neither this work nor significant parts of it were part of another examination procedure. I have not published this work in whole or in part before. The electronic copy is consistent with all submitted copies.

place, date, signature