Institute for Visualization and Interactive Systems University of Stuttgart Universitätsstraße 38 D–70569 Stuttgart Bachelorarbeit Nr. 351 Thermal Imaging for Interactive Public Displays Alexander Frank Course of Study: Informatik Examiner: Prof. Dr. Albrecht Schmidt Supervisor: Yomna Abdelrahman, M.Sc. Stefan Schneegaß, M.Sc. Commenced: June 20, 2016 Completed: December 20, 2016 CR-Classification: I.4.1, I.4.3, I.4.6, I.4.8 Kurzfassung Wenn man heutzutage in der Öffentlichkeit unterwegs ist, beispielsweise in Fußgänger- zonen, Bahnhöfen oder Einkaufszentren, begegnet man vielerorts Public Displays. Sie versorgen Passanten mit potentiell relevanten Informationen hinsichtlich ihrer aktuellen Lage und Bedürfnisse, beispielsweise Sehenswürdigkeiten oder Bahnfahrplänen. Jedoch hat der Passant nicht immer die Möglichkeit, den dargestellten Informationsgehalt den eigenen Interessen anzupassen. Diese nicht interaktiven Displays aktualisieren ihre dargestellten Inhalte, falls überhaupt, als Reaktion auf interne Signale, über die der Passant keine Kontrolle hat. Aktualisierungen erfolgen beispielsweise in festgelegten Zeitintervallen oder bei Ankunft beziehungsweise Abfahrt eines Zuges. Interaktive Public Displays, andererseits, bieten Passanten die Möglichkeit, die dargestellten Inhalte aktiv zu manipulieren, um ihren aktuellen Interessen gerecht zu werden. Es lassen sich etwa zusätzliche Informationen über Sehenswürdigkeiten oder Angebote eines Einkaufszen- trums anzeigen. Falls ein Public Display nicht von Natur aus Interaktion, beispielsweise per Berührung oder Gestensteuerung, unterstützt, erfordert die nachträgliche Imple- mentierung solcher Interaktionstechniken entweder komplett neue Hardware, die das alte Display ersetzt, oder zusätzliche Hardware, wie etwa Kameras. Das Ergänzen durch neue Hardware gestaltet sich jedoch herausfordernd, wenn gleichzeitig Ansprüche an öffentliche Interfaces beachtet werden sollen. In dieser Arbeit werden Möglichkeiten, sonst nicht-interaktive Displays interaktiv zu gestalten und hierbei die Privatsphäre des Nutzers zu schützen, diskutiert und ein Prototyp auf Basis von Thermographie wird entwickelt. Abstract Nowadays when moving in public spaces such as pedestrian areas, train stations or shopping centers, people can oftentimes encounter public displays. They provide passersby with information potentially relevant to their current situation and needs, e.g. places of interest or train schedules. However, not always does a passerby find a way to tailor the information content displayed to his or her current needs. These non- interactive displays update their displayed contents, if at all, based on internal signals the passerby has no control over. Updates may occur in set intervals of time or upon a train’s arrival and departure, for example. Interactive public displays on the other hand offer passersby the option to actively manipulate the displayed contents in a way they see fit to their current needs. They can, for example, gather more information on a certain place of interest or a shopping center’s offerings. If a public display’s hardware does not inherently support interaction, for example via touch or gestures, retrospectively 3 implementing such interaction techniques requires either all new hardware replacing the old display, or additional hardware such as cameras. Adding new hardware however proves challenging when simultaneously trying to meet the requirements posed upon public interfaces. 
In this work, possible ways to enable interaction for otherwise non- interactive public displays, while keeping privacy concerns in mind, will be discussed and a prototype using thermal vision is developed. 4 Contents 1 Introduction 11 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2 Aim of this Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2 Interaction in Public Space & Public Displays 13 2.1 Public Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Implications on this Work . . . . . . . . . . . . . . . . . . . . . . . . . . 16 3 Thermal Vision 19 3.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 3.2 Thermal Imaging for Touch Based Interaction . . . . . . . . . . . . . . . 19 3.3 Thermal Imaging for Gesture Based Interaction . . . . . . . . . . . . . . 20 4 Related Work 23 4.1 Thermal imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 4.2 Public Displays & Public Space . . . . . . . . . . . . . . . . . . . . . . . 24 5 Hardware & Software 25 5.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 6 Touch Input 29 6.1 Initial Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 6.2 Acquisition of the Thermal Image . . . . . . . . . . . . . . . . . . . . . . 30 6.3 Detection of Heat Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 6.4 Detection of the Screen Area . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.5 Calculating the Point of Touch’s Relative Position on the Screen . . . . . 44 7 Gesture Input 47 7.1 Initial Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2 Based on the Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . 48 7.3 Based on the camera’s raw output data . . . . . . . . . . . . . . . . . . . 48 7.4 Prefinal Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 7.5 Further considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5 8 User Study 51 8.1 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 8.2 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 8.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 9 Conclusion & Outlook 57 9.1 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 A Additional material 61 A.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 Bibliography 67 6 List of Figures 3.1 Two different visualizations of the same scene gained using a thermal camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 Field of view and reflection using an RGB camera [SAH+14] . . . . . . 21 3.3 Field of view and reflection using a thermal camera [SAH+14] . . . . . 21 5.1 The thermal camera used in this project . . . . . . . . . . . . . . . . . . 25 5.2 The Raspberry Pi used to process the camera’s output data . . . . . . . . 26 6.1 Impact of varying temperature intervals on the visualization of the scene 31 6.2 Thermal Image of a display featuring a (barely visible) heat trace (inside the white circle) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
32 6.3 Impact of the camera’s internal temperature on the output with radio- metric mode enabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.4 Impact of the camera’s internal temperature on the output with radio- metric mode disabled . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6.5 Filtering the heat traces (and some noise) . . . . . . . . . . . . . . . . . 37 6.6 Detection of the screen area based on its heat emission . . . . . . . . . . 39 6.7 Intermediate results of the different processing steps applied to the ther- mal image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 6.8 Only parts of the screen may lie within the camera’s field of view . . . . 44 6.9 Transformation of the touch point’s coordinates from the camera’s global grid to the screen’s local grid . . . . . . . . . . . . . . . . . . . . . . . . 45 7.1 Motion in the thermal reflection visualized . . . . . . . . . . . . . . . . . 49 7.2 Relative movement of the pixel cluster’s northern extreme point indicates gesture input . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 8.1 The prototype gear . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 8.2 The setup used during the study . . . . . . . . . . . . . . . . . . . . . . . 53 8.3 A user interacting with the display using the prototype . . . . . . . . . . 54 7 List of Tables 8.1 Measured results for each circle . . . . . . . . . . . . . . . . . . . . . . . 55 9 1 Introduction 1.1 Motivation In many places, public displays found their way into public space. They all share their purpose of displaying potentially valuable information to passersby, with the exact content varying with the location of installation. In pedestrian areas they offer information about a city’s places of interest, in train stations they show the schedule of arriving and departing trains and in shopping centers they offer information on the layout and offerings, to name a few possible applications. Although they share a common purpose, they vary not only in shape and size, from TV-sized rectangular displays to displays the size of advertising pillars, but also in their possible ways of interacting with passersby. Some displays may serve as billboards displaying advertisements and loop through a series of predetermined adverts, allowing for no interaction with a passerby. Others may update their displayed content based on certain environmental factors in their surroundings, e.g. a public display in a train station may receive an update upon arrival and departure of trains or to inform passengers about disruptions in the railroad traffic. Information displayed this way is more dynamic and time-sensitive than bare advertisements, however, passersby still have no way of interacting with the display to manipulate its information content. Interactive public displays on the other hand offer such possibilities in different ways: Users may touch the display to select items from a menu, get more detailed information on them or enter a search term into a search engine. No keyboard and/or mouse are required and passersby may start interaction freely on their own accord. Another common way of interaction is gesture input: A passerby positions themselves in front of the display and performs gestures in mid-air using their hands or other body parts. The display in turn responds by updating its contents appropriately, e.g. navigating through a gallery of images based on the passerby’s hand motion to the right or left. 
One might try to enable such interaction techniques for otherwise non-interactive displays as well, however this is in general not possible without additional and/or completely new hardware and software. An ordinary TV screen, for example, that might serve as a public display and does not inherently support input via touch, will not do so without the necessary upgrades. The hardware either has to be replaced with a completely new system which does support touch input, or additional hardware and software have to 11 1 Introduction be installed, e.g. cameras monitoring the screen’s surface to register touch and their respective drivers. 1.2 Aim of this Work This work discusses how to enable interaction for otherwise non-interactive public displays while using affordable additional hardware and keeping concerns unique to interaction in public space in mind. Structure The rest of this work is organized as follows: Chapter 2 – Interaction in Public Space & Public Displays: Aspects and characteris- tics of public displays and interaction in public space are discussed, including their implications on the hardware used. Chapter 3 – Thermal Vision deals with the basics of thermal vision and its advantages over RGB vision regarding this work. Chapter 4 – Related Work: A selection of other publications dealing with partial as- pects of this work. Chapter 5 – Hardware & Software lists the hardware and software used in this project. Chapter 6 – Touch Input summarizes the prototype development for touch input, in- cluding problems encountered. Chapter 7 – Gesture Input summarizes the prototype development for gesture input, including problems encountered. Chapter 8 – User Study: A summary on the user study conducted to test the prototype’s viabililty and its findings. Chapter 9 – Conclusion & Outlook concludes this work by summarizing it and provid- ing a possible outlook on future work. 12 2 Interaction in Public Space & Public Displays As already mentioned in this work’s introduction, public displays can be encountered in different shapes and sizes, however they might not differ much from non-public displays such as one’s TV- or PC-display in the way they work. The way they can be interacted with theoretically does not differ much as well, but in practice, significant differences become apparent. To understand these differences and apply the resulting insights to an interactive interface, the environmental factors surrounding a public display have to taken into account. 2.1 Public Displays The most basic difference between an in-home display and a public display is likely the latter’s location of installation and the resulting consequences on the requirements it has to meet. Factors such as robustness to different kinds of weather and potential exposure to vandalism aside, the display’s location heavily impacts the way passersby choose to interact or not interact with it. For further understanding, let us take a quick look at how touch and gesture interfaces are implemented into public displays and how the way of implementation affects user interaction: 2.1.1 Touch Input in Public Space Touch input may be inherently supported by a public display’s hardware using different technologies. No further distinction between different touchscreen technologies will be made at this point, as the fundamental way of interaction does not change and differences might not even be perceivable to the user. 
Notable differences however are for example input precision, usability of a touch pen and support for multi-touch gestures such as swiping gestures on the screen. To a user, the basics of touch interaction do not change with technology (aside from use 13 2 Interaction in Public Space & Public Displays of a touch pen or multi-touch gestures): The user touches an area on the screen, hereby performing an action which would correlate to a mouse click in a desktop application using keyboard and mouse. The mouse and its cursor are replaced by the user’s finger and a keyboard can be replaced by an on-screen keyboard, if necessary. The result is an easy and intuitive interaction technique that many users might already know from their own handheld devices such as smartphone or tablet PC. If the hardware does not support touch input, additional hardware is necessary which can register the user’s touch. A possible solution is the installation of one or several cameras in the display’s proximity, for example inside the frame encasing the display. These cameras can be used to monitor the display’s surface and notify the underlying system upon registering a passerby’s fingertip touching the display. Interaction then works the same way as with other technologies although possibly less accurate and fast. Although touch is widely accepted as a means of input for personal devices, proven by the widespread of smartphones and tablet PCs, there are factors unique to public spaces that can potentially reduce a person’s willingness to interact with public displays using the same technology. A first such factor, which will not be further discussed in detail as it is occurs inde- pendently of implementation, is hygiene. People might be uncomfortable touching a display which hundreds, maybe thousands of strangers touched before them. But, as mentioned, this issue is not solvable by means of using different implementations and can furthermore be applied to all kinds of public objects, such as door handles and grab handles in buses. Another problem factor, which also does not directly tie to the means of implementation but public touch displays in general, is the display’s location in combination with the required input. If a touch display requires sensitive data as input, e.g. a user’s bank information, the display’s location and the design of its touch interface, including the positioning of buttons and input fields is of utmost importance. When processing such highly sensitive data, the interface should be designed in a way which prohibits third persons from gaining insight into said data by merely standing near an interacting user and watching the display. The third and last concern discussed at this point regarding touch displays, comes into play when gathering touch input data using cameras. Users noticing a camera installed near a display they are supposed to interact with might hesitate, as the mere presence of a camera might induce a feeling of being watched or even spied upon, thus resulting in a feeling of the user’s privacy being breached. The severity of this concern strongly relates to the size and positioning, thus the visibility of the cameras used. If the user cannot notice the cameras, they are less likely to feel watched. Furthermore, as the most widespread devices using touch input, namely smartphones and tablet PCs, do not rely on cameras to register the input, users might not even consider the possibility of cameras being nearby. 
14 2.1 Public Displays 2.1.2 Gesture Input in Public Space For a public display to register gesture input the user’s gestures and/or motion needs to be tracked, therefore a camera is necessary. The camera needs a clear view of the user interacting with the display and is thus installed near the display, e.g. within the frame encasing it. Upon performing certain gestures and hand motions being performed by the user, the display selects items, shifts views etc. Depending on the gestures used, interaction with the public display shapes more or less intuitive. Navigating trough a series of images may intuitively be achieved by moving one’s hand to the right or left, thus "shoving aside" the current view. Selection of an item however might not be as intuitive. Does the item need to be "grabbed", "pushed" or "pulled"? User’s might therefore get frustrated if the display does not react in a way they anticipated. This issued however can easily be circumvented by familiarizing the user with the display’s input modalities, for example by featuring a short manual on the side of the screen or on it or even a short demonstrative walkthrough. At the same time this might reduce a passerby’s willingness to interact even more if they do not want to make an effort familiarizing with the display. Another problem factor results directly from the fact that interacting is based on a passerby’s movement and gestures. While movements and gestures are greatly appre- ciated as a form of interaction with home entertainment systems, judging from the popularity of systems such as Nintendo’s Wii consoles 1 and Microsoft’s Kinect 2, the concept is not easily converted into public space. Whereas a user might have no problem performing gestures to interact with their system of choice at home, they might feel embarrassed performing the same or similar gestures in public, thus in front of a crowd of strangers. Additionally, interacting with a public display bears the risk of failure, either due to faulty interaction on behalf of the user or missing reaction from the display, being poten- tially visible to everybody in the user’s proximity, further nurturing the user’s feeling of embarrassment. Not even considering the potential for failure, a passerby might simply not want to become the center of attention by suddenly stopping in front of a display and performing gesture usually unnatural to their behaviour. On the other hand, this "unnatural" behaviour might encourage other passersby to engage in interaction with the display out of curiosity. The third and last concern links to the need of cameras for gesture recognition. As with cameras used for touch input, the use of cameras for gesture detection rises concerns regarding the user’s privacy. Due to the camera being installed in public space and thus "always being there", passersby might feel being watched even when not inter- 1https://www.nintendo.de/Wii/Wii-94559.html 2https://developer.microsoft.com/de-de/windows/kinect 15 2 Interaction in Public Space & Public Displays acting with the display and just passing by. A passerby does not know when or what or who the camera is currently watching. Again, passersby have to potentially fear a breach of their privacy as the identity and intentions of the people having installed the camera cannot be fully assured at all times. A third party might have hacked into the system and use it to spy on passersby. With such potential thoughts in mind, passersby might actively avoid public displays as soon as they notice a camera. 
Additionally, unlike touch displays, the use of a camera for gesture input is intuitively obvious, as someone or something has to watch the user to pick up on their gestures and movements. 2.2 Implications on this Work After having considered the environmental factors and requirements unique to inter- action in public space, approaches for enabling interactivity for non-interactive public displays can be considered. As we do not want to upgrade the display’s hardware itself, interaction has to be enabled using solely additional hardware and software. With gesture recognition inherently rely- ing on input filmed by a camera and touch recognition being realizable using cameras as well, the use of a camera as additional hardware seems obvious. However, due to the privacy concerns regarding the use of cameras in public space which were just discussed, refinements to the initial approach are necessary. Since the installation of a camera in the display’s vicinity is not an option, another way of using a camera needs to be found. An appropriate way would be to - instead of using a stationary camera - use a camera a passerby can carry with them, thus giving them control over when and where to use it. This would require the used camera to ideally be as small and light as possible as to not obstruct the user during interaction with the display and their other everyday activities. This leads to the idea of using a wearable device, a personal device directly integrated into the wearer’s clothes or accessories. With cameras the size of a fingernail being available nowadays, using such a small camera provides an ideal solution to our problem. The camera can for example be installed within a pendant the users wears around their neck or within the frame of a pair glasses akin to the Google Glass 3, thereby minimizing obstruction of the wearer’s daily routines. With the hardware to use being decided upon, a rough idea of how interaction is sup- posed to take place can be realized, which leads to the conclusion that an ordinary RGB camera does not pose an ideal solution due to several reasons. 3https://www.google.com/glass/start/ 16 2.2 Implications on this Work 2.2.1 Limitations of RGB cameras Although touch interaction can theoretically be enabled by tracking a user’s fingertips, gesture interaction proves to be cumbersome regarding the ease of use for the user. Naturally, the gestures used for interaction have to be performed within the camera’s field of view. Depending on the position of the camera on the user’s body, e.g. inside a pendant around their neck or inside the frame of their glasses, the user’s available space for interaction is greatly limited. All the gestures and movements have to be performed within the space in front of the users upper body or head respectively. Performing a gesture next to one’s body is not possible with a wearable RGB camera. Additionally, depending on the camera’s field of view, gestures might have to be per- formed a minimum distance away from the camera, further limiting the user’s space for interaction by not only requiring their hands to stay in front of their body but also their arms being stretched out to the front. This issue is solved by using a thermal camera instead of an RGB camera. As to how the use of thermal vision solves the issue of restricted space of interaction during gesture input, please see the next chapter. 
17 3 Thermal Vision 3.1 Basics Thermal imaging creates images based upon the longwave infrared radiation emanated by objects in a given scene. Every object, animal and person emanates infrared waves (assuming their temperature lies above absolute zero), the warmer the object the higher the amount of radiation. Therefore, conclusions on an object’s temperature can be drawn from its amount of radiation. The thermal images produced this way give information on the differences in temperature between different objects. Furthermore, thermal imaging does not rely in illumination, it works in light environments as well as in dark environments. This might for example be used for detection of creatures such as animals and humans and their movement in a scene, as their body temperature usually surpasses the temperature of the environment, thus making them well visible in the thermal image. A real life application using thermal imaging this way is for example the usage for surveillance cameras, as this technique works independently of the monitored scene’s illumination. Other applications include firefighters being able to detect survivors through smoke during fires, detection of heat leaks in buildings and medical applications. Different ways of visualizing the radiation data are available, as shown in 3.1. The attributes unique to thermal imaging that make it an attractive solution for interaction with a public display are discussed in the following respective subsections. 3.2 Thermal Imaging for Touch Based Interaction As just explained, every human emanates a certain amount of infrared radiation based on their body temperature. When a person touches an object with different temperature than themselves, part of their skin temperature is transferred to object, thus influencing the object’s temperature at the point of contact. Upon breaking up contact, part of the person’s skin temperature temporarily lingers on at the point of contact. In a thermal image, the contrast between the temperature change at the point of contact and the rest of the object’s surface becomes visible. Over time the temperature at the point of contact returns to the normal temperature before the contact. Applying this concept to 19 3 Thermal Vision (a) Ironblack Scale (b) Rainbow Scale Figure 3.1: Two different visualizations of the same scene gained using a thermal camera a display, it is possible to detect heat traces in areas of the display a user just touched, thus enabling detection of touch input. 3.3 Thermal Imaging for Gesture Based Interaction At the end of the previous chapter the negative implications on the comfort of use when using RGB cameras for gesture detection were explained. When using a thermal camera instead, this issue is circumvented due to a phenomenon referred to as thermal reflection: Not only do different objects and materials emanate different amount of radiation, some materials, such as glass, instead completely reflect radiation. Therefore, whereas these materials have no special properties when viewed through an RGB camera (3.2), they behave towards thermal cameras as mirrors behave towards RGB cameras. If a user with a wearable thermal camera stands in front of a display and films the display, the area of the display in the thermal image does not show the contents displayed on it but rather the reflection of the user. A visualization of this effect is shown in 3.3. 
The thermal camera's field of view thereby includes not only the area in front of its lens, but also the areas beside and behind it, reflected in the surface of the display. These areas added to the field of view serve as additional space for gestures and movement, thus eliminating an RGB camera's shortcomings in that regard.

Figure 3.2: Field of view and reflection using an RGB camera [SAH+14]
Figure 3.3: Field of view and reflection using a thermal camera [SAH+14]

4 Related Work

As this work merges aspects of thermal imaging, interaction with public displays, and interaction in public space in general, a large body of related research can be found. With the technology used in thermal imaging and in interactive public displays becoming more affordable in recent years, the amount of research conducted in the respective fields has increased as well. Likewise, the increase in technologies supporting interaction in public space was accompanied by an appropriate amount of research regarding the implications of these technologies on users' privacy, as well as other social factors surrounding them.

4.1 Thermal imaging

Use cases in military, medical, industrial and many other fields aside, thermal imaging also has a considerable impact on human-computer interaction. Much of the attention within this field is focused on using thermal cameras to enable means of interaction in otherwise non-interactive systems: In [LCG+11], Larson et al. present their project HeatWave. HeatWave uses thermal cameras to transform an arbitrary surface, such as a table top, into an interactive multi-user surface supporting input via touch and gestures performed on the surface, by tracking the residual heat traces left by the users' fingers. A similar project called Dante vision is presented in [SLP12]: combining thermal cameras and depth cameras, such as the Microsoft Kinect, creates a projection system which supports touch and in-air gestures as input; multiple users may perform different tasks at the same time. Palovuori and Rakkolainen use thermal imaging to create a system for touch-based interaction with an immaterial fog screen [PR15]. In [SAH+14], Sahami Shirazi et al. exploit the thermal reflectivity of certain surface materials to create an interactive system using the reflection of a user's hand gestures as input. Other works in this field of research include the usage of thermal cameras for face recognition [KK05] and expression recognition [SWZ15].

4.2 Public Displays & Public Space

Research on public displays ranges from the proposal of general guidelines [ASS+12] to the investigation of novel interaction techniques [Sch15] [KFF+09] to assessments of potential privacy concerns [OKB15] [TGC06]. In [Sch15], Schneegass proposes the usage of wearable devices for interaction with public displays, an approach similar to this work's aim. In [ASS+12], Alt et al. present a set of guidelines for evaluating public displays, based on extensive literature research as well as their own experiences regarding the subject. Several works, such as [TGC06] and [EBBF04], deal with ways of protecting a user's data in public space, e.g. from the eyes of third parties.

5 Hardware & Software

5.1 Hardware

The thermal camera used to gather data is a FLIR Lepton® longwave infrared (LWIR) sensor, including a socket and breakout board to connect it to the processing hardware.
The Lepton® distinguishes itself by its compact size of 10.6 x 11.7 x 5.9 mm (including the socket), rendering it smaller than a 1 cent piece and thus a convenient choice for integration into mobile systems such as wearable devices. Furthermore, it features a resolution of 80x60 pixels and an effective frame rate of 8.6 Hz. It measures temperatures ranging from -10°C up to 65°C and operates with a thermal sensitivity of 0.05 Kelvin. The captured wavelengths lie within the spectral range of 8 µm to 14 µm. The lens provides a 51° horizontal and a 63.5° diagonal field of view. 5.1a demonstrates the compact size of the camera module itself, 5.1b shows the camera including socket and breakout board.

Figure 5.1: The thermal camera used in this project
Figure 5.2: The Raspberry Pi used to process the camera's output data

Image processing functions and the actual touch/gesture recognition algorithms run on a Raspberry Pi® 3 Model B (initially an older model, the Raspberry Pi® 1 Model B, was used; due to performance issues a change to current hardware was made). It features a quad-core 1.2 GHz processor, enabling real-time processing without performance issues. A picture of the Raspberry Pi® used in this work is shown in 5.2. Communication between the Lepton® and the Raspberry Pi® takes place via the Raspberry Pi®'s SPI and I2C ports, streaming video data to the computer and sending commands to the camera, respectively.

Initially, the prototype was supposed to be tested on one of the Institute for Visualization and Interactive Systems's public displays; however, during development this plan was changed, resulting in the prototype becoming independent of the display used. More on this subject in chapter 6. More information on the hardware used in this work can be found on the respective websites (http://www.flir.com/cores/lepton/, https://www.raspberrypi.org/).

5.2 Software

The programming language used in this work is C++, with the additional Qt and OpenCV libraries for image capturing and processing, respectively. The code used in this work to read the Lepton®'s data and output it as a thermal image is openly available in a GitHub repository (https://github.com/groupgets/LeptonModule). Although Qt is not required to process the Lepton®'s data, it is used in this work, as otherwise the image capturing code would have had to be rewritten, increasing the probability of errors. Furthermore, using the Qt and OpenCV libraries in tandem only requires a simple conversion between the libraries' respective image formats. OpenCV is used to further process the Lepton®'s output image. OpenCV distinguishes itself by a large spectrum of available image processing functions, requiring little implementation effort, as many operations amount to single lines of code. For further information on Qt, OpenCV and their functions, see their respective documentations (http://doc.qt.io/qt-4.8/, http://opencv.org/documentation/opencv-3-1.html).
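The conversion between the two image formats mentioned above is not part of the repository's code; the following is only an illustrative sketch of the idea, assuming the capture code exposes its visualization as a QImage in RGB format. The function names are mine, not the repository's:

```cpp
#include <opencv2/opencv.hpp>
#include <QImage>

// Convert a QImage produced by the Lepton capture code into a cv::Mat
// so that OpenCV's image processing functions can be applied to it.
cv::Mat qimageToMat(const QImage &img) {
    // Ensure a known pixel layout (8-bit RGB without alpha).
    QImage rgb = img.convertToFormat(QImage::Format_RGB888);
    // Wrap the QImage buffer without copying it.
    cv::Mat view(rgb.height(), rgb.width(), CV_8UC3,
                 const_cast<uchar *>(rgb.constBits()),
                 static_cast<size_t>(rgb.bytesPerLine()));
    cv::Mat bgr;
    // OpenCV expects BGR channel order; cvtColor also produces an
    // independent copy, so the QImage may go out of scope afterwards.
    cv::cvtColor(view, bgr, cv::COLOR_RGB2BGR);
    return bgr;
}

// Convert back for display in a Qt widget.
QImage matToQImage(const cv::Mat &bgr) {
    cv::Mat rgb;
    cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);
    return QImage(rgb.data, rgb.cols, rgb.rows,
                  static_cast<int>(rgb.step), QImage::Format_RGB888).copy();
}
```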
6 Touch Input

This chapter reports on the work conducted to implement a prototype detecting touch input on public displays using thermal vision, the problems encountered and the solutions to said problems.

6.1 Initial Approach

Before starting to work on the implementation, careful consideration of which steps are necessary to enable meaningful touch-based interaction is required. The two leading questions are: Which area(s) in the thermal image correlate to the leftover heat of a user's touch? Which area(s) on the public display did the user touch? Although the two questions sound similar, there is an important difference between them: Naturally, registering touch input requires detecting the leftover heat traces of a user's touch. However, solely pinpointing the position of the relevant pixels in the thermal image does not suffice. As the thermal image is likely to either picture not only the surface the user is interacting with but also its surroundings, and/or to picture only parts of that surface, it is necessary to transform the pixels' coordinates in the global grid of the camera's field of view into coordinates in the screen's local grid. To do this, the screen's relative position has to be taken into account as well. Consideration of these questions results in the following four essential steps:

1. Get the thermal image from the camera
2. Identify the position of the point of touch (on the display)
3. Locate the screen
4. Compute the position of the point of touch in relation to the screen area

The following subsections go into more detail about the different iterations of code that were created during the implementation and the problems associated with them. Furthermore, different approaches to solving these problems and their feasibility are discussed, including the solutions that were finally deemed appropriate.

6.2 Acquisition of the Thermal Image

As already mentioned in the Hardware & Software chapter, the piece of code used to read the thermal camera's output data and convert it into a colorized image is available in the respective repository. No substantial changes to the code's functionality were made. As the way the thermal camera outputs its pixel data and the way this data is assigned a visualization matter for the rest of this chapter, a short explanation of the code's functions follows: Each frame consists of a one-dimensional array storing 4920 14-bit values — one value for each of the 80x60 pixels, plus 120 header values that can be ignored. For each frame, the maximum and minimum value are determined and used to compute a scale needed for the colorization. A colormap containing a set number of RGB color values is then used to color the pixels in the visualization based on their respective value and the just computed scale.
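To make this per-frame rescaling concrete, the following is a simplified sketch of how such a colorization could look. It is not the repository's actual code; the names and the colormap handling are illustrative only:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Rgb { uint8_t r, g, b; };

// Map one frame of raw 14-bit sensor values onto a fixed colormap by
// rescaling the frame's own minimum..maximum range to the colormap size.
std::vector<Rgb> colorizeFrame(const std::vector<uint16_t> &values,
                               const std::vector<Rgb> &colormap) {
    if (values.empty() || colormap.empty()) return {};
    auto minmax = std::minmax_element(values.begin(), values.end());
    uint16_t lo = *minmax.first, hi = *minmax.second;
    float scale = (hi > lo)
        ? static_cast<float>(colormap.size() - 1) / (hi - lo)
        : 0.0f;
    std::vector<Rgb> pixels;
    pixels.reserve(values.size());
    for (uint16_t v : values) {
        // Identical raw values can land on different colors in the next
        // frame, because lo and hi are recomputed for every frame.
        size_t idx = static_cast<size_t>((v - lo) * scale);
        pixels.push_back(colormap[idx]);
    }
    return pixels;
}
```

Because the minimum and maximum are recomputed for every frame, the mapping from raw value to color is only stable as long as the observed temperature range is stable — which leads directly to the insights below.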
The way the camera's data is processed leads to several early insights and assumptions:

6.2.1 Relation temperature-value-color

Based on the given code, I observed no inherent relation between temperature and the assigned 14-bit value/color, aside from the relations maximum temperature -> maximum value and minimum temperature -> minimum value. As each frame uses the same colormap and the relation 14-bit value -> color is reestablished with every new frame based on the maximum and minimum values, objects that do not experience a change in measurable temperature can be assigned vastly different colors in different frames due to changing environmental temperatures. An example for this is shown in 6.1: two scenes observed within seconds of each other. Both scenes feature a tray of ice cubes put on the floor; the second scene, however, also introduces a person's fingertip. In both scenes, the minimum temperatures and therefore the minimum values are located with the ice cubes. The maximum temperatures and values, however, shift from the floor in the first scene to the fingertip in the second scene. Although the change in the floor's temperature between the two scenes is likely negligible, its colorization changes vastly. This example is applicable to any change in the observed temperature range between scenes, although to a lesser extent if the changes are not as significant as in the given example.

Figure 6.1: Impact of varying temperature intervals on the visualization of the scene — (a) low maximum temperature, (b) high maximum temperature

6.2.2 Relation color-color

As implied by the aforementioned insight, reestablishing the value-color relation for each frame has a (for this work) negative impact on the color-color relation between two different frames. This might prove problematic for heat trace detection based on pure image processing. These insights are complemented by further observations that only became evident during the implementation phase.

6.3 Detection of Heat Traces

Deciding whether a pixel in the thermal image correlates to a heat trace left by a user or not required several iterations until a satisfying result was achieved. This was partly due to technological limitations, as explained in the following subsections.

6.3.1 Based on the Visualization

Visualization-based processes for detecting heat traces on a display proved unreliable very early, the major problem factor being the display itself, i.e. its own heat radiation. As seen in 6.2, not only does the display's inherent heat radiation vary significantly depending on the partial area of the screen, it also reaches peak values surpassing the values correlating to a user's heat traces. As a result, associating a set range of colors with a heat trace is not possible. Furthermore, in many cases the contrast between heat trace and display proves insufficient for a clear distinction, rendering detection based on the shape of a heat trace unreliable as well.

Figure 6.2: Thermal image of a display featuring a (barely visible) heat trace (inside the white circle)

Neither adjustments to the used colormap, different filtering and thresholding methods, nor enhancement of contrast led to a meaningful increase in the reliability of the touch detection algorithm. With heat trace detection based on shape and color unfeasible on its own, a new approach using different data had to be found: the camera's raw output data.

6.3.2 Based on the Camera's Raw Data

Work with the camera's raw data happened in two distinct phases. In the first phase, the thermal camera was assumed to be stationary, ignoring the movement inherent to real-life application. Using the insights gained from these stationary approaches, I then started work on dynamic approaches, which considered the camera's movement in their algorithms.

Stationary Approaches

While implementing a heat trace detection algorithm based solely on differences in numerical values instead of differences in shape and color, three inherently different approaches were tested, with varying results. In this initial phase, camera movement was still ignored.

First, a simple, static approach was tested. The used algorithm checks whether the value of a pixel lies within a set interval of values deemed to match the heat trace of a touch point. If this test returns a positive result, the pixel is assigned a color distinctly different from the used colormap, making it easily recognizable in the visualization. The appropriate intervals were derived from measurements of the display, a person's fingertips and touch points on the screen. The goal was to mark the pixels where a user touched the display while ignoring the display and the user's hand as reliably as possible.
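As an illustration, this first test amounts to little more than a range check. The interval bounds below are placeholders standing in for the manually measured values, not values taken from the prototype:

```cpp
#include <cstdint>

// Placeholder bounds derived from manual measurements; in practice they
// drifted with the camera's internal temperature and had to be re-tuned.
constexpr uint16_t kTraceMin = 8300;
constexpr uint16_t kTraceMax = 8450;

// First, static approach: a pixel is marked as a potential heat trace if
// its raw value falls inside a fixed interval.
inline bool isHeatTrace(uint16_t rawValue) {
    return rawValue >= kTraceMin && rawValue <= kTraceMax;
}
```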
Initial tests proved more successful than the image-processing-based approach. Heat traces were distinctly highlighted, while only the edges of a user's hand and small areas of the display were marked as well. However, this algorithm's accuracy rapidly degraded over time, as the values obtained from the camera steadily increased and the intervals set in the code no longer matched the data gathered from the camera. Continuous recalibration and adjustment of the interval boundaries was necessary to regain accuracy, rendering this approach non-viable.

Second, an initial calibration phase was used to gain a better estimate of the range of values corresponding to the display. The camera would be statically mounted in front of the display and compute a running mean of the read values for several seconds. Afterwards, each pixel's value would be compared to the computed mean and the pixel would be highlighted if its value was significantly higher than the mean. In this case, "significantly higher" describes a difference in values high enough to not be deemed part of the display but low enough to not be linked to the radiation of a person's hand. Although more complex than its predecessor, this approach yielded no significant improvements, as it was still "too static" and the slow drift of values lowered its accuracy.

The third approach moved away from the concept of comparing pixel values to a static value, instead using relative changes in a pixel's value as an indicator of touch. For each pixel, the algorithm would compare its current value to its value in the previous frame; a significant change would highlight the pixel. However, this algorithm was prone to errors resulting from camera movement, thus proving inappropriate for this work.

Dynamic Approaches

At this point, the camera's movement was included in the algorithm's design, as it proved too difficult to ignore without deviating from the actual goal of this work. The first approach in this phase used local differences in pixel values to detect heat traces, the method also used in the final approach. With this approach, each pixel's value is compared to the values of its (up to) eight neighbouring pixels. If the difference exceeds a set threshold but is low enough to not be linked to the user's hand, the pixel fulfills the first of two requirements to be deemed a heat trace. The second condition requires that only marginal changes to the pixel's value occurred between the previous and the current frame, as a significant change implies one of the following two scenarios: 1) an object with considerable heat emission, e.g. a user's hand, becomes visible or invisible, or 2) due to camera movement, the object seen at the pixel's location changes. Combining these two conditions should theoretically leave only the following areas applicable for marking: 1) heat traces on the display, as they only slowly cool down, 2) some minor spots around the outlines of a user's hand during slow movement, and 3) other areas with distinctly varying temperatures.
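A sketch of this two-condition test, operating on the raw 16-bit value matrices of the current and previous frame, might look as follows. The concrete thresholds are illustrative assumptions, not the values used in the prototype:

```cpp
#include <algorithm>
#include <cstdlib>
#include <opencv2/opencv.hpp>

// Mark pixels that (1) stand out from their (up to) eight neighbours by more
// than traceMin but less than handMin, and (2) changed only marginally since
// the previous frame. All thresholds are illustrative placeholders.
cv::Mat markHeatTraces(const cv::Mat &curr, const cv::Mat &prev,
                       int traceMin = 40, int handMin = 200,
                       int frameMax = 15) {
    CV_Assert(curr.type() == CV_16UC1 && prev.size() == curr.size());
    cv::Mat mask = cv::Mat::zeros(curr.size(), CV_8UC1);
    for (int y = 0; y < curr.rows; ++y) {
        for (int x = 0; x < curr.cols; ++x) {
            int v = curr.at<ushort>(y, x);
            // Condition 2: temporal stability between consecutive frames.
            if (std::abs(v - static_cast<int>(prev.at<ushort>(y, x))) > frameMax)
                continue;
            // Condition 1: local contrast against the neighbourhood.
            int maxDiff = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int ny = y + dy, nx = x + dx;
                    if ((dx == 0 && dy == 0) || ny < 0 || nx < 0 ||
                        ny >= curr.rows || nx >= curr.cols) continue;
                    maxDiff = std::max(maxDiff,
                                       v - static_cast<int>(curr.at<ushort>(ny, nx)));
                }
            if (maxDiff > traceMin && maxDiff < handMin)
                mask.at<uint8_t>(y, x) = 255;
        }
    }
    return mask;
}
```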
The third kind of area is the issue with this approach: such areas do not only lie outside the area of the display, but also within it. Most of the areas marked by this approach are areas on the display that do not correlate to heat traces — once again, the screen's inherent heat radiation behaved too similarly to a heat trace. As this phenomenon is directly linked to the values measured by the thermal camera, I conducted some further research on this topic.

6.3.3 Variance in output data and its cause

As the Lepton® outputs data as 14-bit values, one might expect outputs ranging from 0 to 16383, with temperatures near the camera's minimum and maximum operating temperature being assigned values close to 0 and 16383, respectively. However, this is not the case, as became apparent during work with the camera. Output values encountered during implementation ranged from the upper 7000s to the lower 9000s, while tests measuring temperatures close to the camera's minimum and maximum operating range resulted in values in the lower 7000s and lower 10000s, respectively. Furthermore, output values varied even while focusing on one single point in a scene. While this phenomenon was initially attributed to a natural warming-up process of the monitored displays, this reasoning could not be applied to the change in output values for other objects, including a person's skin.

Further investigation revealed the cause of this variance in output: According to the FLIR Lepton® Data Sheet (https://cdn.sparkfun.com/datasheets/Sensors/Infrared/FLIR_Lepton_Data_Brief.pdf), output values rely heavily on whether the camera's so-called radiometry mode is disabled or enabled. 6.3 shows the theoretical output of a Lepton® module with the radiometry mode enabled: output values directly correlate to a certain temperature, and the camera's internal temperature does not affect them.

Figure 6.3: Impact of the camera's internal temperature on the output with radiometric mode enabled

With the radiometry mode disabled, however, output values depend on the thermal camera's internal temperature, as shown in 6.4. Only one output value is "fixed" to a temperature: if an object in a scene has the same temperature as the thermal camera, it is assigned the value 8192. The other values are assigned linearly.

Figure 6.4: Impact of the camera's internal temperature on the output with radiometric mode disabled

This revelation renders work with the radiometric mode enabled more desirable, as the measured values would no longer fluctuate as much as they did so far. However, this is not an option, as the radiometry mode is reserved to original equipment manufacturers. To achieve a similar result using the given equipment, additional experiments would have been required: I would have needed to measure the output value for an object with precisely known temperature and use linear interpolation to approximate the values corresponding to other temperatures. This procedure would have required repetition for different states of the camera's internal temperature, finally rendering it too time-consuming and, more importantly, too unreliable, as an algorithm based on these measurements would be inherently prone to errors resulting from inaccuracies in the measurements. Therefore, I abandoned the idea of using any static values in my algorithms, relying solely on relative differences in pixel values.
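For completeness, the calibration that was considered and rejected can be written down as a simple linear interpolation. With radiometry disabled, the camera's own temperature T_cam corresponds to the raw value 8192 (as stated in the data sheet); a single reference object of known temperature T_ref with measured value v_ref would then fix the slope:

T(v) ≈ T_cam + (v − 8192) · (T_ref − T_cam) / (v_ref − 8192)

Since T_cam itself drifts and every reference measurement carries error, such an estimate degrades quickly, which is why this route was not pursued.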
Even relying solely on relative differences, however, was not enough: judging from the results of the last approach, the display's heat radiation remained a constant source of errors in the heat trace detection.

6.3.4 Using an Intermediate Layer for Trace Detection

As detecting heat traces directly on the display's surface proved unreliable due to a lack of contrast both color-wise and value-wise, a solution compensating for the display's heat emission needed to be found. In this case, a slab of plexiglass mounted a few centimeters in front of the actual display as an intermediate surface for touch interaction proved to be a solid solution. Not only does the plexiglass not obstruct the user's view of the display, it also does not heat up as fast as the display. The last point is especially important, as with a cooler surface the contrast between heat trace and surface is much more easily detected using the thermal camera. Although an approach relying solely on image processing is still not feasible, because the display's heat radiation from behind the plexiglass interferes with the colorization, the difference in raw values is much clearer and thus appropriate as an indicator of heat traces. The algorithm used in the final approach does not differ from the previous algorithm, but it is far more reliable when using plexiglass as a surface as opposed to a display. The following subsections explain the steps performed by the algorithm.

Modification of the thermal image

To facilitate further processing, I applied a few modifications to the thermal image (using a grayscale image). As described in an earlier section, I highlight pixels whose values differ significantly from the values of their neighbouring pixels and whose values did not change significantly from the previous frame. This time the algorithm marks only heat traces, the area surrounding the plexiglass and a few scattered pixels inside the screen area which are results of thermal reflection. To make for a strong distinction, the pixels are colored red. The result is shown in 6.5a.

Figure 6.5: Filtering the heat traces (and some noise) — (a) modified thermal image, (b) thresholded image

Filtering out the heat traces

After blurring the image to remove noise from thermal reflection, I convert the image from RGB colorspace to HSV colorspace. Using the HSV colorspace, I am able to filter out all areas within a certain color range, in this case "red". The result of this operation is seen in 6.5b. At this point the algorithm stops; its intermediate result is used further upon successful detection of the screen area.
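The following is a sketch of this blur-and-filter step in OpenCV, assuming the marked pixels were painted in pure red; the HSV thresholds are illustrative and would need tuning:

```cpp
#include <opencv2/opencv.hpp>

// Isolate the red markers placed on suspected heat traces: blur away isolated
// noise from thermal reflections, then keep only strongly red pixels via HSV.
cv::Mat filterMarkedTraces(const cv::Mat &markedBgr) {
    cv::Mat blurred, hsv, lowerRed, upperRed, traces;
    cv::GaussianBlur(markedBgr, blurred, cv::Size(5, 5), 0);
    cv::cvtColor(blurred, hsv, cv::COLOR_BGR2HSV);
    // Red wraps around the hue axis, so two ranges are combined.
    cv::inRange(hsv, cv::Scalar(0, 120, 70),   cv::Scalar(10, 255, 255), lowerRed);
    cv::inRange(hsv, cv::Scalar(170, 120, 70), cv::Scalar(180, 255, 255), upperRed);
    cv::bitwise_or(lowerRed, upperRed, traces);
    return traces;  // binary mask of candidate touch points
}
```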
6.4 Detection of the Screen Area

As explained at the beginning of this chapter, knowing the screen's position is necessary to enable reliable touch input. The used algorithm needs to decide whether the objects and surfaces in the thermal image are part of the display or not. Assuming a distinction between screen and non-screen areas has been made, another issue needs to be kept in mind: if the camera moves too close to the screen, not all of the screen area is visible — the edges of the display lie beyond the edges of the thermal image.

6.4.1 Based on the Visualization

In a first approach, different colormaps were used and compared to see whether any of them led to a significant difference in colorization between screen and non-screen areas, potentially enabling a distinction based on color values. However, as the colorization using the multi-color versions was subject to scaling based on the maximum and minimum values in the thermal image, no such clear distinction was possible. The only colormap that provided the desired distinction was a grayscale colormap. Using a grayscale, the screen area can be detected based on the intensity of the gray colors in the image. As already mentioned, the used display is itself a source of significant heat radiation, resulting in higher intensity values than most of its surroundings in the thermal image, see 6.6a. Using this information, Otsu thresholding and additional colorization are applied to the thermal image, resulting in 6.6b. The area of the screen is recognized very precisely; however, other sources of heat are picked up as well — in the given scene, these are the edge of an adjacent display and a hot cup. These smaller hot areas have to be labeled as non-screen areas, which is achieved by relying on two assumptions: 1) The screen area is the largest highlighted area in the scene. 2) The screen's height and width are known. The first assumption is made because, during interaction, the user should focus their gaze on the display in front of them, so most of the scene in the thermal image consists of the screen area. The second assumption is made to ensure that no other large heat source is accidentally interpreted as the screen.

These two assumptions are implemented as follows: After thresholding and filtering, contour detection is applied to the scene and only the largest contour is accepted as a potential screen area. Next, a convex hull is put around said contour and the hull's rotated bounding box is created. The result is shown in 6.6c: the area recognized as screen is encased in a green bounding box. If the bounding box's proportions match the screen's proportions, e.g. 16:10 (while leaving room for a margin of error of a few percent), it is declared the screen area. This approach proved to reliably detect and highlight the screen area on a frame-by-frame basis, therefore also providing robustness against camera movement.

Figure 6.6: Detection of the screen area based on its heat emission — (a) thermal image, (b) thresholded and colored image, (c) screen area detected

However, this initial approach does not account for the possible scenario of only parts of the screen being visible. In these cases, the created bounding box would not match the screen's proportions and therefore not be usable. An ideal solution to this problem would require information on the camera's distance to the screen, which was not available due to the lack of a second camera needed for a sense of depth. Therefore, an approximation had to be made that does not rely on exact distance values. An initial idea looked like this: The algorithm would output a visualization featuring the bounding box encasing the screen area. If the bounding box's proportions match the screen's proportions, it would change color. During this phase, the current bounding box could be "locked in" via manual input.
From this point on, the bounding box's proportions and size would not change any further, and any relative movement of the screen in the scene would be applied to it. This approach had the advantage that it reliably keeps track of the bounding box's — and therefore the screen's — coordinates even beyond the boundaries of the thermal image. The disadvantage linked to this method is the underlying assumption that, once the screen has been locked in, no further changes in distance between screen and camera occur. This assumption seems inappropriate for real-world use cases, justifying the need for a more refined algorithm.

6.4.2 Based on the Camera's Raw Data

Algorithms using the camera's raw numerical data offered no clear advantages over the previous method using intensity values in a grayscale image during this phase of implementation and were therefore not considered any further.

6.4.3 Final Approach

The change from the display itself to a slab of plexiglass as the surface for interaction had an immense impact on the requirements for the screen detection algorithm. As the plexiglass blocks off most of the screen's heat radiation — the basis of the previous algorithm — a new approach is necessary. Although the previous algorithm still highlighted parts of the plexiglass as potential screen areas, the result was far from acceptable. Adjustments to the algorithm, e.g. to the thresholding and filtering operations, did not result in any improvements.

For a new algorithm, the screen's and the plexiglass' respective properties have to be taken into account and compared: Whereas the screen possesses a relatively high and fluctuating heat emission over its area, the plexiglass heats up much more slowly and evenly. In a thermal image this becomes evident by heat traces of touch points being much more distinct on the plexiglass than on the screen. However, with parts of the screen's heat emission still visible in the thermal image, the colorization of the heat traces still does not prove to be a reliable criterion for detection; a detection algorithm based on the camera's raw data seemed to be required. As with previous approaches using the camera's raw data, approaches based on comparing pixel values to other set values yielded no satisfying results.

The final algorithm used to detect heat traces intuitively bears a strong resemblance to an algorithm needed to detect the plexiglass. Whereas the heat trace detection looks for local maxima in the thermal image, the algorithm detecting the plexiglass is concerned with detecting an area of largely similar values. As already described in a previous section and seen in 6.5a, the heat trace detection algorithm initially highlights not only heat traces, but also the area surrounding the plexiglass. Apparently, the difference in values between the plexiglass and its immediate surroundings is significant enough to be detected. Therefore, the inversion of the initial heat trace marking algorithm is used as the basis for the screen detection algorithm.
In the following, I will expand on the steps performed by the final screen detection algorithm, from receiving a thermal image as input to outputting a monitored screen's approximate coordinates:

Modification of the thermal image

Before applying any actual image processing functions to the thermal image using the GitRepository's code, I first make some modifications to the image, simplifying the following steps. The thermal camera's initial output image, as seen in Figure 6.7a, is modified in a way similar to the procedure described in the dynamic approach for detecting heat traces; this time, however, the pixels that would be marked in that step are left unmarked, whereas the previously unmarked pixels are now marked. Marked pixels are colored white, unmarked pixels are colored black, maximizing the contrast in intensity between the relevant areas in the scene. The result of this modification is shown in Figure 6.7b. Although this initial step highlights the area of the screen, it also highlights areas scattered around it, potentially leading to false positives (non-screen areas interpreted as part of the screen). These areas need to be removed from the image.

Clearing up the image

First, a Gaussian blur filter is applied. As shown in Figure 6.7c, this results in the smaller non-relevant areas around the screen area being colored in grayish tones, whereas the screen area itself remains mostly white, therefore having a much higher intensity than most of the scene. Next, erosion followed by dilation takes place. This removes most of the noise surrounding the screen from the scene, as shown in Figure 6.7d. Lastly, the scene is thresholded using Otsu thresholding, leaving the screen area and some smaller parts of the surroundings that can safely be ignored. This step's final result is depicted in Figure 6.7e. Closer inspection of the initial thermal image and the processed image shows that the screen area is not picked up with complete accuracy. This stems mainly from the erosion process, which was necessary for a usable result in my tests. A potential fix to this issue could be found in a future refinement of these processing steps.

Figure 6.7: Intermediate results of the different processing steps applied to the thermal image ((a) initial thermal image; (b) differences in values highlighted; (c) after blurring; (d) after erosion & dilation; (e) after thresholding)

Assumption of the screen's position

Using the thresholded image from the previous step, I first run a contour detection algorithm on the image, segregating potential screen areas from the rest of the scene. As the user is assumed to focus their gaze on the display during interaction, only the contour spanning the largest area is processed further. Before creating the bounding box around it, I first create a convex hull around the contour to compensate for any indentations resulting from previous processing steps. Lastly, the minimum rotated bounding box is created, encasing the potential screen area. Whether the created bounding box actually corresponds to a display is gauged by the checks conducted in the next step.

Calculation of the screen's coordinates

The checks and modifications performed in this step rely on the assumption that the screen's proportions are known to the algorithm. If the rotated bounding box's proportions match the screen's proportions (within a preset margin of error), the bounding box is accepted and its vertices' coordinates are used to approximate the screen's area. The preprocessing chain described above is condensed in the sketch below.
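The following is a minimal sketch of the clean-up chain (blurring, erosion followed by dilation, Otsu thresholding), again in Python with OpenCV as an illustrative assumption; the kernel sizes and iteration counts are placeholders, not the values actually used in this work.

```python
import cv2
import numpy as np

def screen_candidate_mask(inverted_mask):
    """Clean up the inverted heat-trace mask before contour detection.

    `inverted_mask` is a binary image in which pixels that did NOT qualify
    as heat-trace candidates are white and all others are black, so the
    evenly tempered plexiglass surface appears as one large bright region.
    """
    # Blurring turns scattered false positives into grayish speckles while
    # the large screen area stays close to white
    blurred = cv2.GaussianBlur(inverted_mask, (9, 9), 0)

    # Erosion followed by dilation removes most of the remaining noise
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(blurred, kernel, iterations=2)
    opened = cv2.dilate(eroded, kernel, iterations=2)

    # Otsu thresholding leaves the screen area plus a few ignorable specks
    _, binary = cv2.threshold(opened, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```

The resulting binary image is then fed into the same largest-contour, convex-hull and rotated-bounding-box logic sketched in Section 6.4.1, followed by the proportion check and the case handling described next.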
This approach, however, relies on the bounding box's vertices lying within the camera's field of view, which is not always the case. Therefore, several different cases have to be considered and appropriate approximations have to be conducted:

Case 1: Four vertices

This is the default case, shown in Figure 6.8a. All four vertices lie within the camera's field of view. No further approximations are necessary.

Case 2: Three vertices

The case shown in Figure 6.8b requires no further consideration, as the algorithm creating the bounding box automatically calculates the outlying vertex's coordinates correctly.

Case 3: Two vertices

This case is composed of the two subcases depicted in Figures 6.8c and 6.8d: either two adjacent or two opposing vertices lie outside the field of view. In the first subcase, one edge of the screen lies completely within the field of view. Based on the bounding box's orientation, this edge is assumed to match the screen's width or height. Using the knowledge of the screen's proportions, the coordinates of the two outlying vertices can be computed. The other subcase, although unlikely to occur during real-life application, as it requires a considerable tilt of the camera, and thus the user, to one side, is accounted for by computing the outlying vertices' coordinates using the screen's proportions and the length of the diagonal within the field of view.

Case 4: One vertex

The case shown in Figure 6.8e cannot be solved without precise knowledge of the distance between camera and screen. As there is an infinite number of rectangles matching the screen's proportions, the outlying vertices' coordinates cannot be computed using a single vertex as input. Therefore, the approximation is made that the distance to the screen did not change between the last and the current frame. Thus, no adjustment to the screen's relative size is made and the last detected screen area is merely shifted, based on the visible vertex's position and the bounding box's orientation.

Case 5: No vertices

In the case shown in Figure 6.8f, the screen's edges lie outside the camera's field of view. As with the previous case, the screen's position cannot be calculated precisely. However, this time no approximation can be made either, as it would require knowledge of the distance between screen and camera. Furthermore, a distinction between screen and non-screen areas is not possible. Using a potentially detected thermal reflection of the user's hand does not suffice either, as thermal reflectivity is not restricted to displays exclusively. For the listed reasons, this case is not accounted for by the algorithm.

Other cases

There are several additional cases, more precisely subcases of cases 4 and 5, that will not be discussed further at this point. These subcases resemble the second subcase of case 3 and involve the visibility of at least two opposing edges. They are edge cases and are handled similarly to case 2. Although the algorithm accounts for all of these cases, only the first two cases, and to some degree the third case as well, can be deemed accurate. A sketch of the geometric reconstruction used in the first subcase of case 3 is given below.
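The sketch below illustrates how the two missing vertices could be reconstructed from one fully visible edge and the known screen proportions. It is an illustrative assumption written in Python/NumPy; the function name, parameters and the way the offset direction is chosen are hypothetical, not taken from the actual implementation.

```python
import numpy as np

SCREEN_RATIO = 16 / 10   # width : height of the monitored display (assumed known)

def complete_from_edge(p0, p1, edge_is_width, away):
    """Reconstruct the two off-image vertices of the display (case 3, subcase 1).

    p0, p1        -- the two visible, adjacent vertices (np.array([x, y]))
    edge_is_width -- True if the visible edge is assumed to be the screen's
                     width (decided from the bounding box's orientation)
    away          -- +1 or -1, choosing on which side of the visible edge
                     the rest of the screen lies (outside the field of view)
    """
    edge = p1 - p0
    length = np.linalg.norm(edge)
    # the length of the missing edge follows from the known proportions
    other = length / SCREEN_RATIO if edge_is_width else length * SCREEN_RATIO
    # unit vector perpendicular to the visible edge
    normal = np.array([-edge[1], edge[0]]) / length
    offset = away * other * normal
    p2, p3 = p1 + offset, p0 + offset
    return np.array([p0, p1, p2, p3])   # four vertices of the approximated screen

# hypothetical usage: the visible edge spans the image, the screen continues upward
corners = complete_from_edge(np.array([10.0, 55.0]), np.array([70.0, 52.0]),
                             edge_is_width=True, away=-1)
```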
Now that the screen's position has been approximated, the coordinates of its vertices can be used in the final steps of the heat trace detection algorithm.

Figure 6.8: Only parts of the screen may lie within the camera's field of view

6.5 Calculating the Point of Touch's Relative Position on the Screen

So far, I have calculated the coordinates of the screen's vertices and created a thresholded image containing the sought heat trace as well as some noise. I run a contour detection algorithm on the thresholded image, ignoring all contours whose area is too large to be deemed a heat trace resulting from a fingertip. For each leftover contour, I check whether its center lies within the screen area using barycentric coordinates. If it does, it is recognized as a heat trace and its coordinates are transformed into the screen's local grid, exemplified in Figure 6.9. These coordinates are stored and used in the following frames so as to ignore recent heat traces which no longer signify input. Heat traces within the proximity of the saved pixel coordinates are only recognized as new input if the values assigned to them by the thermal camera show an increase. A sketch of the point-in-screen test and the coordinate transformation is given below.

Figure 6.9: Transformation of the touch point's coordinates from the camera's global grid to the screen's local grid ((a) camera's global grid; (b) screen's local grid)
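As an illustration of this step, the following sketch combines a barycentric point-in-triangle test with one possible way of mapping the point into the screen's local grid, namely a perspective transform onto the unit square. It is written in Python with OpenCV/NumPy as an assumption; the vertex ordering and the use of a perspective transform are choices made for the sketch, not necessarily the exact method used in the prototype.

```python
import cv2
import numpy as np

def in_triangle(p, a, b, c):
    """Barycentric test: does point p lie inside triangle (a, b, c)?"""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    if denom == 0:
        return False
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return v >= 0 and w >= 0 and (v + w) <= 1

def touch_in_local_grid(point, screen):
    """Map a touch point from camera coordinates to the screen's local grid.

    `screen` holds the four detected screen vertices (camera coordinates),
    assumed to be ordered top-left, top-right, bottom-right, bottom-left.
    Returns coordinates in [0, 1] x [0, 1] if the point lies on the screen,
    otherwise None.
    """
    a, b, c, d = [np.asarray(v, dtype=np.float32) for v in screen]
    p = np.asarray(point, dtype=np.float32)
    # the quadrilateral is split into the triangles (a, b, c) and (a, c, d)
    if not (in_triangle(p, a, b, c) or in_triangle(p, a, c, d)):
        return None
    src = np.array([a, b, c, d], dtype=np.float32)
    dst = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    mapped = cv2.perspectiveTransform(p.reshape(1, 1, 2), M)
    return tuple(mapped[0, 0])
```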
7 Gesture Input

Due to time constraints becoming apparent during the development of the prototype using touch input, development of a prototype using gesture input was halted in favor of completing a working prototype for touch input. Therefore, the prototype using gesture input did not reach the phase of being tested in a user study. Its validity is thereby questionable and is not vouched for in this work. Furthermore, work on the gesture interface started in the late phases of development of the touch interface, so insights gained there facilitated the implementation of certain partial features needed for gesture input. Other features could not be finished in time for a user study, and potential ways to implement them will be discussed at this point instead.

7.1 Initial Approach

Initially, the kind of gesture used for input had to be decided on. Possible gestures include gestures using one or multiple fingers, e.g. pinching thumb and index finger or pointing using the index finger, as well as gestures using the whole hand, using the relative movement of the hand as input, e.g. swiping gestures to the left/right and up/down. The second option was chosen as it promised a simpler implementation and a potentially higher degree of intuitiveness. The naive approach of detecting movement gestures using the user's hands looked like this:

1. Get the thermal image from the camera.
2. Pinpoint the position of the user's hand in the current frame.
3. Pinpoint the position of the user's hand in the previous frame.
4. Compute the relative movement of the hand using these coordinates.

This line of implementation would rely almost completely on the detection of the user's hand and would not need to take the display's exact position into account, as opposed to the methods used for touch detection.

7.2 Based on the Visualization

During work on the touch-based approaches, thermal reflection, which forms the basis of the gesture-based approaches, was barely noticeable. This implied that hand detection on the basis of the visualization would not be feasible. Further testing confirmed this suspicion. Several factors impeded the detection of hands in the thermal image: Contrast between a user and the background was small, rendering detection based on contrast unreliable. Furthermore, due to the thermal camera's relatively low resolution of 80x60 pixels, making out a hand's outline proved harder the further away it was from the display. For these reasons, I proceeded as with the touch-based approaches and further investigated the thermal camera's raw output data.

7.3 Based on the camera's raw output data

Although there was a difference in output values between the background and a hand, it was not significant enough to make for a useful criterion for hand detection. Furthermore, these differences in values were only noticeable during movement of the hand. While the hand was still, a user's reflection basically blended in with the background. As the basis for my initial approach, namely the distinction between hand and background in the thermal reflection, did not prove feasible, I searched for another approach.

7.4 Prefinal Approach

As mentioned in the previous section, the distinction between a hand and the background was only clear while the hand was in motion. This led me to the idea of making the gesture input rely solely on the hand's actual movement, without consideration of its shape. However, there was a problem with this approach as well: Although movement of the hand in the thermal reflection can be detected on the basis of changes in a pixel's assigned value, as described in the Touch Input chapter, this method of detection picks up any kind of movement within the scene, including head movements of the user and movement of the screen relative to the camera. To account for this, I made the following assumptions: 1) the user is positioned at an optimum distance to the display and 2) movement of the upper body is reduced to a minimum. An optimum distance refers to a distance that maximizes the area of the display within the camera's field of view. With the equipment used in this work, this corresponded to a distance of about half a meter or less. Under these assumptions, I constructed a primitive algorithm for gesture detection using hand movement.

Figure 7.1: Motion in the thermal reflection visualized

7.4.1 Modifying the thermal image

Using the method described earlier to detect fluctuations in pixel values, I transformed the thermal image into a visualization of movement. Pixels that significantly changed values between frames were marked, resulting in an image as depicted in Figure 7.1.

7.4.2 Detecting the hand movement

In this step, I ignored the middle area of the image, as I assumed this to be the area where a user's upper body and head are located, whose movement is irrelevant to the gesture detection. Focusing on the outer areas on the right and left of the image allowed an easy detection of movement from a user's hands and arms. Within the specified areas, highlighted clusters of pixels (the movement) are compared between frames. Their northernmost point serves as a point of orientation: changes in this point's coordinates directly correlate to a respective movement of a user's hand. So as not to mix up actual input and idle movements, I set a minimum threshold for the coordinate change necessary to count as movement intended as input. The way this algorithm works is depicted in Figure 7.2; a sketch of the cluster-tracking step is also given below. No further refinements to this algorithm were made during the work's allotted time frame, rendering its actual viability in real-life scenarios questionable.
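A minimal sketch of this step follows, again in Python with NumPy as an illustrative assumption; the strip width, the threshold value and the function names are placeholders and not taken from the actual implementation.

```python
import numpy as np

MIN_SHIFT = 4          # minimum coordinate change (in pixels) counted as input
SIDE_FRACTION = 0.3    # width of the left/right strips that are evaluated

def topmost_marked_point(motion_mask, columns):
    """Northernmost marked pixel inside the given column range, or None."""
    ys, xs = np.nonzero(motion_mask[:, columns])
    if ys.size == 0:
        return None
    i = np.argmin(ys)                       # smallest row index = topmost point
    return int(xs[i]) + columns.start, int(ys[i])

def classify_motion(prev_mask, curr_mask):
    """Compare the motion masks of two frames and derive a swipe direction.

    Both masks are binary images in which pixels whose raw value changed
    significantly between frames are marked; the middle of the image (the
    user's head and torso) is ignored.
    """
    h, w = curr_mask.shape
    side = int(w * SIDE_FRACTION)
    for columns in (slice(0, side), slice(w - side, w)):   # left and right strip
        prev_pt = topmost_marked_point(prev_mask, columns)
        curr_pt = topmost_marked_point(curr_mask, columns)
        if prev_pt is None or curr_pt is None:
            continue
        dx, dy = curr_pt[0] - prev_pt[0], curr_pt[1] - prev_pt[1]
        if max(abs(dx), abs(dy)) < MIN_SHIFT:
            continue                                       # idle movement, not input
        if abs(dx) >= abs(dy):
            return "swipe_right" if dx > 0 else "swipe_left"
        return "swipe_down" if dy > 0 else "swipe_up"
    return None
```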
Figure 7.2: Relative movement of the pixel cluster's northern extreme point indicates gesture input

7.5 Further considerations

The approaches discussed so far do not deal with the issue of how to mark the beginning and end of an interaction between user and display. The beginning of a touch interaction is marked by the appearance of a heat trace on the display. Its end is open: no further touch input simply leads to no further reaction from the display. With gesture input based on movement, this is not as simple. Movement in the user's surroundings, e.g. by other passersby, has to be taken into account, as it does not constitute input but rather noise that needs to be filtered. Otherwise, an interaction might start against the will of the current user. Considering this, further refinement of the prefinal approach is necessary, possibly including the detection of hands using more advanced algorithms and hardware.

8 User Study

To evaluate the accuracy of the touch-based prototype, I conducted a user study in a controlled lab setting. Five participants between the ages of 22 and 28 (4 male, 1 female) took part in the study. In the following chapter I will discuss its setup, procedure and findings.

8.1 Setup

I constructed the prototype gear by attaching the thermal camera to a spectacle frame. A long pair of cables connects the camera and the Raspberry Pi. The Pi itself is fixed inside a metal case modified for being worn by a user. The complete gear is shown in Figure 8.1. The display the participants interacted with is a 48x27 cm LC display with a resolution of 1360x768 pixels, which was modified by fixing a slab of plexiglass a few centimeters in front of it. The slab covers the entirety of the screen and measures 50x30 cm. In consequence, the outer areas of the plexiglass can be interacted with but should not trigger a response from the system. When mapping from the camera grid to the screen grid, this no-response area has to be taken into account. The described setup was installed at approximately eye level and is shown in Figure 8.2.

Figure 8.1: The prototype gear

8.2 Procedure

The study was divided into two parts, a practical part and a theoretical part. During the first part, the participants were asked to perform a series of simple touch-based operations. The procedure was designed as follows: As soon as a participant signaled that they were ready, the tasks commenced. The screen would display a plain light blue background with a single red circle on it, which the participant was supposed to touch. After confirmation of a successful touch event at the position of the circle, the circle would disappear and a circle in another location on the screen would appear. This scenario was repeated a set number of times. Three different types of circles varying in size were displayed: small, medium and large circles. Their diameters (dependent on the display used in the study) were as follows: small: 50 pixels, medium: 80 pixels, large: 110 pixels. Furthermore, each circle's center was marked by a small white cross. Participants were asked to touch the circles as close to their centers as possible. A touch event was deemed successful if the center of the detected area of contact lay within the circle's edge. Each kind of circle was displayed the same number of times.
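As a rough illustration of the physical target sizes (a sketch calculation, assuming square pixels and that the 1360 pixels span the full 48 cm width): 480 mm / 1360 px ≈ 0.35 mm per pixel, so the small, medium and large circles measure approximately 1.8 cm, 2.8 cm and 3.9 cm in diameter on the physical screen.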
For each circle, the following information was documented: its distance to the previous circle (preset, measured from outer edge to outer edge), the number of attempts until a successful touch event occurred, and the distance between the circle's center and the point of touch. The sequence of circles did not change between participants, and at every point in time during the study the participants had the option of aborting the experiment. Figure 8.3 shows one of the participants during interaction with the experimental setup using the prototype gear. During the second part of the study, the participants were asked to fill out a questionnaire. The first part of the questions asked about the participants' prior experiences with public displays and touch displays. The second part contained questions regarding the setup used in the study, including the option of naming possible improvements to the prototype. Answering the questions was voluntary and could be aborted at any time. For an overview of the questions asked, please refer to the additional material in Appendix A.

Figure 8.2: The setup used during the study

8.3 Evaluation

Table 8.1 shows the measured data for each distinct circle. The size of the circle seemed to have no direct correlation to the algorithm's accuracy, nor to the number of attempts needed for a successful touch interaction. This, however, seems strange, as the accuracy of the input should directly correlate to its success rate regarding accepted input. This leads to two possible conclusions: either there are faults in the way the algorithm calculates the position of heat traces, or the measurements taken during the study are faulty. Despite that, interaction seemed feasible, even though it is far from perfect as of now.

Evaluation of the questionnaire led to the following results: Although initially only three out of five participants could relate to the term "public display", in retrospect all participants declared they had prior experiences with public displays. It became evident that the slight confusion regarding the term stemmed from not considering non-interactive displays to be public displays. Locations in which the participants had encountered public displays encompassed the expected candidates, such as schools/universities, pedestrian areas, public transit, train stations, airports and shopping centers. 60% of the participants had engaged in interaction with the displays as well. In these cases, the interaction modalities were restricted to touch- and gesture-based input. All participants had prior experiences with touch displays, and only in one case were the experiences restricted to personal devices. Other touch devices included public displays and navigational devices, although the latter could be considered personal devices as well.

Figure 8.3: A user interacting with the display using the prototype

In the second part of the questionnaire, comfort of use, accuracy and response time regarding the introduced prototype were rated on a five-point Likert scale. With average scores of 3.8, 3.8 and 2.8 respectively, all three aspects appear to be of average satisfaction to the participants, requiring further improvements, especially regarding accuracy and response time. (By accident, the scale printed on the questionnaire was inverted; the scores stated here are based on a scale inverted relative to the one found in the appendix.)
Table 8.1: Measured results for each circle

#   Size of   Distance to last   Inaccuracy in pixels         # of attempts
    circle    circle in pixels   highest  lowest  average     highest  lowest  average
 1  large           —              22       5       12           1       1       1
 2  medium        1041             29       7       20           2       1       1
 3  small          310             23      11       17           1       1       1
 4  large          732             24      15       19           1       1       1
 5  small          891             17       3       13           3       1       2
 6  large          882             14       7       11           1       1       1
 7  medium         282             15       4        8           1       1       1
 8  small          473             16       8       12           1       1       1
 9  small          152             12       4        7           2       2       2
10  small          507             15       6        9           2       1       1
11  large          703             17       7       12           2       1       1
12  medium         297             13       5        8           1       1       1
13  medium         533             20      11       14           2       1       1
14  small          545             17       7       11           3       2       2
15  large          754             18       5       13           2       1       1
16  large          825             16      15       15           1       1       1
17  large          288             19      11       13           1       1       1
18  small          337             12       3        9           2       1       2
19  medium          86             14       7       11           1       1       1
20  medium        1065             17       9       14           2       1       1
21  medium         484             16       7       12           1       1       1

Three participants were not sure whether they wanted to use a comparable device again in the future, one was positive and one was negative in that regard. Aspects named as being in need of improvement were: reduction of "cable spaghetti", improving portability by replacing the Raspberry Pi with a smaller device fulfilling its functions, and improving response time.

9 Conclusion & Outlook

This work proposes the use of wearable thermal cameras as a means to enable interaction with otherwise non-interactive public displays. As stationary cameras installed in public spaces encroach on passersby's privacy and ordinary RGB cameras do not provide the flexibility regarding possible input modalities that thermal cameras do, this approach was deemed promising to work on. A prototype based on touch interaction, using heat traces left on a display by a user's fingertips, was implemented. Its feasibility was tested in a user study, which came to the conclusion that, although the prototype fulfills the minimum requirements for acceptable touch interaction, further improvements are necessary for an optimal user experience. During the work on this project, several limitations of the proposed approach, as well as starting points for future work, became evident.

9.1 Limitations

There are several factors which limit the described prototype's usefulness in real-life applications: First off, the prototype's reliance on the detection of heat traces to assess touch input restricts it in several ways which do not apply to conventional touch displays. These restrictions largely stem from the fact that, to register a heat trace, a clear line of sight between camera lens and display is required. Inherently, this leads to a higher response time, as input is not registered upon touch itself but rather upon the point of contact's lingering heat becoming visible in the thermal image. This requires the user to move their fingers out of the way, an action which in itself represents an additional latency factor during interaction. Furthermore, depending on the specific use case, a user's hand obstructing the camera's field of view is a natural and frequent occurrence. One just has to envision interaction with an on-screen keyboard: finger movements from lower rows of keys to upper ones naturally lead to the obstruction of lower keys, depending on the input sequence, potentially leading to inputs not being registered. To counteract this, a user would need to adapt their usual way of typing, further decreasing the speed of interaction.
Another potential solution to this issue might be found by designing a display's interface specifically with interaction using a wearable device in mind. Right now, this approach's feasibility cannot be judged reliably, as the necessary data is currently non-existent. Additionally, obstructions may occur in other ways as well. Assuming the wearable device consists of a pair of glasses featuring a camera, as used in this work, a user's own hair, for example, is another potential obstruction. Under windy weather conditions, a user's hair might obstruct the camera's lens, rendering the device non-functional. However, this issue would be mostly restricted to outdoor public displays and can be counteracted by a user's own awareness regarding their wearable device.

The other main factor directly relating to the prototype's accuracy is its ability to precisely register the display's position. Accuracy is limited by the software and hardware used. The software, more precisely the implementation of the algorithms used to detect the screen, has to be adapted to the available hardware. A camera with a higher resolution might detect a screen more reliably than one with a lower resolution. Further work with higher-resolution cameras is necessary to judge the hardware's impact on the device's accuracy. Next, in this work, the display lying in its entirety within the camera's field of view is considered the default case, providing another restriction to the possible use cases. Using a camera with a 51° horizontal and 63.5° diagonal field of view and a 21" display resulted in a minimum distance between screen and camera of about half a meter. This requirement negatively impacts the prototype's feasibility in several aspects. First off, the user's freedom in choosing their preferred distance during interaction is greatly reduced. This might also limit the range of input data suitable for a public display: at close range, a user's own body may serve as a physical layer of protection against a third party's insights into the user's private data by blocking the view of people behind them. If they need to step further away from the display to accurately input their data, this is no longer an option. Looked at from another angle, this requirement also hinders support for a wide spectrum of public displays. Depending on the camera's field of view, it might be simply impossible to interact with larger displays, as the camera would not be able to capture the entire screen while staying within the range required for touch input. To enable support for large displays, refinements to the software and hardware are necessary. Currently, I cannot specify meaningful limitations regarding a gesture-based system, as this would require a fully functional prototype and an evaluation of its viability.

9.2 Outlook

The results of this work provide numerous starting points for future work. As the implementation of a prototype for gesture-based interaction could not be completed during the allotted time frame, future work may pick up at that point and finish a functional prototype. It may then be tested regarding the feasibility of motion-based input versus finger gestures and compared to conventional systems incorporating gesture input. Other starting points are based on the limitations and shortcomings of the finished prototype listed in the previous section. Future work may focus on how to maximize accuracy as well as on how to minimize response time.
Another improvement to this work’s prototype lies with enabling support for multi-touch gestures, e.g. swipe gestures known from smartphones and the like. Probably one of the most important and challenging starting points for future work lies with improving aspects of screen detection and tracking. As there is currently no set standard for public displays regarding their size and shape, enabling support for touch based interaction for a wide spectrum of them using a single wearable device would be a considerable improvement over the current prototype. This would require an emphasis on enabling detection of a public display not relying on monitoring its edges at all times. Only partially visible areas of a display have to be recognized as such and need to be mapped to the display’s overall size, position and orientation. It is likely that approaches trying to accomplish exactly this will need to evaluate a camera’s distance and angle towards a display, requiring hardware in addition to a single thermal camera. As new technologies become affordable to everyday users, comparing their feasibility to that of the proposed approaches becomes necessary as well. All in all, work on the implementation of a prototype for enabling interaction with otherwise non-interactive public displays illustrated numerous limitations. While being obstacles on the one hand, they also provide valuable opportunities for future work. 59 A Additional material A.1 Questionnaire A.1.1 English questionnaire A.1.2 German questionnaire 61 A Additional material Bachelor Thesis „ Thermal Imaging for Public Displays“ User Study Conductor: Alexander Frank 2787130 Questionnaire Personal Data: Last name, first name: _________________________________________ Age: _________________________________________ Sex: [ ] Male [ ]Female General Questions: 1) Did you have any experience with public displays prior to this study? [ ]Yes [ ] No [ ]What is a public display? If answer 3 was chosen, skip questions 2-4 and inform the study's conductor after having filled out the questionnaire. 2) Where did you encounter the public display(s)? ____________________________________________________________ 3) Was/were the display(s) interactive? [ ]Yes [ ]No [ ]Some of them. [ ]I don't know. If answers 2 or 4 were chosen, skip the next question. 4) How did interaction take place? ____________________________________________________________ 5) Do you have any previous experiences with touch displays? [ ]Yes [ ]No 6) Is this experience restricted to personal devices (smartphone, tablet, etc)? [ ]Yes [ ]No If Yes was chosen, skip the next question 7) Name as many touch devices you had experience with in the past as you want. ____________________________________________________________ ____________________________________________________________ 62 A.1 Questionnaire Questions regarding the study's setup: 8) How would you rate the comfort of the wearable device? 1 2 3 4 5 very good alright very poor 9) Could you imagine using such a device again in the future? If not, why? [ ]Yes [ ]Not sure. [ ]No: ________________________________________________________ ________________________________________________________ 10) How would you rate the prototype's response time? 1 2 3 4 5 very fast alright very slow 11) How would you rate the prototype's accuracy? 1 2 3 4 5 very good alright very poor 12) Are there any aspects of the prototype you would like to see improved? If yes, which? [ ]No. 
[ ]Yes: ____________________________ 13) Please note any further remarks you might have concerning the prototype. ____________________________________________________________ ____________________________________________________________ ____________________________________________________________ Thank you very much for your participation in this user study! 63 A Additional material Bachelorarbeit „ Thermal Imaging for Public Displays“ Benutzerstudie Leiter: Alexander Frank 2787130 Fragebogen Persönliche Daten: Nachname, Vorname: _________________________________________ Alter: _________________________________________ Geschlecht: [ ] Männlich [ ]Weiblich Allgemeine Fragen: 1) Hatten Sie vor dieser Studie bereits Erfahrungen mit Public Displays gemacht? [ ]Ja [ ]Nein [ ]Was ist ein Public Display? Falls Sie Antwort 3 gewählt haben, überspringen Sie Frage 2- 4 und benachrichtigen Sie bitte den Leiter der Studie, nachdem Sie diesen Fragebogen ausgefüllt haben. 2) Wo begegneten Sie dem/den Public Display(s)? ____________________________________________________________ 3) War(en) das/die Display(s) interaktiv? [ ]Ja [ ]Nein [ ]Manche. [ ]Ich weiß nicht. Falls Antwort 2 oder 4 gewählt wurde, überspringen Sie die nächste Frage. 4) Wie gestaltete sich die Interaktion? ____________________________________________________________ 5) Haben Sie vorangehende Erfahrungen mit Touch Displays? [ ]Ja [ ]Nein 6) Sind diese Erfahrungen auf Privatgeräte beschränkt (Smartphone, Tablet)? [ ]Ja [ ]Nein Falls Ja gewählt wurde, überspringen Sie die nächste Frage. 7) Nennen Sie so viele Geräte mit Touch Eingabe, mit denen Sie in der Vergangenheit Erfahrungen gemacht haben, wie Sie möchten. ____________________________________________________________ ____________________________________________________________ 64 A.1 Questionnaire Fragen bezüglich dem Aufbau aus der Studie: 8) Wie würden Sie den Komfort des tragbaren Geräts bewerten? 1 2 3 4 5 sehr gut ok sehr schlecht 9) Könnten Sie sich vorstellen, ein solches Gerät in Zukunft erneut zu verwenden? Falls nein, wieso? [ ]Ja [ ]Nicht sicher. [ ]Nein: ________________________________________________________ ________________________________________________________ 10) Wie würden Sie die Antwortzeit des Prototypen bewerten? 1 2 3 4 5 sehr schnell ok sehr langsam 11) Wie würden Sie die Genauigkeit des Prototypen bewerten? 1 2 3 4 5 sehr gut ok sehr schlecht 12) Gibt es Aspekte des Prototypen, die Sie gerne verbessert sehen würden? Falls ja, welche? [ ]Nein. [ ]Ja: ____________________________ 13) Bitte notieren Sie weitere Anmerkungen, die Sie bezüglich des Prototypen haben. ____________________________________________________________ ____________________________________________________________ ____________________________________________________________ Vielen Dank für ihre Teilnahme an dieser Benutzerstudie! 65 Bibliography [ASS+12] F. Alt, S. Schneegaß, A. Schmidt, J. Müller, N. Memarovic. “How to evaluate public displays.” In: Proceedings of the 2012 International Symposium on Pervasive Displays. ACM. 2012, p. 17 (cit. on p. 24). [EBBF04] M. Eaddy, G. Blasko, J. Babcock, S. Feiner. “My own private kiosk: privacy- preserving public displays.” In: Proc. Eighth Int. Symp. Wearable Computers. Vol. 1. Oct. 2004, pp. 132–135. DOI: 10.1109/ISWC.2004.32 (cit. on p. 24). [KFF+09] N. Kaviani, M. Finke, S. Fels, R. Lea, H. Wang. 
“What Goes Where?: Designing Interactive Large Public Display Applications for Mobile Device Interaction.” In: Proceedings of the First International Conference on Internet Multimedia Computing and Service. ICIMCS ’09. Kunming, Yunnan, China: ACM, 2009, pp. 129–138. ISBN: 978-1-60558-840-7. DOI: 10.1145/1734605.1734637. URL: http://doi.acm.org/10.1145/1734605.1734637 (cit. on p. 24).
[KK05] O.-K. Kwon, S. G. Kong. “Multiscale fusion of visual and thermal images for robust face recognition.” In: Proc. IEEE Int. Conf. Computational Intelligence for Homeland Security and Personal Safety CIHSPS 2005. Mar. 2005, pp. 112–116. DOI: 10.1109/CIHSPS.2005.1500623 (cit. on p. 23).
[LCG+11] E. Larson, G. Cohn, S. Gupta, X. Ren, B. Harrison, D. Fox, S. Patel. “HeatWave: Thermal Imaging for Surface User Interaction.” In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’11. Vancouver, BC, Canada: ACM, 2011, pp. 2565–2574. ISBN: 978-1-4503-0228-9. DOI: 10.1145/1978942.1979317. URL: http://doi.acm.org/10.1145/1978942.1979317 (cit. on p. 23).
[OKB15] M. Ostkamp, C. Kray, G. Bauer. “Towards a Privacy Threat Model for Public Displays.” In: Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. EICS ’15. Duisburg, Germany: ACM, 2015, pp. 286–291. ISBN: 978-1-4503-3646-8. DOI: 10.1145/2774225.2775072. URL: http://doi.acm.org/10.1145/2774225.2775072 (cit. on p. 24).
[PR15] K. Palovuori, I. Rakkolainen. “The Heat is on: Thermal Input for Immaterial Interaction.” In: Proceedings of the 19th International Academic Mindtrek Conference. AcademicMindTrek ’15. Tampere, Finland: ACM, 2015, pp. 152–154. ISBN: 978-1-4503-3948-3. DOI: 10.1145/2818187.2818272. URL: http://doi.acm.org/10.1145/2818187.2818272 (cit. on p. 23).
[SAH+14] A. Sahami Shirazi, Y. Abdelrahman, N. Henze, S. Schneegass, M. Khalilbeigi, A. Schmidt. “Exploiting Thermal Reflection for Interactive Systems.” In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’14. Toronto, Ontario, Canada: ACM, 2014, pp. 3483–3492. ISBN: 978-1-4503-2473-1. DOI: 10.1145/2556288.2557208. URL: http://doi.acm.org/10.1145/2556288.2557208 (cit. on pp. 21, 23).
[Sch15] S. Schneegass. “There is More to Interaction with Public Displays Than Kinect: Using Wearables to Interact with Public Displays.” In: Proceedings of the 4th International Symposium on Pervasive Displays. PerDis ’15. Saarbruecken, Germany: ACM, 2015, pp. 243–244. ISBN: 978-1-4503-3608-6. DOI: 10.1145/2757710.2776803. URL: http://doi.acm.org/10.1145/2757710.2776803 (cit. on p. 24).
[SLP12] E. N. Saba, E. C. Larson, S. N. Patel. “Dante vision: In-air and touch gesture sensing for natural surface interaction with combined depth and thermal cameras.” In: Proc. IEEE Int. Conf. Emerging Signal Processing Applications. Jan. 2012, pp. 167–170. DOI: 10.1109/ESPA.2012.6152472 (cit. on p. 23).
[SWZ15] X. Shi, S. Wang, Y. Zhu. “Expression Recognition from Visible Images with the Help of Thermal Images.” In: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ICMR ’15. Shanghai, China: ACM, 2015, pp. 563–566. ISBN: 978-1-4503-3274-3. DOI: 10.1145/2671188.2749355. URL: http://doi.acm.org/10.1145/2671188.2749355 (cit. on p. 23).
[TGC06] P. Tarasewich, J. Gong, R. Conlan. “Protecting Private Data in Public.” In: CHI ’06 Extended Abstracts on Human Factors in Computing Systems. CHI EA ’06. Montréal, Québec, Canada: ACM, 2006, pp. 1409–1414. ISBN: 1-59593-298-4.
DOI: 10.1145/1125451.1125711. URL: http://doi.acm.org/10.1145/1125451.1125711 (cit. on p. 24).

All links were last followed on December 19, 2016.

Declaration

I hereby declare that the work presented in this thesis is entirely my own and that I did not use any other sources and references than the listed ones. I have marked all direct or indirect statements from other sources contained therein as quotations. Neither this work nor significant parts of it were part of another examination procedure. I have not published this work in whole or in part before. The electronic copy is consistent with all submitted copies.

place, date, signature