05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 10 of 132
  • Visual prediction of quantitative information using social media data (Open Access)
    (2017) Fatehi Ebrahimzadeh, Hamed
    In recent years, the availability of vast amounts of user-generated data on social media has given researchers the opportunity to analyze these data sources and discover meaningful information. However, processing and understanding this immense amount of data is challenging and calls for automated approaches, as well as the involvement of field experts who contribute their domain knowledge and experience to the analysis. So far, existing approaches only enable the detection of indicative information from the data, such as the occurrence of critical incidents or relevant situation reports. Consequently, the next step is to better relate user-provided information to real-world quantities. In this work, a predictive visual analytics approach is developed that offers semi-automated methods to estimate quantitative information (e.g., the number of people who participate in a public event). First, the approach provides interactive visual tools to explore social media data in time and space and to interactively select the features required as input for training and prediction. Next, a suitable model can be trained on these feature sets and applied for prediction. Finally, the approach also allows users to visually explore prediction results and to measure prediction quality with respect to ground-truth information obtained from past observations. The result of this work is a generic visual analytics approach that provides expert users with visual tools for constant interaction between human and machine to produce quantitative predictions from social media data. The prediction results are promising, especially in cases where the location, time, and other information related to public events are considered together with the content of the user-generated data.
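    As an illustration of the pipeline described above, the following minimal Python/scikit-learn sketch trains a regressor on social-media-derived features to estimate event attendance. The feature set, numbers, and attendance figures are invented stand-ins, not the features or data used in the thesis.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        # Hypothetical features per past event: posts in the event area,
        # unique users, posts mentioning the event, and hours observed.
        X_train = np.array([
            [1200,  540,  300, 24],
            [8000, 3100, 2600, 24],
            [450,   210,   90, 24],
            [15000, 6400, 5200, 24],
        ])
        y_train = np.array([5000, 40000, 1500, 90000])  # known attendance (ground truth)

        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)

        # Estimate attendance for a new event from its social media activity.
        X_new = np.array([[3000, 1400, 1100, 24]])
        print(f"estimated attendance: {model.predict(X_new)[0]:.0f}")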
  • Developing a multimodal feedback motion guidance system in VR for people with motion disabilities (Open Access)
    (2021) Wennrich, Kevin
    Motion is an important aspect of physiotherapy, and the correctness of these motions is even more important, especially for home exercises. In this thesis, a prototype of a multimodal guidance system in virtual reality was created that tracks the user's movements and compares them to the correct positions of physiotherapy exercises. To gather requirements for the system, people who had needed physiotherapy because of an injury or a disability (stroke, MS, NPC) were interviewed, as well as a physiotherapist. Based on the results, we implemented a virtual physiotherapist and auditory guidance as two modalities; further modalities were a ghost arm and haptic guidance via vibration bands. A prototype was developed in which the user can choose and combine these guidance modalities. The system, the modalities, and their limits were evaluated in an online study and a pilot study, with the result that, so far, the ghost arm and the virtual physiotherapist are the best-liked guidance modalities. A user study is planned for the future.
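    To make the comparison step concrete, here is a minimal Python sketch of how a tracked joint position could be compared against a reference pose to trigger a corrective cue. The threshold and coordinates are invented; the actual prototype works on full skeletal tracking inside the VR engine.

        import numpy as np

        def guidance_feedback(tracked, reference, threshold=0.05):
            """Compare a tracked joint position (metres, VR world space)
            to the reference pose and decide whether to trigger a cue."""
            deviation = np.linalg.norm(np.asarray(tracked) - np.asarray(reference))
            if deviation > threshold:
                # In the prototype this would drive the vibration bands
                # or highlight the ghost arm; here we just report it.
                return f"correct position by {deviation:.3f} m"
            return "position OK"

        print(guidance_feedback(tracked=[0.32, 1.10, 0.45],
                                reference=[0.30, 1.15, 0.45]))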
  • Erweiterung und Evaluation einer lupenbasierten Technik zur Exploration von Textsammlungen (Open Access)
    (2017) Assenov, Ivan
    In recent years there has been a sharp increase in the amount of text publicly accessible in digital form. The primary causes are widespread access to the Internet, the popularity of e-mail and social networking websites, and collaborative efforts to preserve and share knowledge. These developments have inspired a wide variety of information visualization techniques that focus on large-scale text data and facilitate its exploration and analysis. One popular approach represents individual documents as glyphs on a 2D surface, with pairwise distances corresponding to semantic similarities. The metaphor of a movable lens that summarizes the contents of the texts underneath it has been proposed as an interaction method targeted at free exploration tasks. The main goal of this master's thesis is to extend the basic technique by adding labels to the visualization that guide users towards regions of interest more quickly without negatively impacting the lens's usefulness. In addition, an automatic framework is developed that determines the tool's effectiveness under different parameter settings. Finally, the proposed improvements and the overall technique are evaluated by means of a think-aloud user study.
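    The core idea of the technique, projecting documents to 2D by similarity and summarizing whatever lies under the lens, can be sketched in Python/scikit-learn as follows. The toy corpus, the projection method (MDS), and the top-term summary are illustrative choices, not the exact components of the evaluated system.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.manifold import MDS

        docs = ["stars galaxy telescope", "galaxy cluster redshift",
                "neural network training", "deep learning gradient descent"]

        # Place documents on a 2D surface so distances reflect similarity.
        tfidf = TfidfVectorizer()
        vectors = tfidf.fit_transform(docs).toarray()
        coords = MDS(n_components=2, random_state=0).fit_transform(vectors)

        def lens_summary(center, radius=1.0, top_k=3):
            """Summarize documents under a circular lens by their strongest terms."""
            mask = np.linalg.norm(coords - center, axis=1) <= radius
            if not mask.any():
                return []
            mean_tfidf = vectors[mask].mean(axis=0)
            terms = np.array(tfidf.get_feature_names_out())
            return list(terms[np.argsort(mean_tfidf)[::-1][:top_k]])

        print(lens_summary(center=coords[0]))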
  • Progressive sparse coding for in situ volume visualization (Open Access)
    (2017) Berian, Gratian
    Nowadays, high-performance computing (HPC) suffers from an ever-growing gap between computational power, I/O bandwidth, and storage capacity. Typical runs of HPC simulations produce terabytes of data every day, which poses a serious problem for storing and manipulating such large amounts of data. In this thesis I present a method for compressing time-dependent volume data using an overcomplete dictionary learned from the input data. The proposed method comprises two steps. In the first step, the dictionary is learned from a number of training examples extracted from the volume we want to compress; this process is iterative, and at each step the dictionary is updated to better sparsely represent the training data. The second step expresses each block of the volume as a sparse linear combination of the dictionary atoms trained on that volume. To establish the performance of the proposed method, different aspects were tested, such as training speed vs. sparsifying speed, compression ratio vs. reconstruction error, dictionary reusability across multiple time steps, and how a dictionary performs when used on a different volume than the one it was trained on. Finally, we compare the quality of the reconstructed volume to the original volume and to other lossy compression techniques in order to gain a visual understanding of the reconstruction quality.
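    The two-step scheme (learn a dictionary, then sparse-code each block) can be sketched with scikit-learn as follows. The random data, block size, and atom count are stand-ins chosen so the example runs quickly; a truly overcomplete dictionary would use more atoms than block dimensions.

        import numpy as np
        from sklearn.decomposition import MiniBatchDictionaryLearning

        # Stand-in for 8x8x8 volume blocks flattened to 512-dim vectors.
        rng = np.random.default_rng(0)
        blocks = rng.standard_normal((500, 512))

        # Step 1: iteratively learn a dictionary from the training blocks.
        dico = MiniBatchDictionaryLearning(n_components=64,
                                           transform_algorithm="omp",
                                           transform_n_nonzero_coefs=8,
                                           random_state=0)
        dico.fit(blocks)

        # Step 2: express each block as a sparse combination of atoms.
        codes = dico.transform(blocks)            # sparse coefficients
        reconstructed = codes @ dico.components_  # "decompression"

        err = np.linalg.norm(blocks - reconstructed) / np.linalg.norm(blocks)
        print(f"relative reconstruction error: {err:.3f}")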
  • Stationary vehicle classification based on scene understanding (Open Access)
    (2024) Wang, Weitian
    Navigating dense traffic situations such as merging onto highways and making unprotected left turns remains a challenge for existing autonomous driving systems. Classifying vehicles as parked, stopped, or moving can benefit the decision-making system in these cases, because such vehicles play different roles in the vehicle-to-vehicle negotiation process. Existing work on vehicle classification has focused on trivial cases and used methods that do not generalize well. To fill this gap, after analyzing the problem and summarizing the information it requires, we propose a multi-modal model that leverages information from lidar, radar, camera, and high-definition maps. To meet the complexity of our task and the needs of our model, we collect a dataset in real driving scenarios and then preprocess and label it. By utilizing a pretrained vision encoder for fine-grained visual feature extraction and a vision foundation model (CLIP) for scene understanding, our model achieves 97.63% test accuracy on our dataset. Through visualization methods, experiments, and quantitative analyses, we investigate the effectiveness and importance of the different encoders used in our model. We interpret and explain the successes and failures of our model to give a better understanding of how different latent features contribute to the final result. Finally, the limitations of our model and potential improvements are discussed.
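    To illustrate how CLIP can contribute scene understanding, here is a minimal zero-shot sketch in Python using the Hugging Face transformers API. The prompts and the gray placeholder image are invented, and the thesis model fuses CLIP features with lidar, radar, and map encoders rather than classifying by prompts alone.

        import torch
        from PIL import Image
        from transformers import CLIPModel, CLIPProcessor

        prompts = ["a car parked at the roadside",
                   "a car stopped in traffic",
                   "a car driving on the road"]

        model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

        image = Image.new("RGB", (224, 224), "gray")  # stand-in for a camera crop
        inputs = processor(text=prompts, images=image,
                           return_tensors="pt", padding=True)

        with torch.no_grad():
            logits = model(**inputs).logits_per_image
        for prompt, p in zip(prompts, logits.softmax(dim=-1)[0]):
            print(f"{p:.2f}  {prompt}")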
  • Extending parking assistance for automotive user interfaces (Open Access)
    (2014) Georoceanu, Radu
    Nowadays the trend in the automotive industry is to integrate systems that go beyond merely maneuvering the car: navigation, communication, and entertainment functions have become standard in most cars. The multitude of sensors present in today's vehicles can be used to collect information that can be shared with other drivers to make the roads safer and cleaner. A particularly troubling issue for drivers is the search for free parking spots, because of the wasted time, fuel consumption, and effort involved. Solutions that try to mitigate these problems already exist, such as crowdsourcing smartphone apps, but they are still far from reliable. The overall goal of this thesis is to find new ways of providing parking information to drivers. This information is collected by vehicles equipped with the latest sensor hardware, capable of detecting parking spaces while driving, and is distributed to the cloud, where it is shared with other drivers via smartphones or the vehicle's integrated displays. Though the idea is simple, many challenges need to be addressed. The thesis also looks into ways of improving parking surveillance, using the latest vehicle-integrated video camera systems to make parked vehicles less susceptible to vandalism and theft. A study is conducted to determine what parking-related information drivers want and how it can be displayed to them. Finally, a cloud-based implementation of such a system is presented in detail and evaluated to see how it behaves in the real world.
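    As a sketch of the data such a system might exchange, the following hypothetical Python payload shows one way a sensing vehicle could report a detected spot to the cloud. The fields and the endpoint in the comment are assumptions, not the thesis's actual interface.

        import json
        from dataclasses import dataclass, asdict

        @dataclass
        class ParkingSpotReport:
            """Minimal payload a sensing vehicle could upload to the cloud."""
            latitude: float
            longitude: float
            free: bool
            length_m: float  # gap length measured by the parking sensors
            timestamp: str   # ISO 8601 time of observation

        report = ParkingSpotReport(48.7758, 9.1829, True, 5.2,
                                   "2014-06-01T09:30:00Z")
        payload = json.dumps(asdict(report))
        # A client could POST this to the parking service, e.g.
        # requests.post("https://parking.example.com/spots", data=payload)
        print(payload)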
  • Leveraging large language models for latent intention recognition and next action prediction (Open Access)
    (2024) Ahmed, Mohamed
    Autonomous agents that operate within graphical user interfaces (GUIs) have significant potential to improve user experience. To achieve this, such agents must be customized and proactive. Understanding user intentions through their interactions and engagements with GUIs enables these agents to better fulfill user needs. This work introduces a novel LLM-based framework, Mistral-Intention, that accurately recognizes latent user intentions from their interactions. A key innovation is the integration of a sub-goal generation step, which uses prompt engineering to decompose user tasks into actionable steps, enhancing the model's interpretative capabilities and extensibility. Furthermore, the incorporation of a keyword-extraction-based loss significantly refines the model's focus on critical information in user actions, such as typed values, ensuring comprehensive and relevant intention recognition. We evaluate Mistral-Intention using a range of metrics, including manual metrics and automatic methods based on GPT-4o, against a modified version of the state-of-the-art task automation framework SYNAPSE. Results from extensive testing on the MIND2WEB and MoTIF datasets highlight Mistral-Intention's superior performance in intention recognition across various GUI environments. Furthermore, we implement an LLM-based computer agent capable of predicting the user's next action. We address the challenges faced while developing such agents, such as the limited context window and understanding the current GUI environment. Our LLM-based agent improves element accuracy by 15.30% and operation F1 by 13.20% over the previous state-of-the-art method, MindAct, on MIND2WEB. Our work not only pushes the boundaries of computational HCI but also opens new pathways for developing more intuitive and effective user-centered interaction solutions.
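    The keyword-extraction-based loss can be pictured as a token-level cross-entropy that upweights tokens flagged as keywords (such as typed values). The following PyTorch sketch is an assumption about its general shape, with invented weights and dummy tensors, not the thesis's actual loss.

        import torch
        import torch.nn.functional as F

        def keyword_weighted_loss(logits, targets, keyword_mask, keyword_weight=2.0):
            """Token-level cross-entropy that upweights keyword tokens."""
            # logits: (seq, vocab); targets: (seq,); keyword_mask: (seq,) bool
            per_token = F.cross_entropy(logits, targets, reduction="none")
            weights = torch.ones_like(per_token)
            weights[keyword_mask] = keyword_weight
            return (weights * per_token).sum() / weights.sum()

        logits = torch.randn(6, 100)            # dummy model outputs
        targets = torch.randint(0, 100, (6,))
        keyword_mask = torch.tensor([0, 0, 1, 1, 0, 0], dtype=torch.bool)
        print(keyword_weighted_loss(logits, targets, keyword_mask))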
  • Tauglichkeit von Augmented-Reality-Brillen für die visuelle Analyse in Produktion und Fertigung (Open Access)
    (2017) Reinhardt, Jan
    The Microsoft HoloLens is the first augmented reality headset that operates fully standalone. This combination opens up a large number of use cases that were previously impossible. The goal of this master's thesis is to find out whether the Microsoft HoloLens is suitable for the visual analysis of production and manufacturing facilities. In a first step, the technical capabilities of the Microsoft HoloLens are analyzed. In a second step, a concept is developed on the basis of these capabilities that shows how augmented reality can be used in production and manufacturing. To this end, a factory layout simulator is ported to the HoloLens. Beyond the direct port, adaptations are made in order to exploit the full potential of the augmented reality headset, with particular attention to interaction and visualization. This concept is then implemented as a prototype for the Microsoft HoloLens and evaluated on the basis of a use case.
  • Depth-driven variational methods for stereo reconstruction (Open Access)
    (2014) Maurer, Daniel
    Stereo reconstruction is one of the fundamental problems in computer vision, with the aim of reconstructing the depth of a static scene. To solve this problem, the corresponding pixels in both views must be found. A common technique is to minimize an energy (cost) function, and most methods use a parameterization in the form of displacement information (disparity). In contrast, this thesis uses, extends, and examines a depth parameterization. (i) First, a basic depth-driven variational method is developed, based on a recently presented method of Basha et al. [2]. (ii) After that, several possible extensions are presented in order to improve the developed method. These extensions include advanced smoothness terms that incorporate image information and enable an anisotropic smoothing behavior, as well as advanced data terms that use modified constraints to allow a more accurate estimation in different situations. (iii) Finally, all extensions are compared with each other and with a disparity-driven counterpart.
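    A generic depth-parameterized energy of the kind this line of work minimizes can be written as follows. This LaTeX sketch shows the typical structure (robust data term plus smoothness term), not the exact formulation of Basha et al. [2] or of the thesis.

        % Z(x): scene depth; pi(x, Z): warp of pixel x into the second view
        % via the known camera geometry; Psi: a robust penalizer; alpha > 0.
        E(Z) = \int_{\Omega}
                 \Psi\!\big( ( I_2(\pi(x, Z(x))) - I_1(x) )^2 \big)   % data term
               + \alpha \, \Psi\!\big( |\nabla Z(x)|^2 \big)          % smoothness term
               \, dx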
  • Correlating facial expressions and contextual data for mood prediction using mobile devices (Open Access)
    (2017) Reutter, Robin
    Facial recognition can nowadays be performed by virtually any smartphone with a camera. Sophisticated systems and methods allow extracting information from facial data, such as the emotions associated with a person's current expressions. In a mobile setting, information about emotions can be continuously captured with the smartphone's front camera and annotated with various types of contextual data. This master's thesis introduces OpenFaceAndroid, an Android application based on the existing facial analysis frameworks OpenFace and OpenFace++. The system gathers and processes facial expression information as well as contextual data in real time on a smartphone, using the front camera and various sensors. The output is a prediction over seven different emotions that, compared to pure facial data extraction, is improved through annotation with context. In two studies, data from several participants is first collected and assessed for its usefulness for this thesis, and afterwards used to train classifier models with live emotion values and context information as training data. Subsequently, these models are evaluated for their accuracy in general emotion prediction and in noticing affective mood changes, supported by findings from participant interviews. In conclusion, possible improvements and general limitations of this work are discussed, and suggestions for future work are proposed.
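    The classifier-learning step can be sketched as follows: facial emotion scores are concatenated with context features and fed to a standard classifier. Everything in this Python example (the seven-score layout, the two context features, the labels) is an invented stand-in for the thesis's actual data.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        # Each row: seven facial emotion scores plus two context features
        # (hour of day, device motion level) -- all values are made up.
        X = np.array([
            [0.1, 0.7, 0.05, 0.05, 0.03, 0.02, 0.05,  9, 0.2],
            [0.6, 0.1, 0.10, 0.05, 0.05, 0.05, 0.05, 22, 0.0],
            [0.1, 0.1, 0.60, 0.05, 0.05, 0.05, 0.05, 14, 0.8],
        ])
        y = ["happy", "sad", "surprised"]  # self-reported mood labels

        clf = RandomForestClassifier(random_state=0).fit(X, y)
        sample = np.array([[0.2, 0.5, 0.1, 0.05, 0.05, 0.05, 0.05, 10, 0.3]])
        print(clf.predict(sample))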