Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
Search Results: 530
Open Access
Challenges of computational social science analysis with NLP methods (2022)
Dayanik, Erenay; Padó, Sebastian (Prof. Dr.)
Computational Social Science (CSS) is an emerging research area at the intersection of social science and computer science, where problems of societal relevance can be addressed by novel computational methods. With the recent advances in machine learning and natural language processing, as well as the availability of textual data, CSS has opened up new possibilities, but also methodological challenges. In this thesis, we present a line of work on developing methods and addressing challenges in data annotation and modeling for computational political science and social media analysis, two highly popular and active research areas within CSS. In the first part of the thesis, we focus on a use case from computational political science, namely Discourse Network Analysis (DNA), a framework that aims at analyzing the structures behind complex societal discussions. We investigate how this style of analysis, which is traditionally performed manually, can be automated. We start by providing a requirement analysis that outlines a roadmap for decomposing the complex DNA task into several conceptually simpler sub-tasks. Then, we introduce NLP models with various configurations to automate two of the sub-tasks identified by the requirement analysis, namely claim detection and classification, based on different neural network architectures ranging from unidirectional LSTMs to Transformer-based architectures. In the second part of the thesis, we shift our focus to fairness, a central concern in CSS. Our goal in this part of the thesis is to analyze and improve the performance of NLP models used in CSS in terms of fairness and robustness while maintaining their overall performance.
With that in mind, we first analyze the above-mentioned claim detection and classification models and propose techniques to improve model fairness and overall performance. After that, we broaden our focus to social media analysis, another highly active subdomain of CSS. Here, we study text classification of correlated attributes, which pose an important but often overlooked challenge to model fairness. Our last contribution is to discuss the limitations of the current statistical methods applied for bias identification, to propose a multivariate regression-based approach, and to show, through experiments on social media data, that it can be used as a complementary method for bias identification and analysis tasks. Overall, our work takes a step towards a better understanding of the challenges of computational social science. We hope that both political scientists and NLP scholars can make use of the insights from this thesis in their research.

Open Access
Integration von Data Mining und Online Analytical Processing: eine Analyse von Datenschemata, Systemarchitekturen und Optimierungsstrategien [Integration of data mining and online analytical processing: an analysis of data schemas, system architectures, and optimization strategies] (2003)
Schwarz, Holger; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
The technical means of capturing data and storing it persistently have matured to the point that large data sets are available, particularly in companies and other organizations. These data sets, often referred to as a data warehouse, contain all relevant information about the organizations themselves, the processes running within them, and their interactions with other organizations. In many cases, the targeted analysis of these data sets is the decisive success factor for an organization. A wide range of approaches for analyzing the data in a data warehouse are available and proven in practice. Two of the most important are Online Analytical Processing (OLAP) and data mining.
The two set different priorities and, to date, have generally been used largely in isolation. This thesis first shows that a comprehensive analysis of the data in a data warehouse can only be achieved by using both analysis approaches in an integrated way. Individual questions arising from this need for integration are discussed in detail. One of them is how to model the data in a data warehouse appropriately. The evaluation of common modeling approaches takes into account, in particular, the requirements that arise from the proposed integration approach. As a result, a conceptual data model is presented that structures information in a way that suits OLAP and data mining equally well. At the logical modeling level, the schema types that adequately support the integration of the two analysis approaches are then identified. Next, the thesis turns to the differing system architectures for data mining and OLAP. A comprehensive discussion of these architectures reveals a number of deficiencies. This ultimately leads to an extended system architecture that eliminates these weaknesses and adequately supports the intended integration. The extended system architecture includes a component for the application-independent optimization of different analysis applications. A third focus of this thesis is the identification of suitable optimization approaches for this component. On the one hand, the approaches are evaluated qualitatively; on the other hand, the optimization potential of each approach is demonstrated on the basis of extensive series of measurements.

Open Access
A design space for pervasive advertising on public displays (2013)
Alt, Florian; Schmidt, Albrecht (Prof. Dr.)
Today, people living in cities see up to 5,000 ads per day, and many of them are presented on public displays.
More and more of these public displays are networked and equipped with various types of sensors, making them part of a global infrastructure that is currently emerging. Such networked and interactive public displays provide the opportunity to create a benefit for society in the form of immersive experiences and relevant content. In this way, they can overcome the display blindness that has evolved among passersby over the years. We see two main reasons that prevent this vision from coming true. First, public displays are stuck with traditional advertising as the driving business model, making it difficult for novel, interactive applications to enter the scene. Second, no common ground exists for researchers or advertisers that outlines the important challenges. Both the provider view and the audience view need to be addressed to make open, interactive display networks successful. The main contribution of this thesis is a design space for advertising on public displays that identifies important challenges, mainly from a human-computer interaction perspective. Solutions to these core challenges are presented and evaluated using empirical methods commonly applied in HCI. First, we look at challenges that arise from the shared use of display space. We conducted an observational study of traditional public notice areas that allowed us to identify different stakeholders, to understand their needs and motivations, to unveil current practices used to exercise control over the display, and to understand the interplay between space, stakeholders, and content. We present a set of design implications for open public display networks that we applied when implementing and evaluating a digital public notice area. Second, we tackle the challenge of getting the user to interact by taking a closer look at attracting attention, communicating interactivity, and enticing interaction. Attracting attention is crucial for any further action to happen.
We present an approach that exploits gaze as a powerful input modality. By adapting content based on gaze, we are able to show a significant increase in attention and an effect on the user's attitude. To communicate interactivity, we show that a mirror representation of the user is a powerful interactivity cue. Finally, to entice interaction, we show that users need to be motivated to interact and to understand how interaction works. Findings from our experiments identify direct touch and the mobile phone as suitable interaction technologies. In addition, these findings suggest that the relevance of content, privacy, and security have a strong influence on user motivation. Third, this thesis makes a set of contributions towards understanding audience behavior, which is particularly important for advertisers in order to choose appropriate content and to select suitable locations for future advertising displays. Our findings provide an in-depth understanding of the honeypot effect as a powerful interactivity cue. Furthermore, we identify a number of interesting effects (e.g., the landing effect) and explain how developers could design for them. We envision that the results of this thesis will provide a basis for future research and help practitioners shape future advertisements on public displays in a positive way.

Open Access
Analyzing code corpora to improve the correctness and reliability of programs (2021)
Patra, Jibesh; Pradel, Michael (Prof. Dr.)
Bugs in software are commonplace, challenging, and expensive to deal with. One widely used approach is to reason about software with program analyses in order to detect bugs. In recent years, the growth of areas like web application development and data analysis has produced large, publicly available source code corpora, primarily written in dynamically typed languages such as Python and JavaScript.
It is challenging to reason about programs written in such languages because of their dynamic features and the lack of statically declared types. This dissertation argues that, to build software developer tools for detecting and understanding bugs, it is worthwhile to analyze code corpora, which can uncover code idioms, runtime information, and natural language constructs such as comments. The dissertation presents three corpus-based approaches that support this argument. In the first part, we present static analyses over code corpora to generate new programs, to perform mutations on existing programs, and to generate data for effective training of neural models. We provide empirical evidence that the static analyses can scale to thousands of files and that the trained models are useful in finding bugs in code. The second part of this dissertation presents dynamic analyses over code corpora. Our evaluations show that the analyses are effective in uncovering unexpected behaviors when multiple JavaScript libraries are included together and in generating data for training bug-finding neural models. Finally, we show that a corpus-based analysis can be useful for input reduction, which can help developers find a smaller subset of an input that still triggers the required behavior. We envision that this dissertation will motivate future endeavors in corpus-based analysis to alleviate some of the challenges faced in ensuring the reliability and correctness of software. One direction is to combine data obtained by static and dynamic analyses over code corpora for training. Another direction is to use meta-learning approaches, where a model is trained using data extracted from the code corpora of one language and used for another language.

Open Access
Scalable computer network emulation using node virtualization and resource monitoring (2011)
Maier, Steffen Dirk; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
The ongoing development of computer network technology requires new communication protocols on all layers of the protocol stack to adapt to and exploit technology specifics. The performance of new protocol implementations has to be evaluated before deployment. Computer network emulation enables the execution of real, unmodified protocol implementations within a configurable synthetic environment. Since network properties are reproduced synthetically, emulation supports reproducible measurement results for wired and wireless networks. Meaningful evaluation scenarios typically involve a large number of communicating nodes. Reproducing the network properties of the medium access control layer can be accomplished efficiently on cheap common off-the-shelf computers and allows the evaluation of network protocols, transport protocols, and applications. However, meaningful emulation scenarios often require more nodes than there are affordable computers. To scale the number of nodes in an emulation scenario beyond the number of available computers, we discuss approaches to virtualization and operating system partitioning. Focusing on the latter, we argue for virtual protocol stacks, which provide extremely lightweight node virtualization and enable the execution of multiple instances of the software under evaluation on each physical computer. To connect virtual nodes on the same and on different computers, we design and implement a highly efficient software communication switch. A centralized emulation control component distributes dynamic network property updates, which result, for instance, from node mobility. To handle the large number of nodes and the resulting increase in updates, we propose a hierarchical control scheme in which the central component delegates updates to sub-components distributed over the computers of an emulation system. Extensive evaluations show the scalability of our virtualized network emulation system. Virtual nodes executed on the same computer share its limited resources.
Hosting too many virtual nodes on the same computer may lead to resource contention. This can cause unrealistic measurement results and is thus undesirable. Discussing different approaches to handling resource contention, we argue for detection and recovery. We define quality criteria that allow the detection of resource contention. To observe those quality criteria during emulation experiments, we propose a highly lightweight monitoring approach. Our monitoring is based on instrumenting an operating system kernel and observing basic resource scheduling events. This enables the detection of even peak resource usage within a split second. Thorough evaluations demonstrate the effectiveness of the quality criteria and the monitoring, as well as the negligible overhead of our monitoring approach.

Open Access
Stochastic neural networks: components, analysis, limitations (2022)
Neugebauer, Florian; Polian, Ilia (Prof. Dr.)
Stochastic computing (SC) promises an area- and power-efficient alternative to conventional binary implementations of many important arithmetic functions. SC achieves this by employing a stream-based number format called stochastic numbers (SNs), which enables bit-sequential computations, in contrast to conventional binary computations that are performed on entire words at once. An SN encodes a value probabilistically, with equal weight for every bit in the stream. This encoding results in approximate computations, causing a trade-off between power consumption, area, and computation accuracy. The prime example of efficient computation in SC is multiplication, which can be performed with only a single gate. SC is therefore an attractive alternative to conventional binary implementations in applications that contain a large number of basic arithmetic operations and are able to tolerate the approximate nature of SC.
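To make the encoding concrete, here is a minimal Python sketch of textbook stochastic computing, not the circuits from the thesis. It assumes the common unipolar format, where a stream of N bits represents the fraction of 1s. It also assumes the two input streams are statistically independent. Under those assumptions, the single gate mentioned in the abstract is an AND gate, because P(a AND b = 1) = P(a = 1) · P(b = 1). The helper names `to_sn`, `from_sn`, and `sc_multiply` are hypothetical, and a seeded software pseudo-random generator stands in for a hardware stochastic number generator.

```python
import random


def to_sn(value: float, length: int, rng: random.Random) -> list[int]:
    """Encode value in [0, 1] as a unipolar SN: each bit is 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def from_sn(bits: list[int]) -> float:
    """Decode an SN: every bit has equal weight, so the value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a: list[int], b: list[int]) -> list[int]:
    """Multiply two independent unipolar SNs with a bitwise AND (one gate per bit pair)."""
    return [x & y for x, y in zip(a, b)]


rng = random.Random(42)  # fixed seed so the run is reproducible
N = 4096                 # stream length: longer streams mean higher accuracy but more latency/energy
a = to_sn(0.5, N, rng)
b = to_sn(0.25, N, rng)
product = from_sn(sc_multiply(a, b))
print(f"0.5 * 0.25 is approximately {product:.4f}")
```

The result is only an estimate of 0.125. Its standard error shrinks roughly as 1/sqrt(N), so quadrupling the stream length only halves the expected error. This is one face of the power/area versus accuracy trade-off described above.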
The most widely considered class of applications in this regard is neural networks (NNs), with convolutional neural networks (CNNs) as the prime target for SC. In recent years, steady advances have been made in the implementation of SC-based CNNs (SCNNs). At the same time, however, a number of challenges have been identified. SCNNs need to handle large amounts of data, which have to be converted from conventional binary format into SNs. This conversion is hardware-intensive and takes up a significant portion of a stochastic circuit's area, especially if the SNs have to be generated independently of each other. Furthermore, some functions commonly used in CNNs, such as max-pooling, have no exact corresponding SC implementation, which reduces the accuracy of SCNNs. The first part of this work proposes solutions to these challenges by introducing new stochastic components: a new stochastic number generator (SNG) that is able to generate a large number of SNs at the same time, and a stochastic maximum circuit that enables an accurate implementation of max-pooling operations in SCNNs. In addition, the first part of this work presents a detailed investigation of the behaviour of an SCNN and its components under timing errors. The error tolerance of SC is often cited as one of its advantages, stemming from the fact that any single bit of an SN contributes only very little to its value. In contrast, bits in conventional binary formats have different weights and can contribute as much as 50% of a number's value. SC is therefore a candidate for extreme low-power systems, as it could potentially tolerate the timing errors that appear in such environments. While the error tolerance of SC image processing systems has been demonstrated before, a detailed investigation of SCNNs in this regard has been missing so far.
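The per-bit weighting argument quoted in the abstract can be checked with a small Python sketch. It compares the worst change caused by flipping a single bit in a 256-bit unipolar SN with the same change in an 8-bit unsigned binary word, both encoding roughly 0.5. The example captures only the commonly cited intuition for a single-bit flip. It does not model the timing-error models studied in the thesis, which the abstract goes on to qualify. The function names are hypothetical.

```python
def flip(bits, i):
    """Return a copy of `bits` with bit i inverted (a single-bit error)."""
    out = list(bits)
    out[i] ^= 1
    return out


def sn_value(bits):
    """Unipolar SN: every bit carries the same weight, 1/N."""
    return sum(bits) / len(bits)


def binary_value(bits):
    """Unsigned binary, MSB first, normalized by the largest representable value."""
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v / (2 ** len(bits) - 1)


def worst_single_flip(bits, decode):
    """Largest change in the decoded value caused by flipping any one bit."""
    base = decode(bits)
    return max(abs(decode(flip(bits, i)) - base) for i in range(len(bits)))


sn = [1] * 128 + [0] * 128       # 256-bit SN encoding exactly 0.5
word = [1, 0, 0, 0, 0, 0, 0, 0]  # 8-bit binary 128, i.e. about half of full scale
sn_err = worst_single_flip(sn, sn_value)
bin_err = worst_single_flip(word, binary_value)
print(f"worst single-bit change: SN {sn_err:.4f}, binary {bin_err:.4f}")
```

Any single flip changes the SN by exactly 1/256 ≈ 0.004. Flipping the binary word's most significant bit changes it by 128/255 ≈ 0.50, which matches the "as much as 50%" figure.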
It will be shown that SC is not error tolerant in general. Rather, SC components behave differently even when they implement the same function, and the error tolerance of an SC system further depends on the error model. In the second part of this work, a theoretical analysis of the accuracy and limitations of SC systems is presented. An existing framework to analyse and manage the accuracy of combinational stochastic circuits is extended to cover sequential circuits. This framework enables a designer to predict the effect of small design changes on the accuracy of a circuit and to determine important parameters such as SN length without extensive simulations. It will further be shown that the set of functions that can be implemented in SC is limited. Due to the probabilistic nature of SC, some arithmetic functions suffer from a small bias when implemented as a stochastic circuit, including the max-pooling function in SCNNs.

Open Access
Development and application of PICLas for combined optic-/plume-simulation of ion-propulsion systems (2019)
Binder, Tilman; Fasoulas, Stefanos (Prof. Dr.-Ing.)
Electric propulsion systems are an efficient option for the altitude/attitude control and orbit transfer of spacecraft. One example is the gridded ion thruster, which ionizes the propellant and accelerates the ions of the generated plasma by means of a high-voltage grid system. This work deals with the numerical simulation of the plasma flow, which starts near the grid system in the ionization chamber and leaves the thruster at high velocity. These simulations give direct insight into the modeled physical interrelationships and can be used to investigate questions arising in the industrial development of ion propulsion systems. The required simulation method is challenging due to the high degree of flow rarefaction and the plasma state itself, which includes freely moving ions and electrons.
Applicable simulation methods follow a particle-based, gas-kinetic approach, such as Particle-In-Cell (PIC) for the simulation of electromagnetic interaction and Direct Simulation Monte Carlo (DSMC) for inter-particle collisions. The effects resulting from the finite size of a real system can only be investigated by simulating the complete three-dimensional thruster geometry, which requires a large and complex simulation domain. Acceptable simulation times are achieved by extending and using the framework of the coupled PIC-DSMC code PICLas in combination with high-performance computing systems.

Open Access
Über die Lösung der Navier-Stokes-Gleichungen mit Hilfe der Moore-Penrose-Inversen des Laplace-Operators im Vektorraum der Polynomkoeffizienten [On solving the Navier-Stokes equations using the Moore-Penrose inverse of the Laplace operator in the vector space of polynomial coefficients] (2024)
Große-Wöhrmann, Bärbel; Resch, Michael (Prof. Dr.-Ing.)
The well-known standard numerical methods for solving partial differential equations are based on a spatial discretization of the computational domain. Their performance and scalability on modern massively parallel supercomputers depend on the availability of efficient numerical methods for solving systems of linear equations. In view of fundamental challenges, developing new solution approaches appears worthwhile. In this thesis, I present a polynomial approach to solving partial differential equations that does not rely on spatial discretization. By means of the Moore-Penrose inverse of the Laplace operator, this approach makes it possible to decouple the Navier-Stokes equations. The degree of the polynomials is not fundamentally limited, so a high spatial resolution can be achieved.

Open Access
Investigating dynamics by multilevel phase space discretization (2006)
Fundinger, Danny Georg; Levi, Paul (Prof. Dr.)
The subject of the thesis is the numerical investigation of dynamical systems.
The aim is to provide approaches for localizing several topological structures that are of vital importance for the global analysis of dynamical systems, namely periodic orbits, the chain recurrent set, repellers, attractors and their domains of attraction, as well as stable, unstable, and connecting manifolds. The techniques introduced do not require any a priori knowledge about a system and are not restricted by the stability of the solution. Furthermore, they can generally be applied to a wide range of dynamical systems. Two theoretical concepts are at the center of the research: symbolic analysis and the RIM method. The basic approach underlying both is multilevel phase space discretization. This means that a part of the phase space, the area of investigation, is subdivided into a finite number of sets. Then, instead of each point of the phase space, only these sets are subjected to further analysis. The main goal of every proposed method is to find those sets that contain parts of the solution and to subdivide them into smaller parts until a desired accuracy is reached. In the case of symbolic analysis, a directed graph is constructed that represents the structure of the state space of the investigated dynamical system. This graph is called the symbolic image of the system and can be seen as an approximation of the system flow. The theoretical background of the symbolic image graph, as well as the constructive methods applied to it, has already been described in a series of works by G. Osipenko. In this work, strategies for practical application are introduced. This requires extending the theoretical concepts and developing appropriate algorithms and data structures. In practice, these aspects turned out to be essential cornerstones for the usability of the discussed methods.
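The discretize-and-subdivide idea behind the symbolic image can be illustrated with a small Python sketch for a one-dimensional map. This is an assumption-laden toy, not the thesis's implementation. Several choices are made here purely for illustration: the logistic map f(x) = 3.2·x·(1 − x), the initial 16 cells, 6 levels, and 20 samples per cell. The sketch also assumes a known Lipschitz constant, which it uses to pad sampled images into outer enclosures. The function names are hypothetical. An edge i → j is added whenever the enclosure of f(cell i) overlaps cell j. Cells that do not lie on a cycle of the graph are discarded, and the survivors are bisected and re-examined.

```python
def symbolic_image(f, cells, lipschitz, samples=20):
    """Directed graph on cell indices: edge i -> j if an outer enclosure of f(cell i) overlaps cell j."""
    edges = {}
    for i, (a, b) in enumerate(cells):
        h = (b - a) / samples
        ys = [f(a + k * h) for k in range(samples + 1)]
        # Every point of [a, b] lies within h/2 of a sample, so padding the sampled
        # range by lipschitz * h / 2 yields an interval that contains all of f([a, b]).
        pad = lipschitz * h / 2
        lo, hi = min(ys) - pad, max(ys) + pad
        edges[i] = {j for j, (c, d) in enumerate(cells) if d >= lo and c <= hi}
    return edges


def on_cycle(edges, start):
    """True if vertex `start` can reach itself, i.e. it is a recurrent vertex."""
    seen, stack = set(), list(edges[start])
    while stack:
        v = stack.pop()
        if v == start:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(edges[v])
    return False


def localize_recurrent_set(f, lo, hi, lipschitz, n0=16, levels=6):
    """Multilevel loop: build the symbolic image, keep recurrent cells, bisect them, repeat."""
    w = (hi - lo) / n0
    cells = [(lo + i * w, lo + (i + 1) * w) for i in range(n0)]
    for level in range(levels):
        edges = symbolic_image(f, cells, lipschitz)
        # Non-recurrent cells cannot contain periodic (or chain-recurrent) points.
        cells = [cells[i] for i in range(len(cells)) if on_cycle(edges, i)]
        if level < levels - 1:
            cells = [half for (a, b) in cells
                     for half in ((a, (a + b) / 2), ((a + b) / 2, b))]
    return cells


r = 3.2


def logistic(x):
    return r * x * (1 - x)


cells = localize_recurrent_set(logistic, 0.0, 1.0, lipschitz=r)  # |f'(x)| <= r on [0, 1]
print(len(cells), "cells of width", cells[0][1] - cells[0][0])
```

For r = 3.2, the surviving cells should cluster around the fixed points 0 and 1 − 1/r = 0.6875 and the attracting period-2 orbit (about 0.513 and 0.799). Because images are over-approximated, cells that contain these periodic points are never discarded. The cost is that some neighbouring cells may also be kept. Plain point sampling without padding would be cheaper but could miss edges.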
Some sophisticated refinements of the basic methods are also proposed in order to extend the field of practical investigation. Although symbolic analysis can be seen as the main motivation for this work, the investigation was not limited to it. Indeed, several shortcomings in solving certain problems can be observed when the method is applied in practice. This led to the development of the RIM method. The core purpose of the method is to solve the root-finding problem. The standard approach to this task is to apply an iteration scheme based on Newton's method. However, it has turned out that such Newton schemes have several structural disadvantages, which are especially critical in the fields of investigation relevant to this work. The RIM method proposes an alternative approach that does not require any Newton-like method. Numerical case studies revealed that, in several nontrivial scenarios, the RIM method provides better results than both symbolic analysis and Newton-based methods. Two applications of the RIM method to the investigation of dynamical systems are provided: the detection of periodic points and the computation of stable manifolds. The proposed methods contribute not only to the direct investigation and simulation of specific dynamical processes but also to research in dynamical systems theory in general. This is because progress in theory depends to a large extent on the observation and investigation of phenomena, which can often only be revealed, analyzed, and verified by numerical experiments. The presented numerical case studies give some concrete examples of the application of the methods.
The dynamical models are taken from different fields of scientific research, such as geography, biology, meteorology, and physics.

Open Access
Cognition-aware systems to support information intake and learning (2016)
Dingler, Tilman; Schmidt, Albrecht (Prof. Dr.)
Knowledge is created at an ever-increasing pace, putting us under constant pressure to consume and acquire new information. Information gain and learning, however, require time and mental resources. While the proliferation of ubiquitous computing devices, such as smartphones, enables us to consume information anytime and anywhere, technologies are often disruptive rather than sensitive to the current user context. Although people exhibit different levels of concentration and cognitive capacity throughout the day, applications rarely take these performance variations into account and often overburden their users with information or fail to stimulate them. This work investigates how technology can help people deal effectively with information intake and learning tasks through cognitive context-awareness. By harvesting sensor and usage data from mobile devices, we obtain people's levels of attentiveness, receptiveness, and cognitive performance. We subsequently use this cognition-awareness in applications to help users process information more effectively. Through a series of lab studies, online surveys, and field experiments, we follow six research questions to investigate how to build cognition-aware systems. Awareness of users' variations in attention, receptiveness, and cognitive performance allows systems to trigger appropriate content suggestions, manage user interruptions, and adapt user interfaces in real time to match tasks to the user's cognitive capacities.
The tools, insights, and concepts described in this book allow researchers and application designers to build systems with an awareness of momentary user states and general circadian rhythms of alertness and cognitive performance.