Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
352 results
Search Results
Item Open Access Challenges of computational social science analysis with NLP methods (2022) Dayanik, Erenay; Padó, Sebastian (Prof. Dr.)
Computational Social Science (CSS) is an emerging research area at the intersection of social science and computer science, where problems of societal relevance can be addressed by novel computational methods. With the recent advances in machine learning and natural language processing as well as the availability of textual data, CSS has opened up to new possibilities, but also to new methodological challenges. In this thesis, we present a line of work on developing methods and addressing challenges in terms of data annotation and modeling for computational political science and social media analysis, two highly popular and active research areas within CSS. In the first part of the thesis, we focus on a use case from computational political science, namely Discourse Network Analysis (DNA), a framework that aims at analyzing the structures behind complex societal discussions. We investigate how this style of analysis, which is traditionally performed manually, can be automated. We start by providing a requirement analysis outlining a roadmap to decompose the complex DNA task into several conceptually simpler sub-tasks. Then, we introduce NLP models with various configurations to automate two of the sub-tasks given by the requirement analysis, namely claim detection and classification, based on different neural network architectures ranging from unidirectional LSTMs to Transformer-based architectures. In the second part of the thesis, we shift our focus to fairness, a central concern in CSS. Our goal in this part of the thesis is to analyze and improve NLP models used in CSS in terms of fairness and robustness while maintaining their overall performance.
With that in mind, we first analyze the above-mentioned claim detection and classification models and propose techniques to improve model fairness and overall performance. After that, we broaden our focus to social media analysis, another highly active subdomain of CSS. Here, we study text classification of correlated attributes, which pose an important but often overlooked challenge to model fairness. Our last contribution is to discuss the limitations of the current statistical methods applied for bias identification; to propose a multivariate-regression-based approach; and to show, through experiments conducted on social media data, that it can be used as a complementary method for bias identification and analysis tasks. Overall, our work takes a step towards increasing the understanding of the challenges of computational social science. We hope that both political scientists and NLP scholars can make use of the insights from this thesis in their research.

Item Open Access Analyzing code corpora to improve the correctness and reliability of programs (2021) Patra, Jibesh; Pradel, Michael (Prof. Dr.)
Bugs in software are commonplace, challenging, and expensive to deal with. One widely used direction is to apply program analyses that reason about software in order to detect bugs in it. In recent years, the growth of areas like web application development and data analysis has produced large amounts of publicly available source code corpora, primarily written in dynamically typed languages, such as Python and JavaScript. It is challenging to reason about programs written in such languages because of the presence of dynamic features and the lack of statically declared types. This dissertation argues that, to build software developer tools for detecting and understanding bugs, it is worthwhile to analyze code corpora, which can uncover code idioms, runtime information, and natural language constructs such as comments.
The dissertation is divided into three corpus-based approaches that support our argument. In the first part, we present static analyses over code corpora to generate new programs, to perform mutations on existing programs, and to generate data for effective training of neural models. We provide empirical evidence that the static analyses can scale to thousands of files and that the trained models are useful in finding bugs in code. The second part of this dissertation presents dynamic analyses over code corpora. Our evaluations show that the analyses are effective in uncovering unexpected behaviors when multiple JavaScript libraries are included together and in generating data for training bug-finding neural models. Finally, we show that a corpus-based analysis can be useful for input reduction, which can help developers to find a smaller subset of an input that still triggers the required behavior. We envision that the current dissertation motivates future endeavors in corpus-based analysis to alleviate some of the challenges faced while ensuring the reliability and correctness of software. One direction is to combine data obtained by static and dynamic analyses over code corpora for training. Another direction is to use meta-learning approaches, where a model is trained using data extracted from the code corpora of one language and used for another language.

Item Open Access Stochastic neural networks : components, analysis, limitations (2022) Neugebauer, Florian; Polian, Ilia (Prof. Dr.)
Stochastic computing (SC) promises an area- and power-efficient alternative to conventional binary implementations of many important arithmetic functions. SC achieves this by employing a stream-based number format called stochastic numbers (SNs), which enables bit-sequential computations, in contrast to conventional binary computations that are performed on entire words at once. An SN encodes a value probabilistically with equal weight for every bit in the stream.
This encoding results in approximate computations, causing a trade-off between power consumption, area and computation accuracy. The prime example of efficient computation in SC is multiplication, which can be performed with only a single gate. SC is therefore an attractive alternative to conventional binary implementations in applications that contain a large number of basic arithmetic operations and are able to tolerate the approximate nature of SC. The most widely considered class of applications in this regard is neural networks (NNs), with convolutional neural networks (CNNs) as the prime target for SC. In recent years, steady advances have been made in the implementation of SC-based CNNs (SCNNs). At the same time, however, a number of challenges have been identified as well: SCNNs need to handle large amounts of data, which has to be converted from conventional binary format into SNs. This conversion is hardware intensive and takes up a significant portion of a stochastic circuit's area, especially if the SNs have to be generated independently of each other. Furthermore, some commonly used functions in CNNs, such as max-pooling, have no exact corresponding SC implementation, which reduces the accuracy of SCNNs. The first part of this work proposes solutions to these challenges by introducing new stochastic components: a new stochastic number generator (SNG) that is able to generate a large number of SNs at the same time and a stochastic maximum circuit that enables an accurate implementation of max-pooling operations in SCNNs. In addition, the first part of this work presents a detailed investigation of the behaviour of an SCNN and its components under timing errors. The error tolerance of SC is often quoted as one of its advantages, stemming from the fact that any single bit of an SN contributes only very little to its value. In contrast, bits in conventional binary formats have different weights and can contribute as much as 50% of a number's value.
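The encoding and the single-gate multiplication described above can be illustrated with a small software simulation. The sketch below is not taken from the thesis. It assumes the common unipolar encoding, in which a value in [0, 1] is the probability that a bit of the stream is 1, and it models the single gate as a bitwise AND of two independently generated streams. The helper names (`to_stochastic`, `from_stochastic`, `sc_multiply`) and the stream length are illustrative choices.

```python
import random

LENGTH = 4096  # stream length (assumed); longer streams give better accuracy


def to_stochastic(value, length, rng):
    """Encode a value in [0, 1] as a bit stream (unipolar encoding, assumed).

    Each bit is 1 with probability `value`, so every bit has equal weight.
    """
    return [1 if rng.random() < value else 0 for _ in range(length)]


def from_stochastic(bits):
    """Decode a stream: the value is the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a_bits, b_bits):
    """Multiply two independent streams with one AND operation per bit pair."""
    return [a & b for a, b in zip(a_bits, b_bits)]


rng = random.Random(0)  # fixed seed so that repeated runs give the same streams
a_bits = to_stochastic(0.5, LENGTH, rng)
b_bits = to_stochastic(0.25, LENGTH, rng)

estimate = from_stochastic(sc_multiply(a_bits, b_bits))
print(f"exact product: {0.5 * 0.25}, stochastic estimate: {estimate:.4f}")
```

The estimate is only approximately equal to the exact product, which reflects the accuracy trade-off mentioned above. Because every bit carries the same weight, flipping one bit of a 4096-bit stream changes the decoded value by only 1/4096, whereas the most significant bit of a binary word accounts for roughly half of its range.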
SC is therefore a candidate for extreme low-power systems, as it could potentially tolerate timing errors that appear in such environments. While the error tolerance of SC image processing systems has been demonstrated before, a detailed investigation into SCNNs in this regard has been missing so far. It will be shown that SC is not error tolerant in general, but rather that SC components behave differently even if they implement the same function, and that the error tolerance of an SC system further depends on the error model. In the second part of this work, a theoretical analysis of the accuracy and limitations of SC systems is presented. An existing framework to analyse and manage the accuracy of combinational stochastic circuits is extended to cover sequential circuits. This framework enables a designer to predict the effect of small design changes on the accuracy of a circuit and determine important parameters such as SN length without extensive simulations. It will further be shown that the functions that are possible to implement in SC are limited. Due to the probabilistic nature of SC, some arithmetic functions suffer from a small bias when implemented as a stochastic circuit, including the max-pooling function in SCNNs.

Item Open Access Scalable traffic engineering heuristics for time-triggered communication in real-time networks (2026) Geppert, Heiko; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
Distributed safety-critical cyber-physical systems require real-time behavior. This means they must respond not just quickly, but in time, to new situations, considering both task processing and network communication time. From a networking perspective, meticulous, time-driven traffic planning performed at the frame level is necessary to guarantee low end-to-end delay bounds and low latency.
This involves carefully planning the transmission operations along each time-critical frame's network path, including precise timing, to limit or even eliminate interference from cross-traffic and ensure timely delivery. Since modern real-time systems can consist of hundreds or thousands of devices - for example, large manufacturing plants or continental-sized power grids - the traffic planning must be highly scalable. Although there are many traffic planning approaches in the literature, there is a lack of heuristics that can handle very large stream sets and networks quickly. This thesis investigates traffic planning heuristics and optimization techniques, focusing on different aspects of the traffic planning domain. The traffic planning part comprises novel methods for conflict-graph-based scheduling and new heuristics for very large instances of the traffic planning problem. The optimizations include multicast partitioning, which combines the benefits of multicast and unicast traffic plans, and load-balanced stream placement, which generates traffic plans that can accommodate additional streams joining the system later. We created prototype implementations and analyzed their performance in solving the traffic planning problem. Our traffic plans yielded a higher accumulated network throughput or admitted more streams while maintaining computation times ranging from sub-seconds to minutes, even for extremely large-scale problem instances. The traffic planning methods and optimization techniques presented in this thesis can be applied to modern real-time networking technologies, such as Time-Sensitive Networking and TTEthernet.

Item Open Access Development and application of PICLas for combined optic-/plume-simulation of ion-propulsion systems (2019) Binder, Tilman; Fasoulas, Stefanos (Prof. Dr.-Ing.)
Electric propulsion systems are an efficient option for altitude/attitude control and orbit transfers of spacecraft.
One example is the gridded ion thruster, which ionizes the propellant and accelerates the ions of the generated plasma by a high-voltage grid system. This work deals with the numerical simulation of the plasma flow starting near the grid system in the ionization chamber and leaving the thruster with high velocity. These simulations give direct insight into the modeled physical interrelationships and can be used to investigate questions arising in the industrial development process of ion propulsion systems. The required simulation method is challenging due to the high degree of flow rarefaction and the plasma state itself, including freely moving ions and electrons. Applicable simulation methods belong to a particle-based, gas-kinetic approach, such as Particle-In-Cell (PIC) for the simulation of electromagnetic interaction and the Direct Simulation Monte Carlo (DSMC) for inter-particle collisions. The effects resulting from the finite size of a real system can only be investigated by simulating the complete, three-dimensional thruster geometry, which requires a large and complex simulation domain. Acceptable simulation times are realized by expanding and using the framework of the coupled PIC-DSMC code PICLas in combination with high performance computing systems.

Item Open Access Über die Lösung der Navier-Stokes-Gleichungen mit Hilfe der Moore-Penrose-Inversen des Laplace-Operators im Vektorraum der Polynomkoeffizienten (2024) Große-Wöhrmann, Bärbel; Resch, Michael (Prof. Dr.-Ing.)
The well-known standard numerical methods for solving partial differential equations are based on a spatial discretization of the computational domain. Their performance and scalability on modern massively parallel supercomputers depend on the availability of efficient numerical methods for solving systems of linear equations. In view of fundamental challenges, developing new solution approaches appears worthwhile.
In this thesis, I present a polynomial approach to solving partial differential equations that does not rely on a spatial discretization and that, by means of the Moore-Penrose inverse of the Laplace operator, enables the decoupling of the Navier-Stokes equations. The degree of the polynomials is not limited in principle, so that a high spatial resolution can be achieved.

Item Open Access Architectural refactoring to microservices : a quality-driven methodology for modernizing monolithic applications (2024) Fritzsch, Jonas; Wagner, Stefan (Prof. Dr.)
Context and Problem: The microservices architectural style has revolutionized the way modern software systems are developed and operated, and has become the de facto standard for cloud-based applications. However, existing systems are often designed as monoliths, which are associated with inflexible processes, long release cycles, and an architecture incapable of leveraging the advantages of cloud environments. The adoption of microservices would require an architectural refactoring, entailing redevelopments of parts or even the entire application. Often associated with extensive manual effort, the targeted, quality-oriented, and semi-automated decomposition into a set of self-contained services remains problematic. Software architects look for resource-efficient ways to provide predictable results and guidance in this highly individual process. Objective: To systematically guide software architects and developers in modernizing their software systems, we seek to provide a holistic methodology for systematic and quality-driven migrations towards microservices. As part of it, we seek solutions for the targeted and automated decomposition into services, and ways to support a quality-oriented design based on established patterns and best practices. Our work aims to provide industry-relevant methods that address the gap between academia and practice by facilitating the transfer of knowledge.
Methods: In an overarching design science research process, we create a migration methodology that we implement as a web-based application. For analysis and evaluation, we apply established methods in empirical software engineering, such as case study research, surveys, and semi-structured interviews with experts. Our secondary research to summarize the current state of scientific advances relies on consecutive literature searches and rapid reviews, a lightweight method derived from systematic reviews. Contributions: Based on two primary empirical interview studies with 25 software professionals, we collected evidence on the intentions, strategies, and challenges of migrating monolithic applications to microservices, complemented by requirements for tool support and automation. Over four iterations, we reviewed 110 scientific publications on approaches for architectural refactoring and migration to microservices. To guide architects and developers in a migration process, we conceptualized a framework, along with a dedicated quality model that reflects a quality-driven migration process. Based on the latest technologies and a modern user interface design, we realized our concept as a web-based application in an agile development process with early involvement of potential users. In a multi-faceted evaluation, we examined its ability to provide actionable guidance for practitioners. To this end, we conducted three surveys and one interview study with a total of 26 participants, complemented by two longitudinal case studies in an industrial context. Conclusion: We propose a holistic methodology for modernizing monolithic applications to microservices that comprises a framework and a dedicated quality model. Our contributions support architects in making informed decisions about microservices adoption, and furthermore guide them through a systematic transformation process.
The evaluations showed an overall positive result in terms of effectiveness, usefulness, and usability, while both case studies demonstrated a successful application in an industrial environment. By sharing important study artifacts, we support researchers developing industry-focused methods, who can profit from our insights and experiences. Moreover, we regard our design science approach to leveraging academic research by practice as transferable to other scientific disciplines.

Item Open Access Decoding strategies for syntax-based statistical machine translation (2015) Braune, Fabienne; Maletti, Andreas (Dr.)
Provided with a sentence in an input language, a human translator produces a sentence in the desired target language. The advances in artificial intelligence in the 1950s led to the idea of using machines instead of humans to generate translations. Based on this idea, the field of Machine Translation (MT) was created. The first MT systems aimed to map input text into the target translation through the application of hand-crafted rules. While this approach worked well for specific language-pairs on restricted fields, it was hardly extendable to new languages and domains because of the huge amount of human effort necessary to create new translation rules. The increase of computational power enabled Statistical Machine Translation (SMT) in the late 1980s, which addressed this problem by learning translation units automatically from large text collections. Statistical machine translation can be divided into several paradigms. Early systems modeled translation between words while later work extended these to sequences of words called phrases. A common point between word and phrase-based SMT is that the translation process takes place sequentially, which is not well suited to translate between languages where words need to be reordered over (potentially) long distances.
Such reorderings led to the implementation of SMT systems based on formalisms that allow translating recursively instead of sequentially. In these systems, called syntax-based systems, the translation units are modeled with formal grammar productions and translation is performed by assembling the productions of these grammars. This thesis contributes to the field of syntax-based SMT in two ways: (i) the applicability of a new grammar formalism is tested by building the first SMT system based on the local Multi Bottom-Up Tree Transducer (l-MBOT); (ii) new ways to integrate linguistic annotations in the translation model (instead of the grammar rules) of syntax-based systems are developed.

Item Open Access A light weighted semi-automatically I/O-tuning solution for engineering applications (Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2017) Wang, Xuan; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
Today’s engineering applications running on high performance computing (HPC) platforms generate more and more diverse data simultaneously and require large storage systems as well as extremely high data transfer rates to store their data. To achieve high data transfer rates (I/O performance), computer scientists together with HPC manufacturers have developed many innovative solutions. However, transferring the knowledge of these solutions to engineers and scientists has become one of the largest barriers. Since the engineers and scientists are experts in their own professional areas, they might not be capable of tuning their applications to the optimal level. Sometimes they might even degrade the I/O performance by mistake. The basic training courses provided by computing centers like HLRS seem insufficient to transfer the required know-how. In order to overcome this barrier, I have developed a semi-automatically I/O-tuning solution (SAIO) for engineering applications.
SAIO, a lightweight and intelligent framework, is designed to be compatible with as many engineering applications as possible, scalable with large engineering applications, usable for engineers and scientists with little knowledge of parallel I/O, and portable across multiple HPC platforms. Building on the MPI-IO library allows SAIO to be compatible with MPI-IO based high level I/O libraries, such as parallel HDF5 and parallel NetCDF, as well as proprietary and open source software, like Ansys Fluent, the WRF Model etc. In addition, SAIO follows the current MPI standard, which makes it scalable and portable across many HPC platforms. SAIO, which is implemented as a dynamic library and loaded dynamically, does not require recompiling or changing the application's source code. By simply adding several export directives to their job submission scripts, engineers and scientists will be able to run their jobs more efficiently. Furthermore, an automated SAIO training utility keeps the optimal configurations up to date, without any manual effort from the user.

Item Open Access Large-scale analysis of textual and multivariate data combining machine learning and visualization (2022) Knittel, Johannes; Ertl, Thomas (Prof. Dr.)