Universität Stuttgart

Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1

Search Results

Now showing 1 - 10 of 30
  • Item (Open Access)
    Causal models for decision making via integrative inference
    (2017) Geiger, Philipp; Toussaint, Marc (Prof. Dr.)
    Understanding causes and effects is important in many parts of life, especially when decisions have to be made. The systematic inference of causal models remains a challenge though. In this thesis, we study (1) "approximative" and "integrative" inference of causal models and (2) causal models as a basis for decision making in complex systems. By "integrative" here we mean including and combining settings and knowledge beyond the outcome of perfect randomization or pure observation for causal inference, while "approximative" means that the causal model is only constrained but not uniquely identified. As a basis for the study of topics (1) and (2), which are closely related, we first introduce causal models, discuss the meaning of causation and embed the notion of causation into a broader context of other fundamental concepts. Then we begin our main investigation with a focus on topic (1): we consider the problem of causal inference from a non-experimental multivariate time series X, that is, we integrate temporal knowledge. We take the following approach: We assume that X together with some potential hidden common cause - "confounder" - Z forms a first order vector autoregressive (VAR) process with structural transition matrix A. Then we examine under which conditions the most important parts of A are identifiable or approximately identifiable from only X, in spite of the effects of Z. Essentially, sufficient conditions are (a) non-Gaussian, independent noise or (b) no influence from X to Z. We present two estimation algorithms that are tailored towards conditions (a) and (b), respectively, and evaluate them on synthetic and real-world data. We discuss how to check the model using X. Still focusing on topic (1) but already including elements of topic (2), we consider the problem of approximate inference of the causal effect of a variable X on a variable Y in i.i.d. settings "between" randomized experiments and observational studies. 
Our approach is to first derive approximations (upper/lower bounds) on the causal effect as a function of bounds on (hidden) confounding. Then we discuss several scenarios where knowledge or beliefs can be integrated that in fact imply bounds on confounding. One example is about decision making in advertising, where knowledge of partial compliance with guidelines can be integrated. Then, concentrating on topic (2), we study decision-making problems that arise in cloud computing, a computing paradigm and business model that involves complex technical and economic systems and interactions. More specifically, we consider the following two problems: debugging and control of computing systems with the help of sandbox experiments, and prediction of the cost of "spot" resources for decision making of cloud clients. We first establish two theoretical results on approximate counterfactuals and approximate integration of causal knowledge, which we then apply to the two problems in toy scenarios.
  • Item (Open Access)
    Änderungstolerante Serialisierung großer Datensätze für mehrsprachige Programmanalysen (Change-tolerant serialization of large datasets for multi-language program analyses)
    (2017) Felden, Timm; Plödereder, Erhard (Prof. Dr.)
  • Item (Open Access)
    Structurally informed methods for improved sentiment analysis
    (2017) Kessler, Stefanie Wiltrud; Kuhn, Jonas (Prof. Dr.)
    Sentiment analysis deals with methods to automatically analyze opinions in natural language texts, e.g., product reviews. Such reviews contain a large number of fine-grained opinions, but to automatically extract detailed information it is necessary to handle a wide variety of verbalizations of opinions. The goal of this thesis is to develop robust structurally informed models for sentiment analysis which address challenges that arise from structurally complex verbalizations of opinions. In this thesis, we look at two examples of such verbalizations that benefit from incorporating structural information into the analysis: negation and comparisons. Negation directly influences the polarity of sentiment expressions, e.g., while "good" is positive, "not good" expresses a negative opinion. We propose a machine learning approach that uses information from dependency parse trees to determine whether a sentiment word is in the scope of a negation expression. Comparisons like "X is better than Y" are the main topic of this thesis. We present a machine learning system for the task of detecting the individual components of comparisons: the anchor or predicate of the comparison, the entities that are compared, which aspect they are compared in, and which entity is preferred. Again, we use structural context from a dependency parse tree to improve the performance of our system. We discuss two ways of addressing the issue of limited availability of training data for our system. First, we create a manually annotated corpus of comparisons in product reviews, the largest such resource available to date. Second, we use the semi-supervised method of structural alignment to expand a small seed set of labeled sentences with similar sentences from a large set of unlabeled sentences. Finally, we work on the task of producing a ranked list of products that complements the isolated prediction of ratings and supports the user in a process of decision making. 
We demonstrate how we can use the information from comparisons to rank products and evaluate the result against two conceptually different external gold standard rankings.
  • Item (Open Access)
    Visual analytics of human mobility behavior
    (2017) Krüger, Robert; Ertl, Thomas (Prof. Dr.)
    Human mobility plays an important role in many domains of today’s society, such as security, logistics, transportation, urban planning, and geo-marketing. Both government and industry thus have great interest in understanding mobility patterns and their driving social, economic, and environmental causes and effects. While stakeholders had to rely on manual traffic surveys for a long time, improvements in tracking technology made analyses based on large digital datasets possible. Recently, the omnipresence of mobile devices significantly increased the amount of collected movement and context data. People are willing to reveal their position, but also further personal details such as visited places, observations, events, news, and sentiments in exchange for personalized services and social networking. This opens up new possibilities for many domains where a semantic understanding of mobility is required but also raises major challenges. To reveal a holistic picture, heterogeneous datasets from different services with different resolutions and formats have to be fused and analyzed. However, social sensing data is vast, has varying scale, is unevenly distributed, and is constantly updated. Especially content from social media services is often inconsistent, unreliable, and incomplete, which requires special treatment. Fully automatic mapping approaches are not trustworthy as they do not take into account these uncertainties. At the same time, manual approaches become insufficient with large amounts of data. Even when data is perfectly aligned, analysts cannot purely rely on existing techniques. Answering questions about reasons for movement requires a broader perspective that takes into account environmental and social context, the driving forces for human mobility behavior. Visual analytics is an emerging research field to tackle such challenges. 
It creates added value by combining the processing power and accuracy of machines with human capabilities to perceive information visually. Automatic means are used to fuse and aggregate data and to detect hidden patterns therein. Interactive visualizations allow analysts to explore and query the data and to steer the automatic processes with domain knowledge. This increases trust in data, models, and results, which is especially important when critical decisions need to be made. The strengths of visual analytics have been shown to be particularly advantageous when problems and goals are underspecified and exploratory means are needed to discover yet unknown patterns. This thesis presents novel visual analytics approaches to derive meaning and reasons behind movement, by taking into account the aforementioned characteristics. The approaches are aligned in a holistic process model covering all steps from data retrieval, enrichment, exploration, and verification to externalization of gained knowledge for various fields of application such as electric mobility, event management, and law enforcement. It is shown how data from social media can not only be used to retrieve up-to-date movement information, but also to enrich movement trajectories from other sources with structured and unstructured information about places, events, transactions, and other observations. Through highly interactive visual interfaces, analysts can bring in domain knowledge to deal with uncertainties during data fusion and to steer the subsequent semantic analysis. Exploratory and confirmatory analysis techniques are presented to create hypotheses, refine them, and find support in the data. Analysts can discover routines and abnormal behavior with the assistance of automatic pattern detection methods to cope with the vast amounts of data. 
Spatial drill-down is supported by a set-based focus+context technique, while a more abstract visual query language allows analysts to explicitly formulate, extract, and query for movement patterns. The approaches are applied in different scenarios and are integrated in a visual analytics system. Evaluation with experts and novice users, case studies, and comparisons to ground truth data demonstrate the need for and the effectiveness of the contributions. Overall, the thesis contributes a visual analytics process for human mobility behavior with novel semantic analysis approaches, ranging from global movements of many to local activities of a few people, for a wide range of application domains.
  • Item (Open Access)
    A light weighted semi-automatically I/O-tuning solution for engineering applications
    (Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2017) Wang, Xuan; Resch, Michael M. (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
    Today’s engineering applications running on high performance computing (HPC) platforms generate more and more diverse data simultaneously and require large storage systems as well as extremely high data transfer rates to store their data. To achieve high data transfer rates (I/O performance), computer scientists together with HPC manufacturers have developed many innovative solutions. However, transferring the knowledge of these solutions to engineers and scientists has become one of the largest barriers. Since engineers and scientists are experts in their own professional areas, they might not be able to tune their applications to the optimal level. Sometimes they might even degrade the I/O performance by mistake. The basic training courses provided by computing centers like HLRS do not seem sufficient to transfer the required know-how. In order to overcome this barrier, I have developed a semi-automatic I/O-tuning solution (SAIO) for engineering applications. SAIO, a lightweight and intelligent framework, is designed to be compatible with as many engineering applications as possible, scalable with large engineering applications, usable for engineers and scientists with little knowledge of parallel I/O, and portable across multiple HPC platforms. Building on the MPI-IO library allows SAIO to be compatible with MPI-IO based high-level I/O libraries, such as parallel HDF5 and parallel NetCDF, as well as proprietary and open source software, like Ansys Fluent, the WRF Model, etc. In addition, SAIO follows the current MPI standard, which makes it scalable and portable across many HPC platforms. SAIO, which is implemented as a dynamic library and loaded dynamically, does not require recompiling or changing the application's source code. By simply adding several export directives to their job submission scripts, engineers and scientists will be able to run their jobs more efficiently. 
Furthermore, an automated SAIO training utility keeps the optimal configurations up to date, without any manual effort from the user.
  • Item (Open Access)
    Efficient code offloading techniques for mobile applications
    (2017) Berg, Florian; Rothermel, Kurt (Prof. Dr. rer. nat. Dr. h. c.)
    Since Apple released its first smartphone in 2007, smartphones in general have rapidly grown in popularity. A smartphone typically possesses, among other things, a touchscreen display as its user interface, mobile communication for accessing the Internet, and a System-on-a-Chip that integrates required components such as a central processing unit. This pervasive computing platform draws its power from a battery, and end users run different kinds of applications on it, such as a calendar application or a high-end mobile game. Applications differ in how they use the local resources of a battery-operated smartphone; heavy utilization of local resources, such as running a resource-demanding application, drains the limited energy in a few hours. Despite the constant increase in the memory, communication, and processing capabilities of smartphones since 2007, applications are also becoming more and more sophisticated and demanding. As a result, the energy consumed on a smartphone was, still is, and will be its main limiting factor. To prevent the limited energy from being exhausted quickly, researchers propose code offloading for (resource-constrained) mobile devices like smartphones. Code offloading strives to increase the energy efficiency and execution speed of applications by utilizing a server instance in the infrastructure. To this end, a code offloading approach dynamically executes resource-intensive parts of an application on powerful remote servers in the infrastructure on behalf of a (resource-constrained) mobile device. During the remote execution of a resource-intensive application part on a remote server, a mobile device only waits in idle mode until it receives the result of the application part executed remotely. 
Instead of executing an application part on its local resources, a (resource-constrained) mobile device benefits from the more powerful resources of a remote server by sending the information required for a remote execution, waiting in idle mode, and receiving the result of the remote execution. The process of offloading code from a (resource-constrained) mobile device to a powerful remote server in the infrastructure, however, faces different problems. For instance, code offloading introduces some overhead for additional computation and communication on a mobile device. Moreover, spontaneous disconnections during a remote execution can cause a higher energy consumption and execution time than a local execution on a mobile device without code offloading. To this end, this dissertation addresses the whole process of offloading code from a mobile device not only to one but also to multiple remote resources, comprising the following steps: 1) First, code offloading has to identify feasible parts of an application for a remote execution, where the distributed execution of the identified application part is more beneficial than its local execution. A feasible part for a remote execution typically has the following properties: a small amount of information to transmit before a remote execution, a resource-intensive computation that does not access local sensors, and a small amount of information to transmit after a remote execution. In the area of identification of application parts for a remote execution, this dissertation presents an approach based on code annotations from application developers that automatically transforms a monolithic execution on a mobile device to a distributed execution on multiple heterogeneous resources. In contrast to related approaches in the literature, the annotation-based approach requires the fewest interventions from application developers and end users, keeping the overhead introduced on a mobile device low. 
2) For an application part identified for a remote execution, code offloading has to determine its execution side, executing the application part either on the local resources of a mobile device or on the remote resource in the infrastructure. In the area of determining the execution side for an application part, this dissertation presents the offloading problem, where a mobile device decides whether to execute an application part locally or remotely. Furthermore, this dissertation also presents an approach called "code bubbling" that shifts the decision making into the infrastructure. In contrast to related approaches in the literature, the decision-based approach on a mobile device and the bubbling-based approach minimize the execution time, energy consumption, and monetary cost for an application. 3) To determine the execution side for an application part identified for a remote execution, code offloading has to obtain different parameters from the application, participating resources, and utilized links. In the area of obtaining the information required from an application, this dissertation presents a bit-flipping approach that dynamically flips a bit whenever application-related information is modified. Furthermore, this dissertation also presents an offload-aware Application Programming Interface (API) that encapsulates the application-related information required for code offloading. In contrast to related approaches in the literature, the bit-flipping approach and the offload-aware API provide an efficient gathering of information at run-time, keeping the overhead introduced on a mobile device low. 4) Besides the information from an application, code offloading has to obtain further information from participating resources and utilized links. In the area of obtaining the information required from participating resources and utilized links, this dissertation presents the approach of code bubbling, already mentioned above. 
In contrast to related approaches in the literature, the bubbling-based approach makes the offload decision at the place where the related information occurs, keeping the overhead introduced on a mobile device, participating resources, and utilized links low. 5) In case of a remote execution of an application part, code offloading has to send the information required for a remote execution to the remote resource that subsequently executes the application part on behalf of the mobile device. In the area of sending the required information and executing an application part remotely, this dissertation presents code offloading with a cache on the remote side. The cache on the remote side serves as a collective storage of results for already executed application parts, avoiding a repeated execution of previously run application parts. In contrast to related approaches in the literature, the caching-aware approach increases the efficiency of code offloading, keeping the energy consumption, execution time, and monetary cost low. 6) While a remote resource executes an application part, code offloading has to handle the occurrence of failures like a failure of the remote resource or a disconnection. In the area of handling the occurrence of failures, this dissertation presents a preemptable offloading of code with safe-points. The preemptable offloading of code with safe-points enables an interruption of an offloading process and a corresponding continuation of a remote execution on a mobile device, without abandoning the complete result calculated remotely so far. Based on a preemptable offloading of code with safe-points, this dissertation further presents a predictive offloading of code with safe-points that minimizes the overhead introduced by safe-point'ing and maximizes the efficiency of a deadline-aware offloading. 
In contrast to related approaches in the literature, the preemptable approach with safe-point'ing increases the robustness of code offloading in case of failures. Furthermore, the predictive approach for safe-point'ing ensures a minimal responsiveness and a maximal efficiency of applications despite failures. 7) At the end of a remote execution of an application part, code offloading has to gather on the remote resource the required information after the execution and send this information to the mobile device. In the area of gathering the required information, a remote resource utilizes the same approaches as a mobile device, already mentioned above (cf. the bit-flipping approach and the offload-aware API). 8) Last, code offloading has to receive on the mobile device the information from a remote resource, install the information on the mobile device, and continue the execution of the application on the mobile device. In the area of installing the information and continuing the execution locally, a mobile device utilizes the approaches already mentioned above (cf. the bit-flipping approach and the offload-aware API).
  • Item (Open Access)
    Workload mix definition for benchmarking BPMN 2.0 Workflow Management Systems
    (2017) Skouradaki, Marigianna; Leymann, Frank (Prof. Dr. Dr. h. c.)
    Nowadays, enterprises broadly use Workflow Management Systems (WfMSs) to design, deploy, execute, monitor and analyse their automated business processes. Through the years, WfMSs evolved into platforms that deliver complex service oriented applications. In this regard, they need to satisfy enterprise-grade performance requirements, such as dependability and scalability. With the ever-growing number of WfMSs that are currently available in the market, companies are called to choose which product is optimal for their requirements and business models. Benchmarking is an established practice used to compare alternative products and leverages the continuous improvement of technology by setting a clear target in measuring and assessing performance. In particular, for service oriented WfMSs there is not yet a widely accepted standard benchmark available, even if workflow modelling languages such as Web Services Business Process Execution Language (WS-BPEL) and Business Process Model and Notation 2.0 (BPMN 2.0) have been adopted as the de-facto standards. A possible explanation for this deficiency is the inherent architectural complexity of WfMSs and the very large number of parameters affecting their performance. However, the need for a standard benchmark for WfMSs is frequently affirmed by the literature. The goal of the BenchFlow approach is to propose a framework towards the first standard benchmark for assessing and comparing the performance of BPMN 2.0 WfMSs. To this end, the approach addresses a set of challenges spanning from logistical challenges, which relate to the collection of a representative set of usage scenarios, to technical challenges, which concern the specific characteristics of a WfMS. This work focuses on a subset of these challenges dealing with the definition of a representative set of process models and corresponding data that will be given as an input to the benchmark. 
This set of representative process models and corresponding data is referred to as the workload mix of the benchmark. More particularly, we first prepare the theoretical background for defining a representative workload mix. This is accomplished through identification of the basic components of a workload model for WfMS benchmarks, as well as the investigation of the impact of the BPMN 2.0 language constructs on the WfMS’s performance, by means of introducing the first BPMN 2.0 micro-benchmark. We proceed by collecting real-world process models for the identification of a representative workload mix. To this end, the collection is analysed with respect to its statistical characteristics and also with a novel algorithm that detects and extracts the reoccurring structural patterns of the collection. The extracted reoccurring structures are then used for generating synthetic process models that reflect the essence of the original collection. The introduced methods are brought together in a tool chain that supports the workload mix generation. As a final step, we applied the proposed methods in a real-world case study that is based on a collection of thousands of real-world process models and generates a representative workload mix to be used in a benchmark. The results show that the generated workload mix is successful in its application for stressing the WfMSs under test.
  • Item (Open Access)
    Model-centric task debugging at scale
    (Stuttgart : Höchstleistungsrechenzentrum, Universität Stuttgart, 2017) Nachtmann, Mathias; Resch, Michael (Prof. Dr.-Ing. Dr. h.c. Dr. h.c. Prof. E.h.)
    Chapter 1, Introduction, presents state-of-the-art debugging techniques in high-performance computing. The lack of programming-model information from which these traditional debugging tools suffer motivated the model-centric debugging approach. Chapter 2, Technical Background: Parallel Programming Models & Tools, exemplifies the programming models used in the scope of my work. The differences between those models are illustrated, and examples are given for the most popular programming models in HPC. The chapter also describes Temanejo, the toolchain's front-end, which supports the application developer while debugging. In the following chapter (Chapter 4), Design: Events & Requests in Ayudame, the theory of "task" and "dependency" representation is stated. The chapter includes the design of different information types, which are later used for the communication between a programming model and the model-centric debugging approach. In chapter 5, Design: Communication Back-end Ayudame, the design of the back-end tool infrastructure is described in detail. This also includes the problems that occurred during the design process and their specific solutions. The concept of a multi-process environment and the usage of different programming models at the same time is also part of this chapter. The following chapter (Chapter 6), Instrumentation of Runtime Systems, briefly describes the information exchange between a programming model and the model-centric debugging approach. The different ways of monitoring and controlling an application through its programming model are illustrated. In chapter 7, Case Study: Performance Debugging, the model-centric debugging approach is used for optimising an application. All necessary optimisation steps are described in detail, with the help of mock-ups. Additionally, a description of the different optimised versions is included in this chapter. 
The evaluation, done on different hardware architectures, is presented and discussed. This includes not only the behaviour of the versions on different platforms but also architecture specific issues.
  • Item (Open Access)
    Improvement of hardware reliability with aging monitors
    (2017) Liu, Chang; Wunderlich, Hans-Joachim (Prof. Dr.)
  • Item (Open Access)
    Issues on distributed caching of spatial data
    (2017) Lübbe, Carlos; Mitschang, Bernhard (Prof. Dr.-Ing. habil.)
    The amount of digital information about places has grown rapidly. With the spread of mobile, internet-capable devices, this information can now be accessed at any time and from anywhere. In the course of this development, numerous location-based applications and services have become popular. Digital shopping assistants, tourist information services, and geosocial applications rank among the most popular examples. Rising user numbers and rapidly growing data volumes pose serious challenges for providers of location-related information. The data provisioning process must be designed efficiently to enable cost-efficient operation. In addition, resources should be assignable flexibly enough to even out load imbalances between system components. Furthermore, data providers must be able to scale their processing capacity with rising and falling query load. In this thesis, we present a distributed cache for location-based data. In the distributed cache, replicas of the most frequently used data are held in volatile memory by several independent servers. With our approach, the challenges for providers of location-related information can be addressed as follows: First, a caching strategy designed specifically for the access patterns of location-based applications increases overall efficiency, since a considerable share of the cached results of previous queries can be reused. In addition, our load-balancing methods, developed specifically for the geographic context, even out dynamic load imbalances. Finally, our distributed protocols for adding and removing servers enable providers of location-related information to adapt their processing capacity to rising or falling query load. 
In this document, we first examine the requirements of data provisioning in the context of location-based applications. We then discuss possible design patterns and derive an architecture for a distributed cache. In the course of this work, several concrete implementation variants emerged, which we present and compare with one another in this document. Our evaluation shows not only the feasibility in principle but also the effectiveness of our caching approach for achieving scalability and availability in the context of provisioning location-based data.