Universität Stuttgart
Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1
Search Results
Item Open Access
Erklärung fehlender Ergebnisse bei der Verarbeitung hierarchischer Daten in Spark [Explaining missing results when processing hierarchical data in Spark] (2016) Mayer, Karsten
Several algorithms exist that help developers debug database queries. These works explain why certain data is missing from the result set of a query, or why unexpected data appears in it (why-not questions). For query languages that support hierarchical data, however, only little work exists so far. This thesis investigates what is specific to why-not questions on hierarchical data. It considers which particular questions can be asked in this setting and how they can be answered appropriately. A concrete algorithm is also designed and implemented for Python. Using an example, the algorithm is then examined for whether it answers why-not questions efficiently and effectively enough.
Item Open Access
In-network packet priority adaptation for networked control systems (2016) Zinkler, Stephan
Sharing a network between Networked Control Systems (NCS), which have strict demands with respect to latency and jitter, and applications that only require best-effort service leads to multiple problems. An important task is to prioritize individual types of traffic in such a way that the guarantees necessary for an NCS to be stable can still be given. While there are ways to prioritize the more important control traffic of an NCS over best-effort traffic sharing the same network, a more sophisticated approach has to be found in order to handle multiple NCS sharing the highest priority. In this thesis, in-network priority scheduling applications with a global view on the network are developed in order to schedule and prioritize individual NCS such that their stability can be guaranteed while the network is shared between multiple NCS.
This thesis deals with in-network packet priority scheduling for Networked Control Systems. Using the Data Plane Development Kit (DPDK) to achieve a Network Function Virtualization (NFV) based approach, a priority scheduling application is implemented in a middlebox to handle continuous priorities. This application could be instantiated and migrated within the network while Software Defined Networking (SDN) is used to route the traffic to the respective nodes. Additionally, this approach is extended using SDN and OpenFlow to adapt priorities in-network. Using the eight internal per-port queues of a switch, discrete priorities are used to schedule, and additionally adapt, the priorities on the switch. This approach could enable priority-based routing by using the SDN controller for routing decisions and for configuring the switches. The evaluation of this thesis is done by simulating NCS and emulating the network containing the middlebox. For this, a simulation of an inverted pendulum is implemented, for which the use of DPDK is compared to standard sockets. It can be shown that DPDK performs better due to lower delay and jitter. The scheduling application is evaluated by comparing it to a round-robin scheduling approach. The results suggest that the application is able to keep multiple NCS more stable than its round-robin counterpart. Furthermore, it is able to stabilize a more unstable system faster and more effectively. While the maximum sampling time for a system with a pendulum having an initial angle of 35° was found to be 50 ms for the round-robin scheme, the middlebox is able to keep the system stable up to 120 ms.
The application using OpenFlow is evaluated with respect to the time it takes to configure the switch and the configuration overhead in relation to the number of NCS within the network.
Item Open Access
Exploring classification algorithms and data feature selection for domain specific industrial text data (2016) Villanueva Zacarías, Alejandro Gabriel
Unstructured text data represents a valuable source of information that nonetheless remains underutilised due to the lack of efficient methods to manipulate it and extract insights from it. One example of such deficiencies is the lack of suitable classification solutions that address the particular nature of domain-specific industrial text data. In this thesis we explore the factors that impact the performance of classification algorithms, as well as the properties of domain-specific industrial text data, to propose a framework that guides the design of text classification solutions that can achieve an optimal trade-off between accuracy and processing time. Our research model investigates the effect that the availability of data features has on the observed performance of a classification algorithm. To explain this relationship, we build a series of prototypical Naïve Bayes algorithm configurations out of existing components and test them on two role datasets from a quality process of an automotive company. A key finding is that properly designed feature selection techniques can play a major role in achieving optimal performance in terms of both accuracy and processing time by providing the right amount of meaningful features.
We test our results for statistical significance, proceed to suggest an optimal solution for our application scenario and conclude by describing the nature of the variable relationships contained in our research model.
Item Open Access
Caching concept for mobile engineering apps (2016) Steffl, Michael
Mobile apps in the engineering domain have to deal with data coming from Product Data Management (PDM) systems. This data contains details about the products and is very large. Geometry data such as 2D or 3D Computer Aided Design (CAD) representations is included. To get the data, the apps use wireless media such as WiFi or mobile data networks (LTE, 3G, etc.). Transferring large amounts of data over these media takes a lot of time and can be aborted by intermittent connectivity. The long-lasting transfers also increase energy consumption. In this master thesis a concept is created that overcomes these problems. A cache on the client stores the relevant data for fast access. As the disk space on mobile devices is limited, the data to be cached has to be chosen well. Only the data that is currently needed should be stored in the cache and provided to the app. To reduce waiting times, this data should be available before it is explicitly requested. To make this possible, the concept of this thesis provides preemptive caching (hoarding), in which the data that will probably be needed next is cached. To decide what data is needed, context is used: information coming from the environment of the client is used to derive situations, and these situations determine which data gets cached. Besides this context-aware strategy, the concept also uses a traditional way of caching in which all requested data gets cached. Furthermore, this thesis addresses the caching mechanism in its entirety. It determines a policy for replacing data that is no longer needed in order to free space.
A strategy for invalidating obsolete data in the cache is also determined. Finally, a prototypical implementation of the concept within an existing mobile engineering app is presented. With the help of this prototype the concept is evaluated.
Item Open Access
Automatic splitting in data-parallel complex event processing systems (2016) Sanwald, Tim
Parallel Complex Event Processing (CEP) systems handle today’s heavily loaded event streams from smart homes, network traffic systems or stock trading systems by distributing the incoming event stream to several pattern detection systems. The correct splitting is currently done by CEP experts, who ensure consistent splitting without generating false-positive or false-negative complex events in comparison with centralized CEP systems. In this work an approach is developed which automatically generates a splitting model from the pattern definition and ensures a consistent distribution without generating false positives or false negatives. This approach enables a parallel CEP system to be configured and used in the same way as a centralized CEP system. Further, a method which combines window-based splitting and key-based splitting is presented to reduce the network load and the CPU load on pattern detection operators. The functionality of the automatic splitting and the optimization is validated with common CEP scenarios based on generated and real-world data to ensure a wide applicability of the approach.
Item Open Access
Data parallelization in complex event processing without a dedicated splitter (2016) Lu, Qing
With the popularity of the Internet of Things (IoT), Complex Event Processing (CEP) shows its power in detecting specified patterns in an input event stream. Existing parallel CEP architectures improve the capacity of CEP systems. The major data-parallel CEP architecture is the Split-Process-Merge architecture, which is able to provide an unbounded degree of parallelism.
However, it has limitations when the splitting decision becomes computationally heavy, which turns the splitter into a bottleneck, for example when the splitting decision depends on comparing two images to check whether they contain the same object, such as a person. The result is that the single splitter, instead of the operator instances, does the computationally expensive work. To help analyze the cause of a "heavy" splitting decision, this thesis proposes an Extended SNOOP query language, which combines features from both SNOOP and TESLA, two of the leading event specification languages. The thesis then derives from the Split-Process-Merge architecture a new architecture that avoids the splitting decision. The Split-Process-Merge architecture splits the input event stream into sub-streams, and each operator instance handles one or more sub-streams. The new architecture instead creates Tasks by combining every incoming event with all existing Partial Matches, and operator instances process the Tasks. The Task Creation Algorithm is content independent: it does not inspect the content of events, such as image data. Therefore, the computationally heavy splitting decision is avoided. An example implementation of the new architecture for a specific query accompanies this thesis. The evaluation results of the implementation show that the new architecture scales well as the number of CPU cores increases and as the cost of the operation increases.
Item Open Access
Cost optimization for data placement strategies in an analytical cloud service (2016) Saleem, Muhammad Usman
Analyzing a large amount of business-relevant data in near real time in order to assist decision making has become a crucial requirement for many businesses in recent years. Therefore, all major database system vendors offer solutions that assist customers with this requirement through systems that are specially tuned for accelerating analytical workloads.
Before the decision is made to buy such a huge and expensive solution, customers are interested in a detailed workload analysis in order to estimate potential benefits. Therefore, a more agile solution with lower barriers to entry is desirable, one that allows customers to assess analytical solutions for their workloads and lets data scientists experiment with available data on test systems before rolling out valuable analytical reports on a production system. In such a scenario, where separate systems are deployed for handling the transactional workloads of the customers’ daily business and for conducting business analytics on either a cloud service or a dedicated accelerator appliance, data management and placement strategies are of high importance. Multiple approaches exist for keeping the data set in sync and guaranteeing data coherence, each with unique characteristics regarding important metrics that impact query performance, such as the latency with which data is propagated, the achievable throughput for larger data volumes, or the amount of CPU required to detect and deploy data changes. The important heuristics are therefore analyzed and evolved in order to develop a general model for data placement and maintenance strategies. Based on this theoretical model, a prototype is also implemented that predicts these metrics.
Item Open Access
Adding value to object storage: integrating analytics with cloud storage back ends (2016) Noori, Hoda
With the vast interest of customers in using cloud infrastructure, cloud providers are going to great lengths to offer advanced functionality. They do their utmost to present their services in a way that attracts customers and convinces them of the value and benefits of using such services. For this purpose, cloud providers need access to customers’ data, hence customer-sensitive data stored in repositories has to be transferred to the cloud.
Object storages are one possible solution for implementing repositories in cloud environments. However, because the data is confidential and fragile, security and encryption mechanisms are required. The application of an Enterprise Content Management (ECM) system relies heavily on metadata, so the metadata has to be kept unencrypted while the data itself is encrypted. Therefore, cloud providers that host ECM systems are forced to keep metadata unencrypted in order to support the main functionalities of ECM systems in the cloud, while other cloud providers can offer data encryption and unencrypted metadata as an option to their customers. This leads to the conclusion that enhancing object storages with analysis capabilities in ECM systems is more beneficial if it is done on top of unencrypted metadata. In this thesis I investigate how value can be added to such cloud storage services by accessing only the metadata. I specifically focus on providing analytics functionality on metadata. This Master’s thesis aims at providing the means to efficiently analyze the metadata inside a cloud-based ECM system (OSECM) which uses the Swift Object Store as its back-end repository. I extended the OSECM system with new modules that enable the retrieval of metadata from the object storage and the insertion of this metadata into a metadata warehouse. Replicating the metadata in a separate data warehouse makes it possible to use SQL query capabilities for analysis purposes. Furthermore, an existing tool was integrated as the analysis component to provide the means for interaction between the underlying metadata warehouse and the user interface. Finally, after the analysis queries are applied, the results are presented on the user interface using a predefined set of visualization interfaces.
The supported data structures for the visualization of the results are also defined in this work.
Item Open Access
Kooperative Vorhersage der minimalen Anwendungsausführungszeit [Cooperative prediction of the minimum application execution time] (2016) Kuhn, Julian
Code offloading frameworks improve the performance or energy consumption of resource-constrained devices by moving parts of a program, also called offloading candidates, to servers. An offloading candidate is offloaded when, taking the transfer of the candidate into account, there is a saving compared to purely local execution. The decision whether to offload depends strongly on the execution time of the candidate. For methods, the execution time can vary greatly with the current parameter configuration. Since it is in many cases impractical to record measurements for every parameter combination, simple history-based models are unsuitable for determining the execution time. An estimate of the execution time that is as accurate as possible is nevertheless needed to make the offloading decision correctly. The goal of this thesis was to investigate the prediction of execution times using machine learning models on various test applications and scenarios in the context of code offloading. In addition, a cooperative system design was presented and implemented that can be used to manage data sets and prediction models and their creation, and to predict execution times. The design extends existing offloading frameworks. It was found that machine learning algorithms are suitable for prediction and, in particular, for improving the offloading decision.
Item Open Access
Evaluation and analysis of realizing broker-based content routing protocols in SDN (2016) Hegazy, Lobna
Publish/subscribe provides a valuable communication model for the future Internet due to the decoupling of end users from each other.
One of the stubborn challenges facing recent content-based publish/subscribe systems is the trade-off between the usage of network bandwidth and the end-to-end delay of published events. This trade-off arises because most implementations depend on software brokers to filter incoming messages against the requests received from subscribers. Although this approach to filtering may offer the most bandwidth-efficient solutions, the use of brokers adds to the end-to-end delay of the network. The installed brokers are implemented at the application layer, so the original path between publishers and subscribers is extended, which adds to the delay with which messages are forwarded from publishers to subscribers. Along with the delay imposed by the extended path, a processing delay is added to the system by the time needed for filtering incoming messages at the brokers. As time is crucial to the real-world applications that depend on the content-based publish/subscribe paradigm, recent implementations try to tackle this problem by exploiting the hardware deployed in the underlying infrastructure for filtering operations. In-network filtering is enabled with the help of Software Defined Networking (SDN) technology, as it allows content filters to be installed directly on the network switches and routers. Even though this approach significantly reduces the end-to-end delay, it suffers when bandwidth efficiency is evaluated. Because of hardware limitations, installing content filters on hardware network elements limits their expressiveness. This increases the number of messages sent from publishers to subscribers on different network links, which requires more bandwidth. As an intermediate solution between the two filtering approaches, this thesis realizes a hybrid content-based publish/subscribe middleware that allows filtering operations in both the network and the application layer.
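The trade-off described in the last abstract can be illustrated with a minimal Python sketch. It is not the middleware from the thesis: the integer events, the range-based `Subscription` type and the `coarsen` helper (which stands in for a switch that can only match on a widened range) are all hypothetical. The in-network stage forwards more events than the subscriber asked for, and the application-layer stage removes the false positives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: events whose value lies in [low, high]."""
    low: int
    high: int

    def matches(self, value: int) -> bool:
        return self.low <= value <= self.high


def coarsen(sub: Subscription, granularity: int) -> Subscription:
    """Widen a subscription to block boundaries of size `granularity`.

    Assumption: this stands in for a hardware filter with limited
    expressiveness. It never rejects an event the exact filter accepts.
    """
    low = (sub.low // granularity) * granularity
    high = ((sub.high // granularity) + 1) * granularity - 1
    return Subscription(low, high)


def hybrid_filter(events, sub, granularity):
    """Return (events passing the coarse stage, events passing both stages)."""
    coarse = coarsen(sub, granularity)
    forwarded = [e for e in events if coarse.matches(e)]  # in-network stage
    delivered = [e for e in forwarded if sub.matches(e)]  # application-layer stage
    return forwarded, delivered


events = list(range(0, 40))
sub = Subscription(low=10, high=13)
forwarded, delivered = hybrid_filter(events, sub, granularity=8)
print(len(forwarded), len(delivered))  # prints "8 4"
```

With a granularity of 8, the coarse stage forwards eight events while only four match the subscription; the difference is the extra bandwidth that the application-layer stage has to filter out.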