05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Browse

Search Results

Now showing 1 - 10 of 13
  • Thumbnail Image
    ItemOpen Access
    Exploring classification algorithms and data feature selection for domain specific industrial text data
    (2016) Villanueva Zacarías, Alejandro Gabriel
    Unstructured text data represents a valuable source of information that nonetheless remains under-utilised due to the lack of efficient methods to manipulate it and extract insights from it. One example of such deficiencies is the lack of suitable classification solutions that address the particular nature of domain-specific industrial text data. In this thesis we explore the factors that impact the performance of classification algorithms, as well as the properties of domain-specific industrial text data, to propose a framework that guides the design of text classification solutions capable of achieving an optimal trade-off between accuracy and processing time. Our research model investigates the effect that the availability of data features has on the observed performance of a classification algorithm. To explain this relationship, we build a series of prototypical Naïve Bayes algorithm configurations out of existing components and test them on two real datasets from a quality process of an automotive company. A key finding is that properly designed feature selection techniques can play a major role in achieving optimal performance, both in terms of accuracy and processing time, by providing the right amount of meaningful features. We test our results for statistical significance, suggest an optimal solution for our application scenario, and conclude by describing the nature of the variable relationships contained in our research model.
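As an illustration of the accuracy/processing-time trade-off discussed in this abstract, the following is a minimal sketch of a Naïve Bayes text-classification pipeline with feature selection in scikit-learn. The toy documents, labels, and the choice of chi-squared selection are assumptions for illustration, not the thesis's actual setup.

```python
# Hypothetical sketch: TF-IDF features -> chi-squared feature selection ->
# Multinomial Naive Bayes, mirroring the design space explored in the thesis.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy stand-ins for domain-specific industrial text records (assumed data).
docs = [
    "engine noise at high rpm", "rattling sound from gearbox",
    "paint scratch on left door", "dent in rear bumper",
]
labels = ["drivetrain", "drivetrain", "body", "body"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),        # raw text -> weighted term features
    ("select", SelectKBest(chi2, k=8)),  # keep only the most informative terms
    ("nb", MultinomialNB()),             # fast probabilistic classifier
])
pipeline.fit(docs, labels)
print(pipeline.predict(["noise from the gearbox"])[0])
```

Shrinking `k` reduces processing time but risks discarding meaningful features, which is exactly the trade-off the thesis studies.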
  • Thumbnail Image
    ItemOpen Access
    SMARTEN : a sample-based approach towards privacy-friendly data refinement
    (2022) Stach, Christoph; Behringer, Michael; Bräcker, Julia; Gritti, Clémentine; Mitschang, Bernhard
    Two factors are crucial for the effective operation of modern-day smart services: initially, IoT-enabled technologies have to capture and combine huge amounts of data on data subjects; then, all these data have to be processed exhaustively by means of techniques from the area of big data analytics. With regard to the latter, thorough data refinement in terms of data cleansing and data transformation is the decisive cornerstone. Studies show that data refinement reaches its full potential only by involving domain experts in the process. However, this means that these experts need full insight into the data in order to be able to identify and resolve any issues therein, e.g., by correcting or removing inaccurate, incorrect, or irrelevant data records. In particular for sensitive data (e.g., private or confidential data), this poses a problem, since these data are thereby disclosed to third parties such as domain experts. To this end, we introduce SMARTEN, a sample-based approach towards privacy-friendly data refinement to smarten up big data analytics and smart services. SMARTEN applies a revised data refinement process that fully involves domain experts in data pre-processing but does not expose any sensitive data to them or to any other third party. To achieve this, domain experts obtain a representative sample of the entire data set that meets all privacy policies and confidentiality guidelines. Based on this sample, domain experts define data cleansing and transformation steps. Subsequently, these steps are converted into executable data refinement rules and applied to the entire data set. Domain experts can request further samples and define further rules until the data quality required for the intended use case is reached. Evaluation results confirm that our approach is effective in terms of both data quality and data privacy.
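The sample-then-apply workflow described above can be sketched in a few lines; the data set, the error code, and the refinement rule here are assumptions for illustration, not SMARTEN's actual mechanism.

```python
import random

# Hypothetical sketch of the sample-based idea: the expert inspects only a
# small sample, derives a refinement rule from it, and the rule is then
# applied to the full data set, which is never disclosed to the expert.
full_data = [{"id": i, "temp_c": t}
             for i, t in enumerate([21.5, -999.0, 22.1, 23.4, -999.0, 20.9])]

random.seed(0)
sample = random.sample(full_data, k=3)  # only this sample is shown to the expert

# Rule derived from inspecting the sample: -999.0 is a sensor error code,
# so records carrying it should be dropped.
def refine(record):
    return record["temp_c"] != -999.0

cleaned = [r for r in full_data if refine(r)]
print(len(cleaned))  # the invalid records are removed from the full set
```

If the remaining data quality is insufficient, the expert requests another sample and adds further rules, as the abstract describes.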
  • Thumbnail Image
    ItemOpen Access
    Solving high-dimensional dynamic portfolio choice models with hierarchical B-splines on sparse grids
    (2021) Schober, Peter; Valentin, Julian; Pflüger, Dirk
    Discrete time dynamic programming to solve dynamic portfolio choice models has three inherent issues: Firstly, the curse of dimensionality prohibits more than a handful of continuous states. Secondly, in higher dimensions, even regular sparse grid discretizations need too many grid points for sufficiently accurate approximations of the value function. Thirdly, the models usually require continuous control variables, and hence gradient-based optimization with smooth approximations of the value function is necessary to obtain accurate solutions to the optimization problem. For the first time, we enable accurate and fast numerical solutions with gradient-based optimization while still allowing for spatial adaptivity using hierarchical B-splines on sparse grids. When compared to the standard linear bases on sparse grids or finite difference approximations of the gradient, our approach saves an order of magnitude in total computational complexity for a representative dynamic portfolio choice model with varying state space dimensionality, stochastic sample space, and choice variables.
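The third issue above, why smooth approximations matter for gradient-based optimization, can be illustrated in one dimension: interpolate a sampled value function with a cubic B-spline and hand its analytic derivative to the optimizer. This is a simplified stand-in using SciPy, not the paper's hierarchical sparse-grid construction.

```python
import numpy as np
from scipy.interpolate import make_interp_spline
from scipy.optimize import minimize

x = np.linspace(0.0, 1.0, 9)        # grid points (1-D stand-in for a sparse grid)
v = -(x - 0.37) ** 2                # sampled value function with maximum at 0.37
spline = make_interp_spline(x, v, k=3)   # cubic B-spline: smooth and differentiable
dspline = spline.derivative()            # exact gradient of the approximation

# Maximize the smooth approximation with a gradient-based optimizer.
res = minimize(lambda c: float(-spline(c[0])),
               x0=[0.5],
               jac=lambda c: np.array([float(-dspline(c[0]))]),
               bounds=[(0.0, 1.0)])
print(float(res.x[0]))  # close to the true maximizer 0.37
```

With a piecewise-linear basis instead, the derivative would be discontinuous at every grid point and the optimizer would have to fall back on finite differences, which is the cost the paper avoids.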
  • Thumbnail Image
    ItemOpen Access
    Efficient and scalable initialization of partitioned coupled simulations with preCICE
    (2021) Totounferoush, Amin; Simonis, Frédéric; Uekermann, Benjamin; Schulte, Miriam
    preCICE is an open-source library that provides comprehensive functionality to couple independent parallelized solver codes to establish a partitioned multi-physics multi-code simulation environment. For data communication between the respective executables at runtime, it implements a peer-to-peer concept, which renders the computational cost of the coupling per time step negligible compared to the typical run time of the coupled codes. To initialize the peer-to-peer coupling, the mesh partitions of the respective solvers need to be compared to determine the point-to-point communication channels between the processes of both codes. This initialization effort can become a limiting factor if we either reach memory limits or have to re-initialize communication relations in every time step. In this contribution, we remove two remaining bottlenecks: (i) We base the neighborhood search between mesh entities of two solvers on a tree data structure to avoid quadratic complexity, and (ii) we replace the sequential gather-scatter comparison of both mesh partitions by a two-level approach that first compares bounding boxes around mesh partitions in a sequential manner, subsequently establishes pairwise communication between processes of the two solvers, and finally compares mesh partitions between connected processes in parallel. We show that the two-level initialization method is five times faster than the old one-level scheme on 24,567 CPU cores using a mesh with 628,898 vertices. In addition, the two-level scheme is able to handle much larger computational meshes, since the central mesh communication of the one-level scheme is replaced with a fully point-to-point mesh communication scheme.
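The first level of the two-level approach, cheap bounding-box comparison before any expensive vertex-level matching, can be sketched as follows; the 2-D point sets and rank numbering are assumptions for illustration, not preCICE's internal data structures.

```python
# Hypothetical sketch of the level-1 filter: compare axis-aligned bounding
# boxes of partitions first, and only keep partition pairs whose boxes
# overlap as candidates for the parallel vertex-level comparison (level 2).
def bbox(points):
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Toy mesh partitions per process rank for two coupled solvers.
solver_a = {0: [(0.0, 0.0), (1.0, 1.0)], 1: [(2.0, 0.0), (3.0, 1.0)]}
solver_b = {0: [(0.5, 0.5), (1.5, 1.5)], 1: [(5.0, 5.0), (6.0, 6.0)]}

candidates = [(ra, rb)
              for ra, pa in solver_a.items()
              for rb, pb in solver_b.items()
              if boxes_overlap(bbox(pa), bbox(pb))]
print(candidates)  # only overlapping partition pairs survive the filter
```

Only the surviving pairs establish pairwise communication and compare their actual mesh partitions, which is what removes the central gather-scatter bottleneck.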
  • Thumbnail Image
    ItemOpen Access
    Machine learning-based lie detector applied to a novel annotated game dataset
    (2021) Rodriguez-Diaz, Nuria; Aspandi, Decky; Sukno, Federico M.; Binefa, Xavier
    Lie detection is considered a concern for everyone in their day-to-day life, given its impact on human interactions. Thus, people normally pay attention to both what their interlocutors are saying and to their visual appearance, including the face, to find any signs that indicate whether or not the person is telling the truth. While automatic lie detection may help us to understand these lying characteristics, current systems are still fairly limited, partly due to the lack of adequate datasets to evaluate their performance in realistic scenarios. In this work, we collect an annotated dataset of facial images, comprising both 2D and 3D information of several participants during a card game that encourages players to lie. Using our collected dataset, we evaluate several types of machine learning-based lie detectors in terms of their generalization, in person-specific and cross-application experiments. We first extract both handcrafted and deep learning-based features as relevant visual inputs, then pass them into multiple types of classifiers to predict the respective lie/non-lie labels. Subsequently, we use several metrics to judge the models' accuracy based on their predictions and the ground truth. In our experiments, we show that models based on deep learning achieve the highest accuracy, reaching up to 57% for the generalization task and 63% when applied to detect the lies of a single participant. We further highlight the limitations of the deep learning-based lie detector when dealing with cross-application lie detection tasks.
    Finally, this analysis, along with the proposed dataset, would potentially be useful not only from a computational-systems perspective (e.g., improving current automatic lie prediction accuracy), but also for other relevant application fields, such as health practitioners in general medical counseling, education in academic settings, or finance in the banking sector, where close inspection and understanding of the actual intentions of individuals can be very important.
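The evaluation step described in the abstract, comparing predicted lie/non-lie labels against ground truth, reduces to a simple accuracy computation; the toy labels below are assumptions for illustration.

```python
# Hypothetical sketch: accuracy of binary lie/non-lie predictions against
# ground-truth annotations (1 = lie, 0 = truth; toy data, not the paper's).
truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
print(accuracy)  # fraction of labels predicted correctly
```

The reported 57% generalization accuracy shows how close to the 50% chance level of a binary task such detectors still operate.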
  • Thumbnail Image
    ItemOpen Access
    Query processing in blockchain systems : current state and future challenges
    (2021) Przytarski, Dennis; Stach, Christoph; Gritti, Clémentine; Mitschang, Bernhard
    When, in 2008, Satoshi Nakamoto envisioned the first distributed database management system that relied on a cryptographically secured chain of blocks to store data in an immutable and tamper-resistant manner, his primary use case was the introduction of a digital currency. Owing to this use case, the blockchain system was geared towards efficient storage of data, whereas the processing of complex queries, such as provenance analyses of data history, was out of scope. The increasing use of Internet of Things technologies and the resulting digitization in many domains, however, have led to a plethora of novel use cases for a secure digital ledger. For instance, in the healthcare sector, blockchain systems are used for the secure storage and sharing of electronic health records, while the food industry applies such systems to enable reliable food-chain traceability, e.g., to prove compliance with cold chains. In these application domains, however, querying the current state is not sufficient - comprehensive history queries are required instead. Due to these altered usage modes involving more complex query types, it is questionable whether today's blockchain systems are prepared for this type of usage and whether such queries can be processed efficiently by them. In our paper, we therefore investigate novel use cases for blockchain systems and elicit their requirements towards a data store in terms of query capabilities. We review the state of the art in terms of query support in blockchain systems and assess whether it is capable of meeting the requirements of these more sophisticated use cases. As a result, we identify future research challenges with regard to query processing in blockchain systems.
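The distinction between current-state and history queries can be made concrete on a simplified append-only log; the cold-chain records and field names below are hypothetical and only echo the abstract's example.

```python
# Hypothetical sketch: a blockchain-like append-only log of sensor readings.
log = [
    {"block": 1, "key": "shipment42", "temp_c": 4.0},
    {"block": 2, "key": "shipment42", "temp_c": 9.5},  # cold chain violated here
    {"block": 3, "key": "shipment42", "temp_c": 3.8},
]

# Current-state query: only the latest entry per key matters.
current = {e["key"]: e for e in log}["shipment42"]

# History query: the full log must be scanned, e.g. to prove (or disprove)
# cold-chain compliance over the shipment's entire lifetime.
violations = [e["block"] for e in log if e["temp_c"] > 8.0]
print(current["temp_c"], violations)
```

A current-state lookup would report the shipment as compliant; only the history query reveals the violation in block 2, which is why the paper argues history-query support is essential.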
  • Thumbnail Image
    ItemOpen Access
    Protecting sensitive data in the information age : state of the art and future prospects
    (2022) Stach, Christoph; Gritti, Clémentine; Bräcker, Julia; Behringer, Michael; Mitschang, Bernhard
    The present information age is characterized by ever-increasing digitalization. Smart devices quantify our entire lives. These collected data provide the foundation for data-driven services called smart services. They are able to adapt to a given context and thus tailor their functionalities to the user's needs. It is therefore not surprising that their main resource, namely data, is nowadays a valuable commodity that can also be traded. However, this trend does not only have positive sides, as the gathered data reveal a lot of information about various data subjects. To prevent uncontrolled insights into private or confidential matters, data protection laws restrict the processing of sensitive data. One key factor in this regard is user-friendly privacy mechanisms. In this paper, we therefore assess current state-of-the-art privacy mechanisms. To this end, we initially identify forms of data processing applied by smart services. We then discuss privacy mechanisms suited for these use cases. Our findings reveal that current state-of-the-art privacy mechanisms provide good protection in principle, but there is no compelling one-size-fits-all privacy approach. This leads to further questions regarding the practicality of these mechanisms, which we present in the form of seven thought-provoking propositions.
  • Thumbnail Image
    ItemOpen Access
    CLAIRE : parallelized diffeomorphic image registration for large-scale biomedical imaging applications
    (2022) Himthani, Naveen; Brunn, Malte; Kim, Jae-Youn; Schulte, Miriam; Mang, Andreas; Biros, George
    We study the performance of CLAIRE - a diffeomorphic multi-node, multi-GPU image-registration algorithm and software - in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then register them using existing tools. Our main contribution is an extensive analysis of the impact of downsampling on registration performance. We study this impact by comparing full-resolution registrations obtained with CLAIRE to lower-resolution registrations for synthetic and real-world imaging datasets. Our results suggest that registration at full resolution can yield a superior registration quality - but not always. For example, downsampling a synthetic image from 1024³ to 256³ decreases the Dice coefficient from 92% to 79%. However, the differences are less pronounced for noisy or low-contrast high-resolution images. CLAIRE allows us not only to register images of clinically relevant size in a few seconds but also to register images at unprecedented resolution in reasonable time. The highest resolution considered is that of CLARITY images of size 2816×3016×1162. To the best of our knowledge, this is the first study on image registration quality at such resolutions.
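The Dice coefficient used above to quantify registration quality measures the overlap of two segmentations; a minimal sketch on toy binary masks (assumed arrays, not the paper's data):

```python
import numpy as np

# Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for binary masks A and B.
def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

m1 = np.array([[1, 1, 0], [0, 1, 0]])
m2 = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(m1, m2))  # 2*2/(3+3) = 0.666...
```

A value of 1 means perfect overlap; the drop from 92% to 79% reported above thus indicates a substantial loss of anatomical correspondence after downsampling.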
  • Thumbnail Image
    ItemOpen Access
    Implicit consensus clustering from multiple graphs
    (2021) Boutalbi, Rafika; Labiod, Lazhar; Nadif, Mohamed
    Dealing with relational learning generally relies on tools for modeling relational data. An undirected graph can represent these data, with vertices depicting entities and edges describing the relationships between the entities. These relationships can be well represented by multiple undirected graphs over the same set of vertices, with edges arising from different graphs capturing heterogeneous relations. The vertices of those networks are often structured in unknown clusters with varying properties of connectivity. These multiple graphs can be structured as a three-way tensor, where each slice of the tensor depicts a graph represented by a count data matrix. To extract relevant clusters, we propose an appropriate model-based co-clustering capable of dealing with multiple graphs. The proposed model can be seen as a suitable tensor extension of mixture models of graphs, while the obtained co-clustering can be treated as a consensus clustering of nodes from multiple graphs. Applications on real datasets and comparisons with multi-view clustering and tensor decomposition methods demonstrate the value of our contribution.
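The data layout described in the abstract, multiple graphs over the same vertices stacked into a three-way tensor, can be sketched directly; the vertex counts and edge values below are assumptions for illustration.

```python
import numpy as np

# Hypothetical sketch: one count-data adjacency matrix per graph, stacked
# as the slices of a three-way tensor over a shared vertex set.
n_graphs, n_vertices = 2, 4
tensor = np.zeros((n_graphs, n_vertices, n_vertices), dtype=int)

tensor[0, 0, 1] = tensor[0, 1, 0] = 3  # graph 0: 3 interactions between vertices 0 and 1
tensor[1, 2, 3] = tensor[1, 3, 2] = 1  # graph 1: 1 interaction between vertices 2 and 3
print(tensor.shape)  # (n_graphs, n_vertices, n_vertices)
```

Each slice being symmetric reflects the undirected graphs; the proposed co-clustering then groups the shared vertices using evidence from all slices at once.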