05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6


Search Results

Now showing 1 - 10 of 15
  • Item (Open Access)
    SMARTEN : a sample-based approach towards privacy-friendly data refinement
    (2022) Stach, Christoph; Behringer, Michael; Bräcker, Julia; Gritti, Clémentine; Mitschang, Bernhard
    Two factors are crucial for the effective operation of modern-day smart services: Initially, IoT-enabled technologies have to capture and combine huge amounts of data on data subjects. Then, all these data have to be processed exhaustively by means of techniques from the area of big data analytics. With regard to the latter, thorough data refinement in terms of data cleansing and data transformation is the decisive cornerstone. Studies show that data refinement reaches its full potential only by involving domain experts in the process. However, this means that these experts need full insight into the data in order to be able to identify and resolve any issues therein, e.g., by correcting or removing inaccurate, incorrect, or irrelevant data records. In particular for sensitive data (e.g., private data or confidential data), this poses a problem, since these data are thereby disclosed to third parties such as domain experts. To this end, we introduce SMARTEN, a sample-based approach towards privacy-friendly data refinement to smarten up big data analytics and smart services. SMARTEN applies a revised data refinement process that fully involves domain experts in data pre-processing but does not expose any sensitive data to them or any other third party. To achieve this, domain experts obtain a representative sample of the entire data set that meets all privacy policies and confidentiality guidelines. Based on this sample, domain experts define data cleaning and transformation steps. Subsequently, these steps are converted into executable data refinement rules and applied to the entire data set. Domain experts can request further samples and define further rules until the data quality required for the intended use case is reached. Evaluation results confirm that our approach is effective in terms of both data quality and data privacy.
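    The sample-then-rules workflow can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the column names, the masking step standing in for the privacy policies, and the rule format are assumptions made for this example.
    ```python
    # Minimal sketch of a sample-based refinement loop in the spirit of SMARTEN.
    # Column names, masking policy, and rule format are illustrative assumptions.
    import pandas as pd

    def draw_policy_compliant_sample(df, n, masked_cols):
        """Return a random sample with sensitive columns masked before expert review."""
        sample = df.sample(n=min(n, len(df)), random_state=42).copy()
        for col in masked_cols:
            sample[col] = "***"          # crude masking stands in for the privacy policies
        return sample

    def apply_refinement_rules(df, rules):
        """Apply expert-defined cleaning/transformation rules to the full data set."""
        for rule in rules:
            df = rule(df)
        return df

    # Hypothetical data set; the expert inspects only the masked sample.
    data = pd.DataFrame({"age": [25, -1, 40, 200], "diagnosis": ["a", "b", "c", "d"]})
    sample_for_expert = draw_policy_compliant_sample(data, n=2, masked_cols=["diagnosis"])

    # Rules the expert might derive from the sample, then applied to the entire data set.
    rules = [
        lambda df: df[(df["age"] >= 0) & (df["age"] <= 120)],   # drop implausible ages
    ]
    refined = apply_refinement_rules(data, rules)
    print(refined)
    ```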
  • Item (Open Access)
    Solving high-dimensional dynamic portfolio choice models with hierarchical B-splines on sparse grids
    (2021) Schober, Peter; Valentin, Julian; Pflüger, Dirk
    Discrete time dynamic programming to solve dynamic portfolio choice models has three immanent issues: firstly, the curse of dimensionality prohibits more than a handful of continuous states. Secondly, in higher dimensions, even regular sparse grid discretizations need too many grid points for sufficiently accurate approximations of the value function. Thirdly, the models usually require continuous control variables, and hence gradient-based optimization with smooth approximations of the value function is necessary to obtain accurate solutions to the optimization problem. For the first time, we enable accurate and fast numerical solutions with gradient-based optimization while still allowing for spatial adaptivity using hierarchical B-splines on sparse grids. When compared to the standard linear bases on sparse grids or finite difference approximations of the gradient, our approach saves an order of magnitude in total computational complexity for a representative dynamic portfolio choice model with varying state space dimensionality, stochastic sample space, and choice variables.
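    The following one-dimensional sketch only illustrates the central mechanism: a smooth B-spline surrogate of the value function exposes an analytic gradient to the optimizer. The paper itself works with hierarchical B-splines on spatially adaptive sparse grids in many dimensions; the log-utility model, grid, and parameter values below are assumptions for this illustration.
    ```python
    # Smooth B-spline surrogate of a value function + gradient-based optimization
    # of a single-period consumption choice (a toy stand-in for the Bellman step).
    import numpy as np
    from scipy.interpolate import make_interp_spline
    from scipy.optimize import minimize

    # Hypothetical value-function samples on a coarse wealth grid.
    wealth_grid = np.linspace(0.1, 10.0, 9)
    value_samples = np.log(wealth_grid)              # placeholder for a computed value function

    value_spline = make_interp_spline(wealth_grid, value_samples, k=3)   # cubic B-spline
    value_grad = value_spline.derivative()

    def bellman_objective(consumption, wealth=5.0, beta=0.95):
        """Negative of utility plus discounted continuation value."""
        c = consumption[0]
        return -(np.log(c) + beta * value_spline(wealth - c))

    def bellman_gradient(consumption, wealth=5.0, beta=0.95):
        c = consumption[0]
        return np.array([-(1.0 / c - beta * value_grad(wealth - c))])

    # The spline's smoothness makes an analytic gradient available to the optimizer.
    result = minimize(bellman_objective, x0=[1.0], jac=bellman_gradient,
                      bounds=[(1e-3, 4.9)], method="L-BFGS-B")
    print("optimal consumption:", result.x)
    ```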
  • Item (Open Access)
    Availability analysis of redundant and replicated cloud services with Bayesian networks
    (2023) Bibartiu, Otto; Dürr, Frank; Rothermel, Kurt; Ottenwälder, Beate; Grau, Andreas
    Due to the growing complexity of modern data centers, failures are not uncommon any more. Therefore, fault tolerance mechanisms play a vital role in fulfilling the availability requirements. Multiple availability models have been proposed to assess compute systems, among which Bayesian network models have gained popularity in industry and research due to their powerful modeling formalism. In particular, this work focuses on assessing the availability of redundant and replicated cloud computing services with Bayesian networks. So far, research on availability has only focused on modeling either infrastructure or communication failures in Bayesian networks, but has not considered both simultaneously. This work addresses practical modeling challenges of assessing the availability of large-scale redundant and replicated services with Bayesian networks, including cascading and common-cause failures from the surrounding infrastructure and communication network. In order to ease the modeling task, this paper introduces a high-level modeling formalism to build such a Bayesian network automatically. Performance evaluations demonstrate the feasibility of the presented Bayesian network approach to assess the availability of large-scale redundant and replicated services. This model is not only applicable in the domain of cloud computing; it can also be applied to general cases of local and geo-distributed systems.
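    As a rough illustration of the kind of model the paper assesses, the sketch below enumerates a tiny availability model by hand: a 2-out-of-3 replicated service whose replicas depend on a shared network switch as a common-cause parent. Topology and probabilities are invented for this example; the paper builds such Bayesian networks automatically from its high-level formalism and evaluates them at much larger scale.
    ```python
    # Exhaustive enumeration of a tiny availability model with a common-cause parent.
    from itertools import product

    P_SWITCH_UP = 0.999                              # shared (common-cause) component
    P_REPLICA_UP_GIVEN_SWITCH = {True: 0.99, False: 0.0}

    def service_up(replica_states):
        return sum(replica_states) >= 2              # 2-out-of-3 replication

    availability = 0.0
    for switch_up in (True, False):
        p_switch = P_SWITCH_UP if switch_up else 1.0 - P_SWITCH_UP
        p_rep = P_REPLICA_UP_GIVEN_SWITCH[switch_up]
        for replicas in product((True, False), repeat=3):
            p_replicas = 1.0
            for r in replicas:
                p_replicas *= p_rep if r else 1.0 - p_rep
            if service_up(replicas):
                availability += p_switch * p_replicas

    print(f"steady-state availability ≈ {availability:.6f}")
    ```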
  • Item (Open Access)
    Effective or predatory funding? : evaluating the hidden costs of grant applications
    (2022) Dresler, Martin; Buddeberg, Eva; Endesfelder, Ulrike; Haaker, Jan; Hof, Christian; Kretschmer, Robert; Pflüger, Dirk; Schmidt, Fabian
    Researchers are spending an increasing fraction of their time on applying for funding; however, the current funding system has considerable deficiencies in reliably evaluating the merit of research proposals, despite extensive efforts on the sides of applicants, grant reviewers and decision committees. For some funding schemes, the systemic costs of the application process as a whole can even outweigh the granted resources - a phenomenon that could be considered as predatory funding. We present five recommendations to remedy this unsatisfactory situation.
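    A back-of-the-envelope calculation makes the claim that application costs can outweigh the granted resources concrete. All numbers below are hypothetical and chosen only to illustrate a low-success-rate call; they are not taken from the paper.
    ```python
    # Hypothetical cost comparison for one funding call (all figures are assumptions).
    applicants         = 200        # proposals submitted to the call
    hours_per_proposal = 120        # preparation time per proposal
    reviewer_hours     = 30         # e.g., three reviews of 10 hours each per proposal
    cost_per_hour      = 60.0       # loaded researcher cost in EUR

    grants_awarded     = 8
    grant_size         = 150_000.0  # EUR per funded project

    systemic_cost = applicants * (hours_per_proposal + reviewer_hours) * cost_per_hour
    granted_total = grants_awarded * grant_size

    print(f"systemic cost of the call: {systemic_cost:,.0f} EUR")
    print(f"total granted resources:   {granted_total:,.0f} EUR")
    print(f"cost per granted euro:     {systemic_cost / granted_total:.2f}")
    ```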
  • Item (Open Access)
    Analyzing the influence of hyper-parameters and regularizers of topic modeling in terms of Renyi entropy
    (2020) Koltcov, Sergei; Ignatenko, Vera; Boukhers, Zeyd; Staab, Steffen
    Topic modeling is a popular technique for clustering large collections of text documents. A variety of regularization types is implemented in topic modeling. In this paper, we propose a novel approach for analyzing the influence of different regularization types on the results of topic modeling. Based on Renyi entropy, this approach is inspired by concepts from statistical physics, where an inferred topical structure of a collection can be considered an information statistical system residing in a non-equilibrium state. By testing our approach on four models - Probabilistic Latent Semantic Analysis (pLSA), Additive Regularization of Topic Models (BigARTM), Latent Dirichlet Allocation (LDA) with Gibbs sampling, and LDA with variational inference (VLDA) - we first show that the minimum of Renyi entropy coincides with the “true” number of topics, as determined in two labelled collections. At the same time, we find that the Hierarchical Dirichlet Process (HDP) model, a well-known approach for topic number optimization, fails to detect such an optimum. Next, we demonstrate that large values of the regularization coefficient in BigARTM significantly shift the minimum of entropy away from the optimal topic number, an effect that is not observed for the hyper-parameters in LDA with Gibbs sampling. We conclude that regularization may introduce unpredictable distortions into topic models, which calls for further research.
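    A simplified sketch of the selection procedure: fit topic models for a range of topic numbers and pick the one minimizing a Renyi entropy of the topic-word distributions. The paper's entropy is derived from a free-energy formulation and its models include BigARTM and Gibbs-sampling LDA; the scikit-learn LDA, the entropy order q, and the toy corpus below are assumptions for illustration.
    ```python
    # Topic-number selection via a (simplified) Renyi entropy of topic-word distributions.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    corpus = ["cats purr and sleep", "dogs bark and run",
              "stocks rise on earnings", "markets fall on rate fears"] * 25
    dtm = CountVectorizer().fit_transform(corpus)

    def renyi_entropy(p, q=2.0):
        p = p[p > 0]
        return np.log(np.sum(p ** q)) / (1.0 - q)

    entropies = {}
    for k in range(2, 7):
        lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(dtm)
        topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
        entropies[k] = float(np.mean([renyi_entropy(t) for t in topic_word]))

    # Select the topic number with minimal entropy (the paper's criterion for the
    # "true" number of topics).
    best_k = min(entropies, key=entropies.get)
    print(entropies, "suggested number of topics:", best_k)
    ```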
  • Item (Open Access)
    Efficient and scalable initialization of partitioned coupled simulations with preCICE
    (2021) Totounferoush, Amin; Simonis, Frédéric; Uekermann, Benjamin; Schulte, Miriam
    preCICE is an open-source library that provides comprehensive functionality to couple independent parallelized solver codes to establish a partitioned multi-physics, multi-code simulation environment. For data communication between the respective executables at runtime, it implements a peer-to-peer concept, which renders the computational cost of the coupling per time step negligible compared to the typical run time of the coupled codes. To initialize the peer-to-peer coupling, the mesh partitions of the respective solvers need to be compared to determine the point-to-point communication channels between the processes of both codes. This initialization effort can become a limiting factor if we either reach memory limits or if we have to re-initialize communication relations in every time step. In this contribution, we remove two remaining bottlenecks: (i) we base the neighborhood search between mesh entities of two solvers on a tree data structure to avoid quadratic complexity, and (ii) we replace the sequential gather-scatter comparison of both mesh partitions by a two-level approach that first compares bounding boxes around mesh partitions in a sequential manner, subsequently establishes pairwise communication between processes of the two solvers, and finally compares mesh partitions between connected processes in parallel. We show that the two-level initialization method is five times faster than the old one-level scheme on 24,567 CPU cores using a mesh with 628,898 vertices. In addition, the two-level scheme is able to handle much larger computational meshes, since the central mesh communication of the one-level scheme is replaced with a fully point-to-point mesh communication scheme.
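    The two ingredients of the initialization scheme can be sketched as follows: a cheap bounding-box overlap test decides whether two ranks need a communication channel at all, and a tree-based search replaces the quadratic all-pairs vertex comparison. The random partitions and the search radius are made-up inputs; preCICE's actual data structures and communication setup differ.
    ```python
    # Bounding-box pre-filter plus tree-based neighborhood search between two mesh partitions.
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    partition_a = rng.uniform(0.0, 1.0, size=(1000, 3))   # vertices owned by a rank of solver A
    partition_b = rng.uniform(0.5, 1.5, size=(1000, 3))   # vertices owned by a rank of solver B

    def bounding_box(points):
        return points.min(axis=0), points.max(axis=0)

    def boxes_overlap(box1, box2):
        (lo1, hi1), (lo2, hi2) = box1, box2
        return bool(np.all(lo1 <= hi2) and np.all(lo2 <= hi1))

    # Level 1: compare bounding boxes to decide whether the two ranks must be connected.
    if boxes_overlap(bounding_box(partition_a), bounding_box(partition_b)):
        # Level 2: tree-based neighborhood search between the connected partitions.
        tree = cKDTree(partition_b)
        neighbor_lists = tree.query_ball_point(partition_a, r=0.05)
        n_pairs = sum(len(lst) for lst in neighbor_lists)
        print(f"communication channel needed; {n_pairs} vertex pairs within radius")
    ```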
  • Item (Open Access)
    Machine learning-based lie detector applied to a novel annotated game dataset
    (2021) Rodriguez-Diaz, Nuria; Aspandi, Decky; Sukno, Federico M.; Binefa, Xavier
    Lie detection is considered a concern for everyone in their day-to-day life, given its impact on human interactions. Thus, people normally pay attention to both what their interlocutors are saying and to their visual appearance, including the face, to find any signs that indicate whether or not the person is telling the truth. While automatic lie detection may help us to understand these lying characteristics, current systems are still fairly limited, partly due to the lack of adequate datasets to evaluate their performance in realistic scenarios. In this work, we collect an annotated dataset of facial images, comprising both 2D and 3D information of several participants during a card game that encourages players to lie. Using our collected dataset, we evaluate several types of machine learning-based lie detectors in terms of their generalization, in person-specific and cross-application experiments. We first extract both handcrafted and deep learning-based features as relevant visual inputs, then pass them into multiple types of classifiers to predict the respective lie/non-lie labels. Subsequently, we use several metrics to judge the models’ accuracy based on their predictions and the ground truth. In our experiments, we show that models based on deep learning achieve the highest accuracy, reaching up to 57% for the generalization task and 63% when applied to detect lies of a single participant. We further highlight the limitations of deep learning-based lie detectors when dealing with cross-application lie detection tasks. Finally, this analysis, along with the proposed dataset, could be useful not only from the perspective of computational systems (e.g., improving current automatic lie prediction accuracy), but also for other relevant application fields, such as health practitioners in general medical counseling, education in academic settings, or finance in the banking sector, where close inspection and understanding of the actual intentions of individuals can be very important.
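    The evaluation pipeline described above (visual features in, lie/non-lie label out, accuracy against ground truth) can be sketched with scikit-learn. The random feature vectors stand in for the handcrafted and deep descriptors, and the simple split does not reproduce the paper's person-specific or cross-application protocols.
    ```python
    # Feature vectors -> several classifiers -> accuracy against lie/non-lie ground truth.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(1)
    features = rng.normal(size=(400, 128))        # placeholder for per-frame face descriptors
    labels = rng.integers(0, 2, size=400)         # 1 = lie, 0 = truth (placeholder labels)

    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.3, random_state=1, stratify=labels)

    for name, clf in [("SVM", SVC()),
                      ("random forest", RandomForestClassifier(random_state=1))]:
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"{name}: accuracy = {acc:.2f}")
    ```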
  • Item (Open Access)
    Query processing in blockchain systems : current state and future challenges
    (2021) Przytarski, Dennis; Stach, Christoph; Gritti, Clémentine; Mitschang, Bernhard
    When, in 2008, Satoshi Nakamoto envisioned the first distributed database management system that relied on a cryptographically secured chain of blocks to store data in an immutable and tamper-resistant manner, his primary use case was the introduction of a digital currency. Owing to this use case, the blockchain system was geared towards efficient storage of data, whereas the processing of complex queries, such as provenance analyses of data history, was out of focus. The increasing use of Internet of Things technologies and the resulting digitization in many domains, however, have led to a plethora of novel use cases for a secure digital ledger. For instance, in the healthcare sector, blockchain systems are used for the secure storage and sharing of electronic health records, while the food industry applies such systems to enable reliable food-chain traceability, e.g., to prove compliance with cold chains. In these application domains, however, querying the current state is not sufficient - comprehensive history queries are required instead. Due to these altered usage modes involving more complex query types, it is questionable whether today’s blockchain systems are prepared for this type of usage and whether such queries can be processed efficiently by them. In our paper, we therefore investigate novel use cases for blockchain systems and elicit their requirements towards a data store in terms of query capabilities. We reflect on the state of the art in terms of query support in blockchain systems and assess whether it is capable of meeting the requirements of such more sophisticated use cases. As a result, we identify future research challenges with regard to query processing in blockchain systems.
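    The difference between a current-state query and a history (provenance) query, which drives the requirements analysis above, can be shown on a toy chain. The block layout and the cold-chain example data are didactic simplifications, not the storage model of any particular blockchain system.
    ```python
    # Toy chain: a state query returns the latest value of a key, a history query its full provenance.
    import hashlib
    import json

    def make_block(prev_hash, transactions):
        body = {"prev": prev_hash, "txs": transactions}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {"hash": digest, **body}

    genesis = make_block("0" * 64, [{"key": "pallet-42", "value": "2 °C", "ts": 1}])
    block_2 = make_block(genesis["hash"], [{"key": "pallet-42", "value": "8 °C", "ts": 2}])
    chain = [genesis, block_2]

    def current_state(chain, key):
        """State query: only the most recent value of a key matters."""
        latest = None
        for block in chain:
            for tx in block["txs"]:
                if tx["key"] == key:
                    latest = tx
        return latest

    def history(chain, key):
        """History query: the full, tamper-evident sequence of values for a key."""
        return [tx for block in chain for tx in block["txs"] if tx["key"] == key]

    print(current_state(chain, "pallet-42"))
    print(history(chain, "pallet-42"))
    ```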
  • Item (Open Access)
    Protecting sensitive data in the information age : state of the art and future prospects
    (2022) Stach, Christoph; Gritti, Clémentine; Bräcker, Julia; Behringer, Michael; Mitschang, Bernhard
    The present information age is characterized by an ever-increasing digitalization. Smart devices quantify our entire lives. These collected data provide the foundation for data-driven services called smart services. They are able to adapt to a given context and thus tailor their functionalities to the user’s needs. It is therefore not surprising that their main resource, namely data, is nowadays a valuable commodity that can also be traded. However, this trend does not only have positive sides, as the gathered data reveal a lot of information about various data subjects. To prevent uncontrolled insights into private or confidential matters, data protection laws restrict the processing of sensitive data. One key factor in this regard is user-friendly privacy mechanisms. In this paper, we therefore assess current state-of-the-art privacy mechanisms. To this end, we initially identify forms of data processing applied by smart services. We then discuss privacy mechanisms suited for these use cases. Our findings reveal that current state-of-the-art privacy mechanisms provide good protection in principle, but there is no compelling one-size-fits-all privacy approach. This leads to further questions regarding the practicality of these mechanisms, which we present in the form of seven thought-provoking propositions.
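    As one concrete instance of the kind of privacy mechanism surveyed in the paper, the sketch below releases a count under ε-differential privacy via the Laplace mechanism. The paper itself compares a broad range of mechanisms rather than prescribing this one, and the query and ε values are illustrative assumptions.
    ```python
    # Laplace mechanism: release a count with epsilon-differential privacy.
    import numpy as np

    def laplace_count(true_count, epsilon, sensitivity=1.0):
        """Noisy count; sensitivity 1 because one user changes the count by at most 1."""
        scale = sensitivity / epsilon
        return true_count + np.random.default_rng().laplace(0.0, scale)

    exact = 1_337                        # hypothetical number of users matching a query
    for eps in (0.1, 1.0, 10.0):         # smaller epsilon = stronger privacy, noisier answer
        print(f"epsilon = {eps:4}: noisy count ≈ {laplace_count(exact, eps):.1f}")
    ```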