05 Fakultät Informatik, Elektrotechnik und Informationstechnik (Faculty of Computer Science, Electrical Engineering and Information Technology)

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

  • Open Access
    Effective or predatory funding? : evaluating the hidden costs of grant applications
    (2022) Dresler, Martin; Buddeberg, Eva; Endesfelder, Ulrike; Haaker, Jan; Hof, Christian; Kretschmer, Robert; Pflüger, Dirk; Schmidt, Fabian
    Researchers are spending an increasing fraction of their time on applying for funding; however, the current funding system has considerable deficiencies in reliably evaluating the merit of research proposals, despite extensive efforts on the part of applicants, grant reviewers and decision committees. For some funding schemes, the systemic costs of the application process as a whole can even outweigh the granted resources, a phenomenon that could be considered predatory funding. We present five recommendations to remedy this unsatisfactory situation.
  • Open Access
    Metrics and algorithms for locally fair and accurate classifications using ensembles
    (2022) Lässig, Nico; Oppold, Sarah; Herschel, Melanie
    Model ensembles comprising multiple trained machine learning models are nowadays used to obtain accurate classifier predictions. In particular, dynamic model ensembles pick the most accurate model for each query object by applying the model that performed best on similar data. However, like single machine learning models, dynamic model ensembles may suffer from bias, which can eventually lead to unfair treatment of certain groups of a general population. To mitigate unfair classification, recent work has thus proposed fair model ensembles, which optimize not (solely) for accuracy but also for global fairness. While global fairness minimizes bias across the data set as a whole, imbalances may persist in different regions of the data (e.g., caused by local bias maxima), leading to local unfairness. Therefore, we extend our previous work with a framework that bridges the gap between dynamic model ensembles and fair model ensembles. More precisely, we investigate the problem of devising locally fair and accurate dynamic model ensembles, which ultimately optimize for equal opportunity of similar subjects. We propose a general framework to perform this task and present several algorithms implementing the framework components. In this paper, we also present a runtime-efficient adaptation of the framework that keeps the quality of the results at a similar level. Furthermore, we present new fairness metrics as well as detailed information about the necessary data preparation. Our evaluation of the framework implementations and metrics shows that, for different types and degrees of bias present in the training data, our approach outperforms the state of the art in terms of both local and global fairness while reaching comparable accuracy.
  • Open Access
    Exploiting domain knowledge to address class imbalance and a heterogeneous feature space in multi-class classification
    (2023) Hirsch, Vitali; Reimann, Peter; Treder-Tschechlov, Dennis; Schwarz, Holger; Mitschang, Bernhard
    Real-world data of multi-class classification tasks often show complex data characteristics that lead to reduced classification performance. Major analytical challenges are a high degree of multi-class imbalance within the data and a heterogeneous feature space, which increases the number and complexity of class patterns. Existing solutions to classification or data pre-processing only address one of these two challenges in isolation. We propose a novel classification approach that explicitly addresses both challenges of multi-class imbalance and heterogeneous feature space together. As its main contribution, this approach exploits domain knowledge in terms of a taxonomy to systematically prepare the training data. Based on an experimental evaluation on both real-world data and several synthetically generated data sets, we show that our approach outperforms any other classification technique in terms of accuracy. Furthermore, it entails considerable practical benefits in real-world use cases, e.g., it reduces rework required in the area of product quality control.
  • Open Access
    Research data management in simulation science : infrastructure, tools, and applications
    (2024) Flemisch, Bernd; Hermann, Sibylle; Herschel, Melanie; Pflüger, Dirk; Pleiss, Jürgen; Range, Jan; Roy, Sarbani; Takamoto, Makoto; Uekermann, Benjamin
    Research Data Management (RDM) has gained significant traction in recent years, as it is essential for making research data, e.g., findable, accessible, interoperable, and reusable (FAIR), thereby fostering collaboration and accelerating scientific findings. We present solutions for RDM developed within the DFG-funded Cluster of Excellence EXC2075 Data-Integrated Simulation Science (SimTech). After an introduction to the scientific context and the challenges faced by simulation scientists, we outline the general data management infrastructure and present tools that address these challenges. Exemplary domain applications demonstrate the use and benefits of the proposed data management software solutions. These are complemented by additional measures for enablement and dissemination to foster the adoption of these techniques.
  • Open Access
    Enhancing quasi-Newton acceleration for fluid-structure interaction
    (2022) Davis, Kyle; Schulte, Miriam; Uekermann, Benjamin
    We propose two enhancements of quasi-Newton methods used to accelerate coupling iterations for partitioned fluid-structure interaction. Quasi-Newton methods have been established as flexible, yet robust, efficient, and accurate coupling methods for multi-physics simulations in general. The coupling library preCICE provides several variants, of which the so-called IQN-ILS method is the most commonly used. It uses input and output differences of the coupled solvers collected in previous iterations and time steps to approximate Newton iterations. To make quasi-Newton methods applicable to parallel coupling (where these differences contain data from different physical fields) and to provide a robust approach for re-using information, a combination of information filtering and scaling for the different physical fields is typically required. This leads to good convergence but increases the cost per iteration. We propose two new approaches, pre-scaling weight monitoring and a new so-called QR3 filter, that substantially improve runtime without affecting convergence quality. We evaluate these for a variety of fluid-structure interaction examples. Results show that we achieve drastic speedups for the pure quasi-Newton update steps. In the future, we intend to also apply the methods to volume-coupled scenarios, where these gains can be decisive for the feasibility of the coupling approach.
  • Open Access
    The lakehouse : state of the art on concepts and technologies
    (2024) Schneider, Jan; Gröger, Christoph; Lutsch, Arnold; Schwarz, Holger; Mitschang, Bernhard
    In the context of data analytics, so-called lakehouses refer to novel variants of data platforms that attempt to combine characteristics of data warehouses and data lakes. In this way, lakehouses promise to simplify enterprise analytics architectures, which often suffer from high operational costs, slow analytical processes, and further shortcomings resulting from data replication. However, different views and notions of the lakehouse paradigm exist, which are commonly driven by individual technologies and varying analytical use cases. Therefore, it remains unclear what challenges lakehouses address, how they can be characterized, and which technologies can be leveraged to implement them. This paper addresses these issues by providing an extensive overview of concepts and technologies related to the lakehouse paradigm and by outlining lakehouses as a distinct architectural approach for data platforms. Concepts and technologies from the literature with regard to lakehouses are discussed, and based on them a conceptual foundation for lakehouses is established. In addition, several popular technologies are evaluated regarding their suitability for building lakehouses. All findings are supported and demonstrated with the help of a representative analytics scenario. Typical challenges of conventional data platforms are identified, a new, sharper definition of lakehouses is proposed, and technical requirements for lakehouses are derived. As part of an evaluation, these requirements are applied to several popular technologies, among which frameworks for data lakes turn out to be particularly helpful for constructing lakehouses. Our work provides an overview of the state of the art and a conceptual foundation for the lakehouse paradigm, which can support future research.
  • Open Access
    SMARTEN : a sample-based approach towards privacy-friendly data refinement
    (2022) Stach, Christoph; Behringer, Michael; Bräcker, Julia; Gritti, Clémentine; Mitschang, Bernhard
    Two factors are crucial for the effective operation of modern-day smart services: First, IoT-enabled technologies have to capture and combine huge amounts of data on data subjects. Then, all these data have to be processed exhaustively by means of big data analytics techniques. With regard to the latter, thorough data refinement in terms of data cleansing and data transformation is the decisive cornerstone. Studies show that data refinement reaches its full potential only when domain experts are involved in the process. However, this means that these experts need full insight into the data in order to identify and resolve any issues therein, e.g., by correcting or removing inaccurate, incorrect, or irrelevant data records. For sensitive data in particular (e.g., private or confidential data), this poses a problem, since these data are thereby disclosed to third parties such as domain experts. To this end, we introduce SMARTEN, a sample-based approach towards privacy-friendly data refinement to smarten up big data analytics and smart services. SMARTEN applies a revised data refinement process that fully involves domain experts in data pre-processing but does not expose any sensitive data to them or to any other third party. To achieve this, domain experts obtain a representative sample of the entire data set that meets all privacy policies and confidentiality guidelines. Based on this sample, domain experts define data cleansing and transformation steps. Subsequently, these steps are converted into executable data refinement rules and applied to the entire data set. Domain experts can request further samples and define further rules until the data quality required for the intended use case is reached. Evaluation results confirm that our approach is effective in terms of both data quality and data privacy.
  • Open Access
    Availability analysis of redundant and replicated cloud services with Bayesian networks
    (2023) Bibartiu, Otto; Dürr, Frank; Rothermel, Kurt; Ottenwälder, Beate; Grau, Andreas
    Due to the growing complexity of modern data centers, failures are no longer uncommon. Therefore, fault tolerance mechanisms play a vital role in fulfilling availability requirements. Multiple availability models have been proposed to assess compute systems, among which Bayesian network models have gained popularity in industry and research due to their powerful modeling formalism. In particular, this work focuses on assessing the availability of redundant and replicated cloud computing services with Bayesian networks. So far, research on availability has focused on modeling either infrastructure or communication failures in Bayesian networks, but has not considered both simultaneously. This work addresses practical modeling challenges of assessing the availability of large-scale redundant and replicated services with Bayesian networks, including cascading and common-cause failures from the surrounding infrastructure and communication network. In order to ease the modeling task, this paper introduces a high-level modeling formalism to build such a Bayesian network automatically. Performance evaluations demonstrate the feasibility of the presented Bayesian network approach to assess the availability of large-scale redundant and replicated services. The model is applicable not only to cloud computing but also to local and geo-distributed systems in general.
  • Open Access
    Enhancing data trustworthiness in explorative analysis : an interactive approach for data quality monitoring
    (2024) Behringer, Michael; Hirmer, Pascal; Villanueva, Alejandro; Rapp, Jannis; Mitschang, Bernhard
    The volume of data to be analyzed has increased tremendously in recent years. In order to extract knowledge from this data, domain experts gain new insights with the help of graphical analysis tools for explorative analyses. Here, the reliability and trustworthiness of an explorative analysis are determined by the quality of the underlying data. Existing approaches require manual testing to ensure data quality, which is often neglected. This research aims to introduce a novel interactive approach for seamlessly integrating data quality considerations into the process of explorative data analysis conducted by domain experts. We derive requirements, conduct an extensive literature review, and develop an approach that efficiently combines stakeholders’ strengths, allowing unobtrusive data quality integration in interactive analysis. Our approach enhances trustworthiness through unobtrusive monitoring of data quality within the context of explorative data analysis. Domain experts gain insights more reliably, bridging the gap between technical requirements and domain expertise. In conclusion, our research presents a promising solution for improving the reliability and trustworthiness of explorative data analysis, especially for domain experts who may lack technical knowledge. By seamlessly integrating data quality into the analytical process, we empower domain experts to extract valuable insights from the ever-increasing volume of data, thereby advancing the field of data-driven decision-making.
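The per-query model selection described in the Lässig et al. abstract (dynamic model ensembles apply the model that performed best on similar data) can be illustrated with a generic dynamic-classifier-selection sketch. It assumes that "similar data" means the k nearest labelled validation points under Euclidean distance and that models are plain Python callables. The names `k_nearest`, `select_model` and `predict` and the parameter `k` are hypothetical, and the paper's local fairness optimization is not included.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical stand-in for a trained classifier: maps a feature vector to a class label.
Model = Callable[[Vector], int]


def k_nearest(query: Vector, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k validation points closest to the query (Euclidean distance)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]


def select_model(query: Vector, models: Sequence[Model],
                 X_val: Sequence[Vector], y_val: Sequence[int], k: int = 3) -> Model:
    """Pick the model with the highest accuracy on the query's k nearest validation points."""
    neighbours = k_nearest(query, X_val, k)

    def local_accuracy(model: Model) -> float:
        hits = sum(model(X_val[i]) == y_val[i] for i in neighbours)
        return hits / len(neighbours)

    # max() returns the first maximal model, so list order breaks ties.
    return max(models, key=local_accuracy)


def predict(query: Vector, models: Sequence[Model],
            X_val: Sequence[Vector], y_val: Sequence[int], k: int = 3) -> int:
    """Classify the query with the locally best model."""
    return select_model(query, models, X_val, y_val, k)(query)


# Usage: two deliberately weak models, each correct in a different region.
always_zero: Model = lambda x: 0
always_one: Model = lambda x: 1
X_val = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y_val = [0, 0, 0, 1, 1, 1]
models = [always_zero, always_one]
print(predict([-1.2], models, X_val, y_val))  # 0
print(predict([1.2], models, X_val, y_val))   # 1
```

Neither model is accurate globally, yet local selection classifies both queries correctly, which is the effect dynamic ensembles rely on.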
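The IQN-ILS idea mentioned in the Davis et al. abstract (reusing input and output differences of the coupled solvers to approximate Newton iterations) can be sketched as a solver for the fixed-point problem x = H(x). This is a minimal sketch under stated assumptions, not preCICE's implementation. Residuals are taken as r = H(x) - x. The first iteration uses constant under-relaxation with a hypothetical factor `omega`. The least-squares problem is solved through the normal equations rather than a QR decomposition, and only the newest `max_cols` columns are kept. Information filtering (such as the paper's QR3 filter), field scaling and reuse across time steps are all left out. All function names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Vec = List[float]


def sub(a: Vec, b: Vec) -> Vec:
    return [ai - bi for ai, bi in zip(a, b)]


def dot(a: Vec, b: Vec) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(M: List[Vec], v: Vec) -> Vec:
    """Solve M c = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        for i in range(col + 1, n):
            f = A[i][col] / A[col][col]
            for j in range(col, n + 1):
                A[i][j] -= f * A[col][j]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (A[i][n] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return c


def least_squares(cols: List[Vec], rhs: Vec) -> Vec:
    """min_c ||V c - rhs|| via the normal equations (V given as a list of columns)."""
    N = [[dot(ci, cj) for cj in cols] for ci in cols]
    return solve(N, [dot(ci, rhs) for ci in cols])


def iqn_ils(solver: Callable[[Vec], Vec], x0: Vec, omega: float = 0.5,
            tol: float = 1e-10, max_iter: int = 50,
            max_cols: Optional[int] = None) -> Tuple[Vec, int]:
    """Find x with solver(x) == x using a basic IQN-ILS update."""
    max_cols = max_cols or len(x0)
    x = list(x0)
    xt_hist: List[Vec] = []  # solver outputs x~ of previous iterations
    r_hist: List[Vec] = []   # residuals r = x~ - x
    for it in range(max_iter):
        xt = solver(x)
        r = sub(xt, x)
        if dot(r, r) ** 0.5 < tol:
            return x, it
        xt_hist.append(xt)
        r_hist.append(r)
        if len(r_hist) == 1:
            # No history yet: fall back to constant under-relaxation.
            x = [xi + omega * ri for xi, ri in zip(x, r)]
            continue
        # Columns: differences between the newest and the older iterates.
        V = [sub(r, r_old) for r_old in r_hist[:-1]][-max_cols:]
        W = [sub(xt, xt_old) for xt_old in xt_hist[:-1]][-max_cols:]
        c = least_squares(V, [-ri for ri in r])  # fit V c ~= -r
        # Quasi-Newton update: x_{k+1} = x~_k + W c
        x = [xt[i] + sum(cj * wj[i] for cj, wj in zip(c, W)) for i in range(len(x))]
    raise RuntimeError("IQN-ILS did not converge")


# Usage: a hypothetical linear "coupled solver" H(x) = A x + b.
A = [[0.5, 0.2], [0.1, 0.6]]
b = [1.0, 2.0]
H = lambda x: [dot(row, x) + bi for row, bi in zip(A, b)]
x, iterations = iqn_ils(H, [0.0, 0.0])  # x approaches (I - A)^{-1} b
```

For a linear H, the update reproduces the exact solution once the difference columns span the whole space, so only a few iterations are needed here. The normal equations are used purely for brevity; they square the condition number of the least-squares problem, which is one reason QR-based solves and filtering matter in practice.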
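To illustrate why the Bibartiu et al. abstract stresses cascading and common-cause failures, here is a toy Bayesian network evaluated by exact enumeration. A single power supply feeds two replica hosts, and a shared network link connects clients to them. All probabilities and names are hypothetical, and the model is far simpler than the paper's high-level formalism. It only shows how a shared cause changes the availability estimate compared with treating the hosts as independent.

```python
from itertools import product

# Hypothetical per-component availabilities (probability of being up).
P_POWER_UP = 0.999
P_HOST_UP_GIVEN_POWER = 0.99
P_NET_UP = 0.995


def bernoulli(up: bool, p_up: float) -> float:
    """Probability of observing the given state of a component with availability p_up."""
    return p_up if up else 1.0 - p_up


def p_host(up: bool, power_up: bool) -> float:
    """P(host state | power state): a power outage takes every host down (cascading failure)."""
    if not power_up:
        return 0.0 if up else 1.0
    return bernoulli(up, P_HOST_UP_GIVEN_POWER)


def service_up(host1: bool, host2: bool, net: bool) -> bool:
    """The replicated service is up if at least one replica and the network are up."""
    return (host1 or host2) and net


def availability() -> float:
    """Exact inference by enumerating all joint states of the network."""
    total = 0.0
    for power, h1, h2, net in product([True, False], repeat=4):
        joint = (bernoulli(power, P_POWER_UP) * p_host(h1, power)
                 * p_host(h2, power) * bernoulli(net, P_NET_UP))
        if service_up(h1, h2, net):
            total += joint
    return total


def naive_availability() -> float:
    """Same components, but hosts treated as independent (common cause ignored)."""
    host = P_POWER_UP * P_HOST_UP_GIVEN_POWER
    return (1.0 - (1.0 - host) ** 2) * P_NET_UP


print(f"{availability():.6f}")        # 0.993906
print(f"{naive_availability():.6f}")  # 0.994880
```

Ignoring the shared power supply overestimates availability. Enumeration is exponential in the number of components, so it serves only as an illustration here.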
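The refinement loop in the SMARTEN abstract can be outlined as follows. Experts define rules on a privacy-compliant sample, and those rules are then applied to the entire data set. This is a hypothetical sketch. Dropping designated sensitive fields stands in for whatever privacy policies the real sample must satisfy. Rules are plain functions that return a cleaned record, or `None` to remove the record. The names `draw_sample` and `apply_rules` are illustrative and are not SMARTEN's API.

```python
import random
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Record = Dict[str, object]
# A refinement rule returns the (possibly transformed) record, or None to remove it.
Rule = Callable[[Record], Optional[Record]]


def draw_sample(records: Sequence[Record], k: int, sensitive: Iterable[str],
                seed: int = 0) -> List[Record]:
    """Random sample for domain experts with sensitive fields removed (stand-in privacy step)."""
    hidden = set(sensitive)
    picked = random.Random(seed).sample(list(records), k)
    return [{key: val for key, val in rec.items() if key not in hidden} for rec in picked]


def apply_rules(records: Iterable[Record], rules: Sequence[Rule]) -> List[Record]:
    """Apply expert-defined rules, in order, to every record of the full data set."""
    refined = []
    for rec in records:
        current: Optional[Record] = dict(rec)
        for rule in rules:
            current = rule(current)
            if current is None:  # record removed by a cleansing rule
                break
        if current is not None:
            refined.append(current)
    return refined


# Usage with hypothetical sensor data; -999.0 is a sentinel for a failed reading.
data = [{"id": i, "name": f"patient{i}", "temp_c": t}
        for i, t in enumerate([36.6, 37.1, -999.0, 38.2, 36.9])]
sample = draw_sample(data, k=3, sensitive=["name"])  # experts see no names

# Rules an expert might define after inspecting the sample.
drop_sentinel: Rule = lambda r: None if r["temp_c"] == -999.0 else r
round_temp: Rule = lambda r: {**r, "temp_c": round(r["temp_c"])}

refined = apply_rules(data, [drop_sentinel, round_temp])
print([r["temp_c"] for r in refined])  # [37, 37, 38, 37]
```

Requesting further samples and adding rules, as the abstract describes, would repeat these two calls until the required data quality is reached.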