05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

  • Item (Open Access)
    The lakehouse : state of the art on concepts and technologies
    (2024) Schneider, Jan; Gröger, Christoph; Lutsch, Arnold; Schwarz, Holger; Mitschang, Bernhard
    In the context of data analytics, so-called lakehouses refer to novel variants of data platforms that attempt to combine characteristics of data warehouses and data lakes. In this way, lakehouses promise to simplify enterprise analytics architectures, which often suffer from high operational costs, slow analytical processes and further shortcomings resulting from data replication. However, different views and notions on the lakehouse paradigm exist, which are commonly driven by individual technologies and varying analytical use cases. Therefore, it remains unclear what challenges lakehouses address, how they can be characterized and which technologies can be leveraged to implement them. This paper addresses these issues by providing an extensive overview of concepts and technologies that are related to the lakehouse paradigm and by outlining lakehouses as a distinct architectural approach for data platforms. Concepts and technologies from literature with regard to lakehouses are discussed, based on which a conceptual foundation for lakehouses is established. In addition, several popular technologies are evaluated regarding their suitability for the building of lakehouses. All findings are supported and demonstrated with the help of a representative analytics scenario. Typical challenges of conventional data platforms are identified, a new, sharper definition for lakehouses is proposed and technical requirements for lakehouses are derived. As part of an evaluation, these requirements are applied to several popular technologies, of which frameworks for data lakes turn out to be particularly helpful for the construction of lakehouses. Our work provides an overview of the state of the art and a conceptual foundation for the lakehouse paradigm, which can support future research.
  • Item (Open Access)
    Enhancing data trustworthiness in explorative analysis : an interactive approach for data quality monitoring
    (2024) Behringer, Michael; Hirmer, Pascal; Villanueva, Alejandro; Rapp, Jannis; Mitschang, Bernhard
    The volume of data to be analyzed has increased tremendously in recent years. In order to extract knowledge from this data, domain experts gain new insights with the help of graphical analysis tools for explorative analyses. Here, the reliability and trustworthiness of an exploratory analysis is determined by the quality of the underlying data. Existing approaches require manual testing to ensure data quality, which is often neglected. This research aims to introduce a novel interactive approach for seamlessly integrating data quality considerations into the process of explorative data analysis conducted by domain experts. We derive requirements, conduct an extensive literature review, and develop an approach that efficiently combines stakeholders’ strengths, allowing unobtrusive data quality integration in interactive analysis. Our approach enhances trustworthiness through unobtrusive monitoring of data quality within the context of explorative data analysis. Domain experts gain insights more reliably, bridging the gap between technical requirements and domain expertise. In conclusion, our research presents a promising solution for improving the reliability and trustworthiness of explorative data analysis, especially for domain experts who may lack technical knowledge. By seamlessly integrating data quality into the analytical process, we empower domain experts to extract valuable insights from the ever-increasing volume of data, thereby advancing the field of data-driven decision-making.
  • Item (Open Access)
    Research data management in simulation science : infrastructure, tools, and applications
    (2024) Flemisch, Bernd; Hermann, Sibylle; Herschel, Melanie; Pflüger, Dirk; Pleiss, Jürgen; Range, Jan; Roy, Sarbani; Takamoto, Makoto; Uekermann, Benjamin
    Research Data Management (RDM) has gained significant traction in recent years, as it is essential for making research data findable, accessible, interoperable, and reusable (FAIR), thereby fostering collaboration or accelerating scientific findings. We present solutions for RDM developed within the DFG-funded Cluster of Excellence EXC2075 Data-Integrated Simulation Science (SimTech). After an introduction to the scientific context and challenges faced by simulation scientists, we outline the general data management infrastructure and present tools that address these challenges. Exemplary domain applications demonstrate the use and benefits of the proposed data management software solutions. These are complemented by additional measures for enablement and dissemination to foster the adoption of these techniques.