Data engineering concepts, framework, and algorithms for personalized and transparent decision support

Date

2025

Abstract

Machine learning models are commonly used for decision support even though they are far from perfect, e.g., due to bias introduced by imperfect training data or poor feature selection. While effort is, and should continue to be, put into developing better models, we will likely continue to rely on imperfect models in many applications. In this thesis, we follow a rationale similar to the practices that emerged in medicine to develop decision support systems (DSS) that are demonstrably developed as responsibly as possible. To this end, we present novel approaches for building personalized and transparent data-driven decision support systems. Our first contribution is a novel holistic system framework that offers transparent and personalized services tailored to user profiles to serve their best interest. Our framework personalizes the choice of model for individuals or groups of users based on metadata about datasets and machine learning models. Querying and processing these metadata ensures transparency by supporting various kinds of queries by different stakeholders. We discuss our framework in detail, show why existing solutions are inadequate, and highlight research questions that need to be addressed in the future. Based on a prototypical implementation, we showcase that even a baseline implementation of our framework supports the desired transparency and personalization. To address the problem of personalization, our second contribution presents novel algorithms for dynamic, fair, and accurate decision support systems. We first propose a general procedure that combines context-sensitive algorithms for generating the dynamic model ensembles necessary for personalization with (static) fair and accurate model ensembles, satisfying our multi-objective goal. We further introduce a family of algorithms, jointly called FALCES, representing different alternatives for our procedure.
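The two-stage idea behind this procedure — a static pre-selection of models that are both fair and accurate, followed by a dynamic, context-sensitive pick per user — can be illustrated with a minimal sketch. This is not the FALCES algorithm itself; the model names, the demographic-parity-style `fairness_gap` metric, and the per-context accuracy estimates below are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str
    accuracy: float      # overall validation accuracy, higher is better
    fairness_gap: float  # e.g., a demographic-parity gap, lower is better

def select_ensemble(candidates, max_fairness_gap, k):
    """Static step: keep only models whose fairness gap is acceptable,
    then take the k most accurate ones as the ensemble pool."""
    fair = [m for m in candidates if m.fairness_gap <= max_fairness_gap]
    return sorted(fair, key=lambda m: m.accuracy, reverse=True)[:k]

def pick_for_context(ensemble, context_scores):
    """Dynamic step: given per-context performance estimates (model name
    -> accuracy estimate for the current user's subgroup), pick the best
    model from the pre-selected pool."""
    return max(ensemble, key=lambda m: context_scores.get(m.name, 0.0))

models = [
    CandidateModel("m1", accuracy=0.91, fairness_gap=0.20),
    CandidateModel("m2", accuracy=0.88, fairness_gap=0.05),
    CandidateModel("m3", accuracy=0.86, fairness_gap=0.04),
]
pool = select_ensemble(models, max_fairness_gap=0.10, k=2)  # m1 is excluded
best = pick_for_context(pool, {"m2": 0.84, "m3": 0.90})
print([m.name for m in pool], best.name)  # → ['m2', 'm3'] m3
```

Note how the globally most accurate model (`m1`) is filtered out for violating the fairness threshold, while the final choice between the remaining models depends on the user's context rather than on overall accuracy alone.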
Using the FALCES algorithms, we evaluate our framework for dynamic, fair, and accurate model ensembles on synthetic and real data. The results show that despite the presence of biases, our algorithms outperform state-of-the-art fairness algorithms while maintaining acceptable accuracy. To provide personalized and transparent decision support, we have to capture and manage a large and diverse set of metadata on datasets, which are the foundation of decision support systems. Therefore, in our third contribution, we present LiQuID, the first systematic and holistic metadata model for accountable datasets, i.e., datasets on which the queries of an accountability workload can be answered. We evaluate LiQuID by comparing it to existing metadata standards from the responsible data analysis community and from more mature disciplines. In a second evaluation, we compare LiQuID to a new workload for accountable datasets that we created based on an extensive survey of the GDPR, an FTC report, and interviews with dataset experts. Finally, we examine the devised system's usefulness in achieving the desired goals. In our fourth contribution, we delve into one particular setting in which prior research has claimed that transparency can improve trust. We critically review the term "trust" to define a theoretical model for trust in data engineering. Based on this model, we describe a framework for trust in data engineering that integrates trust into the data engineering pipeline and serves as a guideline for developing a trust strategy. We also describe a general procedure to evaluate the effects of a transparency measure on trust. Finally, we apply and evaluate our methods on a credit scoring use case. The results show that transparency does not necessarily increase trust, highlighting the importance of a more systematic study of the problem using our proposed methods.
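The notion of an "accountable dataset" — one whose metadata can answer the queries of an accountability workload — can be sketched as follows. The record fields and the query function below are hypothetical illustrations, not the actual LiQuID schema.

```python
# Illustrative dataset-metadata record for an accountability workload.
# All field names and values are hypothetical examples.
dataset_meta = {
    "name": "credit_applications_v2",
    "collected_by": "Example Bank",
    "collection_period": ("2023-01", "2024-06"),
    "legal_basis": "consent",  # e.g., a GDPR processing basis
    "known_biases": ["underrepresents applicants under 25"],
    "preprocessing": ["dropped rows with missing income"],
}

def answer_accountability_query(meta, field):
    """Return the recorded answer to one accountability question, or
    flag it as undocumented -- which is itself useful information for
    stakeholders auditing the dataset."""
    return meta.get(field, "undocumented")

print(answer_accountability_query(dataset_meta, "legal_basis"))      # → consent
print(answer_accountability_query(dataset_meta, "retention_policy")) # → undocumented
```

The point of such a model is that accountability queries either have a documented answer or expose a documentation gap, rather than failing silently.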
The research presented in this thesis has significantly contributed to our goal of decision support systems that are demonstrably developed as responsibly as possible. This includes a general system framework for personalized and transparent decision support, novel optimization algorithms, accountability metadata models, and a critical discussion of using transparency to foster trust.
