Browsing by Author "Oppold, Sarah"
Now showing 1 - 3 of 3
Item Open Access
Data engineering concepts, framework, and algorithms for personalized and transparent decision support (2025) Oppold, Sarah; Herschel, Melanie (Prof. Dr.)
Machine learning models are commonly used for decision support even though they are far from perfect, e.g., due to bias introduced by imperfect training data or poor feature selection. While efforts are made, and should continue to be made, to develop better models, we will likely continue to rely on imperfect models in many applications. In this thesis, we follow a rationale similar to the practices that emerged in medicine to develop decision support systems (DSS) that are demonstrably developed as responsibly as possible. To this end, we present novel approaches for developing personalized and transparent data-driven decision support systems. Our first contribution is a novel holistic system framework that offers transparent and personalized services tailored to user profiles to serve their best interest. Our framework personalizes the choice of model for individuals or groups of users based on metadata about datasets and machine learning models. Querying and processing these metadata ensures transparency by supporting various kinds of queries by different stakeholders. We discuss our framework in detail, show why existing solutions are inadequate, and highlight research questions that need to be addressed in the future. Based on a prototypical implementation, we show that even a baseline implementation of our framework supports the desired transparency and personalization. To address the problem of personalization, we present novel algorithms for dynamic, fair, and accurate decision support systems as our second contribution. We first propose a general procedure that combines context-sensitive algorithms for generating the dynamic model ensembles necessary for personalization with (static) fair and accurate model ensembles, satisfying our multi-objective goal.
We further introduce a family of algorithms, jointly called FALCES, representing different alternatives for our procedure. Using the FALCES algorithms, we evaluate our framework for dynamic, fair, and accurate model ensembles on synthetic and real data. The results show that, despite the presence of biases, our algorithms outperform state-of-the-art fairness algorithms while maintaining acceptable accuracy. To provide personalized and transparent decision support, we must capture and manage a large and diverse set of metadata on datasets, which are the foundation of decision support systems. Therefore, we present LiQuID as our third contribution. LiQuID is the first systematic and holistic metadata model for accountable datasets, i.e., datasets on which the queries of an accountability workload can be answered. We evaluate LiQuID by comparing it to existing metadata standards from the responsible data analysis community and from more mature disciplines. In a second evaluation, we compare LiQuID to a new workload for accountable datasets that we created based on a broad survey of the GDPR, an FTC report, and interviews with dataset experts. Finally, we examine the devised system's usefulness in achieving the desired goals. In our fourth contribution, we delve into one particular setting, previously identified in research, in which transparency potentially serves to improve trust. We critically review the term "trust" to define a theoretical model of trust in data engineering. Based on this model, we describe a framework for trust in data engineering that integrates trust into the data engineering pipeline and serves as a guideline for developing a trust strategy. We also describe a general procedure to evaluate the effects of a transparency measure on trust. Finally, we apply and evaluate our methods on a credit scoring use case.
Results show that transparency does not necessarily increase trust, highlighting the importance of a more systematic study of the problem using our proposed methods. The research presented in this thesis contributes significantly to our goal of decision support systems that are demonstrably developed as responsibly as possible. This includes a general system framework for personalized and transparent decision support, novel optimization algorithms, accountability metadata models, and a critical discussion of using transparency to foster trust.
Item Open Access
Metrics and algorithms for locally fair and accurate classifications using ensembles (2022) Lässig, Nico; Oppold, Sarah; Herschel, Melanie
To obtain accurate predictions from classifiers, model ensembles comprising multiple trained machine learning models are nowadays used. In particular, dynamic model ensembles pick the most accurate model for each query object by applying the model that performed best on similar data. Dynamic model ensembles may, however, suffer from bias, just like single machine learning models, which can eventually lead to unfair treatment of certain groups of a general population. To mitigate unfair classification, recent work has thus proposed fair model ensembles that, instead of focusing (solely) on accuracy, also optimize global fairness. While such global fairness minimizes bias overall, imbalances may persist in different regions of the data, e.g., caused by local bias maxima leading to local unfairness. Therefore, we extend our previous work with a framework that bridges the gap between dynamic model ensembles and fair model ensembles. More precisely, we investigate the problem of devising locally fair and accurate dynamic model ensembles, which ultimately optimize for equal opportunity among similar subjects. We propose a general framework to perform this task and present several algorithms implementing the framework components.
In this paper, we also present a runtime-efficient adaptation of the framework that keeps the quality of the results at a similar level. Furthermore, we present new fairness metrics as well as details on the necessary data preparation. Our evaluation of the framework implementations and metrics shows that our approach outperforms the state of the art for different types and degrees of bias present in the training data, in terms of both local and global fairness, while reaching comparable accuracy.
Item Open Access
Visuelle Auswahl von Ontologiebestandteilen [Visual selection of ontology components] (2014) Oppold, Sarah
Semantic data are becoming increasingly important, yet even for experienced users it remains difficult to send SPARQL queries to RDF endpoints. This is due not only to the complex syntax of SPARQL but also to the RDF data themselves, since resources are uniquely identified by URIs. These, however, are too complex for people to memorize and can only rarely be inferred by users themselves. Therefore, this thesis first develops a concept for a visualization that helps users find resources in RDF graphs. It employs subgraphs, which the user can incrementally expand according to their needs in order to explore the data, as well as a search field for looking up specific resources. The concept was implemented in a WPF prototype and evaluated in a small user study. The study revealed that the subgraphs developed in the concept quickly become too complex for inexperienced users to understand easily, although the principle was largely judged to be useful. In addition, possibilities were presented for adapting the concept at these points to meet users' needs. The search bar was also considered useful, albeit with room for improvement.
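The dynamic model ensembles described in the second item above pick, for each query object, the model that performed best on similar data. The selection step can be sketched roughly as follows; this is a minimal, purely illustrative Python sketch, and all names, toy models, and data are assumptions rather than parts of the FALCES implementation:

```python
# Hypothetical sketch of a dynamic model ensemble's selection step:
# for each query point, choose the model with the best accuracy on the
# query's nearest labelled neighbours. Toy data; not the papers' code.

from math import dist

def pick_model(models, X_val, y_val, query, k=3):
    """Return the model with the highest accuracy on the k validation
    points closest to `query` (Euclidean distance)."""
    # Indices of the k validation points nearest to the query.
    nearest = sorted(range(len(X_val)), key=lambda i: dist(X_val[i], query))[:k]

    def local_accuracy(model):
        hits = sum(1 for i in nearest if model(X_val[i]) == y_val[i])
        return hits / k

    return max(models, key=local_accuracy)

# Two toy "models": one thresholds the first feature, one the second.
m1 = lambda x: int(x[0] > 0.5)
m2 = lambda x: int(x[1] > 0.5)

X_val = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.2)]
y_val = [1, 1, 0, 0]  # labels track the second feature, so m2 fits locally

best = pick_model([m1, m2], X_val, y_val, query=(0.15, 0.85))
```

A locally fair variant, as described in the abstract, would additionally weigh a fairness score (e.g., equal opportunity) computed on those same neighbours into the per-model ranking, rather than selecting on local accuracy alone.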