Creating a robust and effective feature selection pipeline in the clinical setting : how to leverage information from multiple modalities to identify features that are health condition-sensitive and -specific?

Richter, Vanessa

Files

Thesis_Vanessa_Richter_CL.pdf (8.33 MB)

Date

2023

Authors

Richter, Vanessa

Abstract

Using computer vision and speech signal processing to assess neurological and psychiatric conditions has the potential to help detect diseases or monitor their progression earlier and more accurately. However, retrieving the required information from speech and facial modalities presents the challenge of finding features that generalize across studies with high sensitivity and specificity. A major difficulty in finding such features is avoiding overfitting to data biases in small samples and handling redundancy in high-dimensional feature sets. It is also critical to ensure that these methods are interpretable, since the results of health screening tools must be explainable to clinicians and patients. In this thesis, we present a transparent feature selection pipeline that specifically addresses demographic biases and feature redundancy. Our method provides interpretable insights by quantifying feature contributions to classification results using Shapley values. More specifically, we assessed age trends across the entire healthy control cohort and corrected the feature values using the estimated age coefficients. Sex-specific z-scoring was used to account for differences between males and females. To address feature redundancy, we used hierarchical clustering to group features into sensible domain-specific clusters, such as voice quality, jaw movement, or mouth symmetry. These clusters, together with feature effect sizes, were used in the classification step to select only the most salient features as input to the classifier. Finally, Shapley values were calculated to explain model decisions and evaluate the contribution of individual features. We used datasets on neurological (bulbar pre-symptomatic and bulbar symptomatic ALS) and mental (depression and schizophrenia) disorders as well as a healthy control dataset. The data was collected in a real-world scenario in which participants engaged with a virtual agent that guided them through a set of tasks. We applied the presented feature selection method, including Shapley-based analyses, to these datasets. Our analysis provides insights into feature contributions across binary and multiclass classification experiments and reveals shared characteristics across disorders.
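The preprocessing and selection steps named in the abstract can be illustrated with a small sketch on synthetic data. This is not the thesis implementation, which this record does not describe in detail. It assumes a linear age trend fitted on healthy controls, Cohen's d as the effect size, 1 - |Pearson r| as the clustering distance, average linkage, and a distance cutoff of 0.5. All function names and parameter values are hypothetical, and the Shapley analysis is not covered.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def correct_age(X, age, is_control):
    """Remove a linear age trend estimated on healthy controls only (assumed model)."""
    A = np.column_stack([np.ones(is_control.sum()), age[is_control]])
    coef, *_ = np.linalg.lstsq(A, X[is_control], rcond=None)  # rows: intercept, slope
    slope = coef[1]
    return X - np.outer(age - age[is_control].mean(), slope)


def zscore_by_sex(X, sex, is_control):
    """Z-score each feature against healthy controls of the same sex."""
    Z = np.empty_like(X, dtype=float)
    for s in np.unique(sex):
        ref = X[is_control & (sex == s)]
        Z[sex == s] = (X[sex == s] - ref.mean(axis=0)) / ref.std(axis=0, ddof=1)
    return Z


def cohens_d(Z, is_control):
    """Absolute Cohen's d between patients and controls, per feature."""
    a, b = Z[~is_control], Z[is_control]
    pooled = np.sqrt(((len(a) - 1) * a.var(axis=0, ddof=1)
                      + (len(b) - 1) * b.var(axis=0, ddof=1))
                     / (len(a) + len(b) - 2))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled


def select_features(Z, is_control, threshold=0.5):
    """Cluster correlated features; keep the largest-effect feature per cluster."""
    dist = 1.0 - np.abs(np.corrcoef(Z, rowvar=False))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    d = cohens_d(Z, is_control)
    return sorted(int(np.flatnonzero(labels == c)[np.argmax(d[labels == c])])
                  for c in np.unique(labels))


# Synthetic demo: 100 controls, 100 patients, 3 features.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 80, n)
sex = rng.integers(0, 2, n)
is_control = np.arange(n) < 100
patient = (~is_control).astype(float)
base = rng.normal(size=n)
X = np.column_stack([
    base + 0.05 * age + 1.0 * patient,                         # age trend, strong effect
    base + rng.normal(scale=0.1, size=n) + 0.3 * patient,      # redundant, weaker effect
    rng.normal(size=n),                                        # unrelated noise
])

X_corr = correct_age(X, age, is_control)
Z = zscore_by_sex(X_corr, sex, is_control)
selected = select_features(Z, is_control)
print("selected feature indices:", selected)
```

In this toy setup the first two features are built to be nearly redundant, so they should fall into the same cluster and only the one with the larger effect size is kept, alongside the unrelated third feature. The selected columns would then be passed to a classifier, whose decisions could be examined with Shapley values.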

URI

http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-141587
http://elib.uni-stuttgart.de/handle/11682/14158
http://dx.doi.org/10.18419/opus-14139

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik
