Reproducible modeling and uncertainty quantification of sparse and variable data

Date

2025

Abstract

Systems biology and systems medicine aim to understand complex relationships in living systems. These systems exhibit considerable variability due to genetic predispositions, epigenetic modifications, and stochastic and environmental effects. Furthermore, observations are sparse in many biomedical areas because of complex measurement procedures, high costs, and ethical constraints. Addressing system complexity together with data sparsity and variability requires reliable methods for modeling and uncertainty quantification. Modeling makes it possible to understand these systems and to test hypotheses about them, while uncertainty quantification sheds light on the reliability of model predictions and on the prediction variability at a given credibility level.

This thesis contributes to a systematic biomedical understanding and sound statistical analysis in three major aspects:

(i) The Bayesian workflow BayModTS was developed to process and compare sparse and variable time series data, and it was applied to three hepatic datasets. BayModTS is a Findable, Accessible, Interoperable, and Reusable (FAIR) workflow that uses the retarded transient functions of Kreutz (2020) as a universal simulation model. It can be used to test statistically whether different dynamics stem from the same data-generating process, and to convert discrete, variable time series data into continuous functions equipped with uncertainty estimates.

(ii) Based on sparse data, deterministic and stochastic models were used to characterize the role of the tumor-suppressor protein DLC1 in the Epithelial-Mesenchymal Transition (EMT). We showed that loss of DLC1 increases EMT plasticity, a crucial feature in cancer metastasis. Experimental data confirm the modeling results, showing a partial EMT phenotype in DLC1-depleted cells. Further, narrow posterior marginals validate the model and indicate that the estimated parameters are uniquely identifiable.

(iii) Bayesian estimation shows that papers with reproducible systems biology models receive more citations. This confirms that reproducibility is valuable: ten years after the introduction of the Systems Biology Markup Language (SBML), papers with reproducible models receive more citations than papers with non-reproducible models. A heatmap visualization scheme for multi-group comparisons with Bayesian estimation was developed to support the visual interpretation of the results and the identification of patterns. The heatmap displays the credibility of group differences and thereby captures the uncertainty in the data. Using this heatmap visualization, we verified an increased citation count for sub-periods up to 2020.
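The multi-group comparison in (iii) rests on the posterior credibility of pairwise group differences. A minimal sketch of how such a credibility matrix could be computed from posterior draws is shown below, assuming posterior samples (e.g. from MCMC) are already available for each group. The function name `credibility_matrix`, the Gamma-Poisson model, and the counts are hypothetical illustrations, not the thesis's actual heatmap scheme or data.

```python
import numpy as np


def credibility_matrix(draws: np.ndarray) -> np.ndarray:
    """Return P(theta_i > theta_j | data) for every pair of groups (i, j).

    draws has shape (n_draws, n_groups): column k holds posterior draws
    of the quantity compared across groups (e.g. a mean citation rate).
    """
    # Compare every group with every other group within each posterior draw;
    # the boolean result has shape (n_draws, n_groups, n_groups).
    greater = draws[:, :, None] > draws[:, None, :]
    # Averaging the indicator over draws gives a Monte Carlo estimate
    # of the posterior probability that group i exceeds group j.
    prob = greater.mean(axis=0)
    # A group is not compared with itself; NaN leaves the diagonal blank
    # when the matrix is rendered as a heatmap.
    np.fill_diagonal(prob, np.nan)
    return prob


# Hypothetical example: invented counts for three groups, each modelled as
# Poisson with a Gamma(1, 1) prior on the rate, so the posterior is
# Gamma(1 + sum(y), 1 + n) in (shape, rate) form.
rng = np.random.default_rng(seed=0)
hypothetical_counts = [
    np.array([3, 5, 4, 6]),
    np.array([8, 7, 9, 10]),
    np.array([4, 4, 5, 3]),
]
n_draws = 4000
draws = np.column_stack([
    # NumPy's gamma sampler takes shape and scale, where scale = 1 / rate.
    rng.gamma(shape=1 + y.sum(), scale=1 / (1 + y.size), size=n_draws)
    for y in hypothetical_counts
])

prob = credibility_matrix(draws)
print(np.round(prob, 2))
```

The resulting matrix could then be rendered as a heatmap, for example with matplotlib's `imshow`. Entries near 0 or 1 mark credible differences, while entries near 0.5 mark comparisons the data cannot resolve, so the uncertainty stays visible in the figure.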

All methods and models were developed reproducibly, using state-of-the-art formats, and all code is provided in public repositories with detailed reproduction instructions. In short, statistical methods are paired with FAIR modeling approaches to gain insight into, and provide trustworthy predictions for, processes for which only limited data are available.
