/ Published online: 13 July 2023 Computational Geosciences (2023) 27:663–686 https://doi.org/10.1007/s10596-023-10228-z ORIG INAL PAPER A surrogate-assisted uncertainty-aware Bayesian validation framework and its application to coupling free flow and porous-medium flow Farid Mohammadi1 · Elissa Eggenweiler2 · Bernd Flemisch1 · Sergey Oladyshkin1 · Iryna Rybak2 · Martin Schneider1 · Kilian Weishaupt1 Received: 10 June 2022 / Accepted: 6 June 2023 © The Author(s) 2023 Abstract Existing model validation studies in geoscience often disregard or partly account for uncertainties in observations, model choices, and input parameters. In this work, we develop a statistical framework that incorporates a probabilistic modeling technique using a fully Bayesian approach to perform a quantitative uncertainty-aware validation. A Bayesian perspective on a validation task yields an optimal bias-variance trade-off against the reference data. It provides an integrative metric for model validation that incorporates parameter and conceptual uncertainty. Additionally, a surrogate modeling technique, namely Bayesian Sparse Polynomial Chaos Expansion, is employed to accelerate the computationally demanding Bayesian calibration and validation. We apply this validation framework to perform a comparative evaluation of models for coupling a free flow with a porous-medium flow. The correct choice of interface conditions and proper model parameters for such coupled flow systems is crucial for physically consistent modeling and accurate numerical simulations of applications. We develop a benchmark scenario that uses the Stokes equations to describe the free flow and considers different models for the porous-medium compartment and the coupling at the fluid–porous interface. These models include a porous-medium model using Darcy’s law at the representative elementary volume scale with classical or generalized interface conditions and a pore-network model with its related coupling approach. We study the coupled flow problems’ behaviors considering a benchmark case, where a pore-scale resolved model provides the reference solution. With the suggested framework, we perform sensitivity analysis, quantify the parametric uncertainties, demonstrate eachmodel’s predictive capabilities, andmake a probabilistic model comparison. Keywords Model validation · Uncertainty quantification · Free flow · Porous-medium flow · Interface conditions Mathematics Subject Classification (2010) 65C20 · 62F15 · 65C05 · 35Q35 · 76D07 · 76M12 · 76S05 1 Introduction Geoscientists use modeling as an essential instrument to study the causes and effects of their problems, with an increasing level of detail and complexity. Over the last cen- tury, computational modeling in geoscience, especially in porous media research, has witnessed tremendous improve- ment. After decades of development, the state-of-the-art simulators can now solve coupled partial differential equa- tions governing the complex subsurface multiphase flow B Farid Mohammadi farid.mohammadi@iws.uni-stuttgart.de Extended author information available on the last page of the article behavior within a practically large spatial and temporal domain. Given the importance of computational modeling, assessing the models’ reliability is paramount to engineer- ing designers, managers, public officials, and those affected by the decisions based on the predictions. For this reliabil- ity assessment, model validation is commonly used, among other complementary measures, to assess the model per- formance. The relevant hypotheses regarding assessing the reliability of a model could be building up or losing trust in its generated simulations. In terms of validation, the hypoth- esis is whether the model can satisfactorily represent the real system of interest. The word “validation” is commonly used to refer to sim- ple comparisons between model outputs and experimental 123 http://crossmark.crossref.org/dialog/?doi=10.1007/s10596-023-10228-z&domain=pdf http://orcid.org/0000-0003-2387-4694 http://orcid.org/0000-0001-7966-9505 http://orcid.org/0000-0003-4935-3096 Computational Geosciences (2023) 27:663–686 data or any reference data. Usually, these comparisons con- stitute plotting the model results against data on the same axes to assess agreement or lack thereof visually. However, they are clearly insufficient as a basis for making decisions regardingmodel validity. These naive comparisons often dis- regard or only partly account for existing uncertainties in the experimental observations or the model input parameters. It is impossible to appropriately determine whether the model and data agree without accounting for these uncertainties. Moreover, for assessing the physical phenomena in ques- tion, several representations, i.e., models, might exist with different approaches and assumptions to analyze the occur- ring processes. Therefore, a significant research challenge is accurately assessing competing modeling concepts and val- idating the corresponding computational models against an experiment or a reference solution. In the case of the multi- model comparison, the validation hypothesis is which model within the pool of available models can represent the real- ity, i.e., observed values in the experiments or the reference data resulting from a detailed simulation. This may be ana- lyzed as a validation benchmark. Following the guidelines for validation benchmarks in [1, 2], the most important sci- entific question to be addressed here is how to compare the computational models quantitatively. Each validation benchmark must declare one or more methods that should be employed for quantitative comparison. The development of such validation metrics is an active field of research. Available approaches for quantitative model validation are based on statistical confidence intervals [3], computing dis- tance between the model prediction and experimental data by computing the area metric [4, 5], normalizing residuals [6], classical statistics-based hypothesis testing [7], Bayesian hypothesis testing [8–12], and reliability analysis-based tech- niques [13–15]. [16], and [17] investigated several of these validation approaches in detail and discussed their practical implications in engineering.While being potentially applica- ble to many research areas, rigorously developed validation metrics have been applied to some of them only, such as com- putational fluid dynamics [3], computational solidmechanics [10] or heat transfer [18]. In this work, we developed an uncertainty-aware valida- tion framework that includes a rigorous uncertainty quan- tification of models and provides validation metrics for comparing the competing models. This framework employs the Bayesian probability theory, which offers a robust frame- work for characterizing scientific inference since its simple concept lies in the fact that rational belief comes in degrees that can be measured in terms of probabilities. Moreover, a Bayesian perspective on a validation task yields an optimal bias-variance trade-off against the experimental data or ref- erence solution. It provides an integrative metric for model validation that incorporates parameter and conceptual uncer- tainty [19–21]. This problem can lead to a validation benchmark. Several model ranking methods exists to be used in the multi-model comparison problemswhich allow for statisticalmodel selec- tion and averaging. The most prominent example might be Bayesian model selection (BMS) [22, 23], in which model probabilities are used to express uncertainty between models in terms of how likely it is that a certain candidate model generated the observed data. It employ the Bayesian model evidence (BME) as a validation score, indicating the quality of the analyzed physical models against the available exper- imental data or finely resolved reference solution. BMS is often the first choice to deal with conceptual uncertainty over many disciplines, such as cosmology (e.g., [24]), economy (e.g., [25]), ecology (e.g., [26]), hydrology (e.g., [20, 27]), groundwater modeling (e.g., [28]), hydrogeophysics (e.g., [29]), soil-plant modeling (e.g., [21, 30]) and reactive trans- port modelling (e.g., [31]). The probabilistic nature of the Bayesian technique requires propagating the parametric uncertainty through all competingmodels – i.e., a significant number ofmodel evalu- ations – to reach statistical convergence. In practice, however, the computational complexity of the underlying computa- tional model and the total available computational budget severely restrict the number of evaluations one can perform. This challenge renders the brute-force computation of the BME value, required for the Bayesian validation framework (BVF), infeasible. We replace each original computational model with its easy-to-evaluate surrogate in the Bayesian analysis. A surrogate-assisted Bayesian analysis has been applied tomany applications, including hydrology, e.g., [32], sediment transport, e.g., [33, 34], processes in subsurface reservoirs, e.g., [35–37], subsurface flow models, e.g., [38]. A surrogate model’s primary goal is to replicate the behavior of the underlying physical models from a limited set of runs without sacrificing accuracy. For constructing a surrogate, the computational model needs to be evaluated using those sets of modeling parame- ters, out of various possibilities, that cover the parametric space as well as possible. The polynomial chaos expan- sion (PCE) [39] or its extension toward arbitrary polynomial chaos expansion (aPCE) [40] is a well-known and rig- orous approach to providing functional representations of stochastic quantities. However, not all expansion terms are relevant for representing the underlying physical processes, and employing the concept of sparsity can lead to zero values for many coefficients in the expansion [41]. Many math- ematical approaches exist when dealing with a regression problem, such as PCE representation, that lead to a sparse solution. These approaches have led to the emergence of numerous sparse solvers in the compressed sensing [42], as well as in the sparse PCE. In [43] a comprehensive survey of the proposed solvers in the context of PCE is provided. Here, we employ Bayesian sparse aPCE denoted as BsaPCE, 123 664 Computational Geosciences (2023) 27:663–686 which is an extension of aPCE within a Bayesian frame- work using a Bayesian sparse learning method [44]. BsaPCE not only identifies the expansion terms, which capture the physical model’s relevant features, but it can also provide a probabilistic prediction, i.e., a prediction with the associated uncertainty. This prediction uncertainty can be used as the expected surrogate uncertainty when replacing the original computational model with a possibly less accurate surrogate. The surrogate-assisted Bayesian validation framework presented here has the following novel features. Firstly, it benefits from a two-stage approach. In the first stage, a cali- bration is performed to update the distribution of parameters via a Bayesian inference. This is followed by a validation stage in which the model performance is investigated using the information from the calibration in the form of parameter posterior distribution and compared with a new data set. Fol- lowing this approach, we seek to reduce the sensitivity of the model comparison results due to the choice of prior distribu- tions. Secondly, a new surrogatemodel is trained based on the the inferred posterior distribution of parameters to be used in the validation. We strongly believe that this will improve the surrogate’s predictive abilities as the surrogate model is curated more towards the region in the parameter space that is believed to produce results close to the reference solution. Thirdly, we would like to emphasize that the proposed work- flow leverages a fully Bayesian approach. This means that even the surrogate modeling via Polynomial Chaos Expan- sion follows a Bayesian approach, as discussed earlier. In this study, we apply the proposed surrogate-assisted Bayesian validation framework to coupling free flow and porous-mediumflow.Coupled free-flow and porous-medium systems play a significant role in many industrial, envi- ronmental, and biological settings, e.g., fuel cells, water flows in karst aquifers, blood flows in vessels, and living tissues. Flow interaction between the free-flow region and the porous-medium domain is highly involved and strongly interface-driven. Therefore, a physically consistent descrip- tion offlowprocesses in thewhole coupled system, especially near the interface, is crucial for accurate numerical simula- tions of applications. A lot of effort has been made during the last decades in mathematical modeling and analysis of such coupled flow systems, and several coupling strategies have been proposed, e.g., [45–48, 55, 58, 63]. The possi- bilities to conceptualize these coupling conditions could be regarded as conceptual uncertainty. This conceptual uncer- tainty is mainly related to the description of processes in the porous medium and near the interface, for which, different mathematical models and coupling strategies are considered. We use the literature’s most widely studied coupled flow problem, namely the Stokes–Darcy problem. In this set- ting, the free-flow conceptualization is based on the Stokes equations for all discussed models. However, the way these models simulate the fluid flow in the porous medium and the set of coupling conditions imposed on the fluid–porous interface varies. Apart from the conceptual uncertainty, each computational model contains parametric uncertainty, such as material parameters or interface location that must also be rigorously addressed. To the best of our knowledge, no rig- orous uncertainty-aware validation for this application has been performed so far. The rest of this paper is organized as follows. In Section 2, we introduce the Bayesian validation framework and discuss how it can be accelerated by means of surrogate modeling. Section 3 presents themathematical and computational mod- els that are going to be evaluated by means of the proposed benchmark scenario. The benchmark scenario is described in Section 4, while Section 5 is devoted to a discussion of the application of the Bayesian model selection. In Section 6, we present a summary and conclusion to our research. 2 Bayesian validation framework The current section introduces the BVF for comparing and validating computational models. The performance of these models is compared to a reference, either observed data from experiments or highly-resolved reference models. The ben- efits of this comparison are twofold. First, this evaluates the strengths and weaknesses of competing modeling concepts. Second, the predictive ability of each computational model is assessed. Furthermore, we update the prior belief in the predictive capability of the models based on Bayesian notions. The resulting so-called posterior belief is expressed in terms of probabilities. The Bayesian approach in the validation task allows us to include possible sources of errors that can lead to inevitable uncertainties. The BVF requires propagation of the parametric uncertainty through the given computa- tionally demanding models. This propagation renders the analysis intractable, as it demands many model evaluations. To circumvent this problem,we employ a surrogatemodeling technique to offset BVF’s computational time. 2.1 Bayesianmodel comparison The topic of quantitative model comparison has received and continues to receive considerable attention in the field of statistics. There exist several multimodel comparison frame- works related to these model rating methods that allow for statistical model selection and averaging, e.g., [69]. The most common approach is Bayesian model selection [22, 23]. BMS is grounded on Bayes’ theorem, which com- bines a prior belief about the efficiency of each model with its performance in replicating a common observation data set. Its procedure for model comparison entails principled and general solutions to the trade-off between parsimony 123 665 Computational Geosciences (2023) 27:663–686 and goodness-of-fit. Moreover, BMS is a formal statistical approach that allows comparing alternative conceptual mod- els, testing their adequacy, combining their predictions into a more robust output estimate, and quantifying conceptual uncertainty’s contribution to the overall prediction uncer- tainty. BMS can be regarded as a Bayesian hypothesis testing framework, combining the idea of classical hypothesis test- ing with the ability to examine multiple alternative models against each other in a probabilistic manner. It returns the so-called model weights [19] representing posterior proba- bilities for eachmodel to be themost appropriate from the set of proposed competing models. Thus, the computed model weights provide a quantitative ranking for the competing con- ceptual models. Let us consider Nm competing computational models Mk , each with an uncertain parameter vector θk of length Nk , yielding a quantity Q of interest in a physical space of x , y and z and for a time stamp of t . The model weights are given by Bayes’ theorem, which can be cast for a set of Mk competing models as P(Mk |Y) = p(Y|Mk)P(Mk) ∑Nm i=1 p(Y|Mi )P(Mi ) , (1) where P(Mk) denotes the prior probability of the model, also known as the subjective credibility that model Mk could be the most plausible model in the set of models before any comparison with observed data have been made. The term p(Y|Mk) is the Bayesian model evidence (BME), also known as marginal likelihood, of the model Mk . Bayes’ the- orem closely follows the principle of parsimony or Occam’s razor [86], in that the posteriormodelweights P(Mk |Y) offer a compromise between model complexity and goodness of fit, known as the bias-variance trade-off [19]. The model weights, P(Mk |Y), can be interpreted as the Bayesian prob- ability of the individual models to be the best representation of the system from the pool of competing models. Hoeting et al. [23] proposed that a ‘reasonable, neutral choice’ could be equally likely priors, i.e., P(Mk) = 1/Nm , in case of paucity of prior knowledge regarding the merit of the different models under consideration. The denomi- nator in Eq. 1 is the normalizing constant of the posterior distribution of the models and can simply be obtained by determination of the individual weights. Since all model weights are normalized by the same constant, this normal- izing factor could even be neglected. Thus, the weights P(Mk |Y) of the individual model Mk against other models can be represented by the proportionality P(Mk |Y) ∝ p(Y|Mk)P(Mk). (2) The BME term p(Y|Mk) quantifies the likelihood of the observed data based on the prior distribution of the parame- ters. It can be computed by integrating the likelihood term in Bayesian theorem [87] over the parameter space �k of the model Mk : p(Y|Mk) = ∫ �k p(Y|Mk, θk)P(θk |Mk)dθk, (3) with θk being the parameter vector from the parameter space �k of model Mk . The term P(θk |Mk) is the correspond- ing prior distribution of parameters θk for the model Mk . Assuming that the measurement errors follow a Gaussian distribution, the likelihood or probability of the parameter set θk of model Mk to have generated the observation data with the independent realization of Y = (Y1, ...,YN )� is represented by p(Y|Mk, θk) := N∏ i=1 p(Yi |Mk, θk) = 1 √ (2π)N det� × exp ( − (Mk(θk)−Y)T�−1(Mk(θk)−Y) 2 ) , (4) where � denotes the covariance matrix, which includes all error sources to be explained in Section 4.2. To obtain the BME values for each competing model, we estimate the inte- gral in Eq. 3 using a brute-forceMonte Carlo integration [94] avoiding unnecessary assumptions. For more details on the properties of BME and a comparison of various techniques to estimate this term, the reader is referred to [20] and [85]. The ratio of BME values for two alternative models is defined as the Bayes Factor BF(Mk, Ml), which is a key component in Bayesian hypothesis testing framework intro- duced in [88]: BF(Mk, Ml) = P(Mk |Y) P(Ml |Y) P(Ml) P(Mk) = p(Y|Mk) p(Y|Ml) . (5) It quantifies the evidence (literally, as in Bayesian model evidence) of the hypothesis Mk against the null-hypothesis Ml . Stated differently, the Bayes Factor BF(Mk, Ml), can be interpreted as the ratio between the posterior and prior odds of model Mk being the more plausible one in comparison to the alternative model Ml [87]. Jeffreys [88] provides a rule of thumb for the interpretation of Bayes Factor. The grades of evidence are summarized in Table 1. Following this sug- gestion, a Bayes Factor which lies between 1 and 3 indicates an evidence in favor of Mk that is ‘not worth more than a bare mention’, a factor of up to 10 represents ‘substantial’ evidence, and a factor between 10 and 100 can be regarded 123 666 Computational Geosciences (2023) 27:663–686 Table 1 Interpretation of Bayes Factor in favor of model Mk according to [88] Bayes Factor (BF) Interpretation 1 – 3 anecdotal evidence 3 – 10 substantial evidence 10 – 100 strong evidence > 100 decisive evidence a ‘strong’ evidence. Finally, a Bayes Factor greater than 100 admits ‘decisive’ evidence, i.e., it can be used as a threshold to reject models based on poor performance in comparison to the best performingmodel in the set of considered models. 2.2 Accelerating the analysis via surrogate modeling As stated earlier, BVF requires uncertainty propagation through each competing model, demanding a significant number ofmodel evaluations to yield statistical convergence. In practice, however, the computational complexity of the underlying computational models and the total available computational budget severely restrict the number of eval- uations one can carry out. In such situations, the Bayesian analysis estimations lack sufficient trust, as the limited num- ber of model evaluations can yield additional uncertainty. To tackle this challenge, we replace each competing model’s response in the Bayesian validation framework with an easy- to-evaluate surrogate representation using the theory of PCE introduced in [39]. The PCE representation of the model Mk provides the dependence of the computational model Mk on the uncertain model’s parameters θk using projection onto an orthonor- mal polynomial basis [95]. It could be also seen as a linear regression that includes linear combinations of a fixed set of nonlinear functions with respect to the input variables, known as polynomial basis function Mk(x, y, z, t, θk)≈ ∑ α∈A cα(x, y, z, t)�α(θk). (6) Here, x, y, z, t are the spatial and temporal components of the quantity of interest, θk is the vector of the Nk uncer- tain parameters of model Mk , cα(x, y, z, t) ∈ R are the corresponding expansion coefficients that are functions of space and time, and �α(θk) denotes multivariate polynomi- als orthogonal with respect to a multi-index α. The latter represents the combinatoric information how to enumerate all possible products of Nk individual univariate basis func- tions with respect to the total degree of expansions less or equal to polynomial degree d [89]: ANk ,d = {α ∈ N Nk : |α| ≤ d} , card ANk ,d ≡ P = ( Nk + d d ) . (7) Themultivariate polynomials�α(θk) are comprised of the tensor product of univariate polynomials �α(θk) := Nk∏ i=1 ψ(i) αi (θk,i ) , (8) where the univariate orthonormal polynomials ψ (i) αi (θk,i ) must satisfy 〈ψ(i) j (θk,i ), ψ (i) l (θk,i )〉 := ∫ �k,i ψ (i) j (θk,i )ψ (i) l (θk,i ) f�k,i (θk,i )dθk,i = δ jl . (9) Here, i represents the input variable with respect to which the polynomials are orthogonal as well as the corresponding polynomial family, j and l are the corresponding polynomial degree, f�k,i (θk,i ) is the i th-input marginal distribution and δ jl is the Kronecker delta. We use an arbitrary version of PCE, namely aPCE, introduced in [40], that can operate with probabilitymeasures thatmay be implicitly and incompletely defined via their statistical moments. Using aPCE, one can build the multivariate orthonormal polynomials even in the absence of the exact probability density function f�k (θ). The main task of the surrogate representation of model Mk(x, y, z, t, θk) in Eq. 6 is to compute the coefficients cα . However, not all coefficients could be relevant for such a sur- rogate. Therefore, we employ the Bayesian sparse learning method [41] using a fast marginal likelihood maximization algorithm [44]. Doing so, we sequentially identify the rel- evant predictors (expansion terms) that capture the most significant features of the physical model. We denote this extension of aPCE as Bayesian sparse arbitrary polynomial chaos (BsaPCE) representation [96]. The posterior distribu- tion of the expansion coefficients, conditioned on the model responses Y resulting from the training sets X, is given by the combination of a Gaussian likelihood and a Gaussian prior distribution over the unknown expansion coefficients c according to Bayes’ rule. Then, the posterior of the expan- sion coefficients given the model responses Y and values of hyper-parameters α and β describing the Gauss process [105], can take the following form p(c|Y,α, β) = p(Y|X, c, β)p(c|α) p(Y|X,α, β) , (10) 123 667 Computational Geosciences (2023) 27:663–686 which is also Gaussian defined by N (c|μ,�) with μ = β���Y , � = ( A + ��β� )−1 . (11) Here, � is the design matrix of size E × N with elements �ni = ψi (xn), where E represents the number of model evaluations using the training samples, and A = diag(αi ). The values of α and β can be determined via type-II max- imum likelihood [91]. Having found values α∗ and β∗ for the hyperparameters that maximize the marginal likelihood, one can evaluate the predictive distribution over Y for a new input x by p(Y|x,X,Y,α∗, β∗) = ∫ p(Y|x, c, β∗)p(c|X,Y,α∗, β∗)dc = N (Y|μ��(x), σ 2(x)) . (12) The predictive mean is given by Eq. 11 with c set to the posterior mean μ, and the variance of the predictive distri- bution is given by σ 2(x) = (β∗)−1 + �(x)���(x) , (13) where � is calculated by Eq. 11 in which α and β set to their optimized values α∗ and β∗. For the latter, a separate hyperparameter αi is assigned to each weight parameter ci . 3 Mathematical and computational models In this section, we provide the assumptions on the cou- pled flow system of interest in this work and introduce the flow models for which the Bayesian validation framework is applied in Section 5. From a pore-scale perspective, we consider a two- dimensional flow domain �flow consisting of the free-flow domain �ff and the pore space �pore of the porous medium. The porous-medium domain �pm has a periodic structure composed by the repetition of the representative elementary volume (REV) (scaled unit cell) Y = (0, ) × (0, ), where is the microscopic length scale (Fig. 1, top). From a macro- scopic point of view, the coupled domain � = �ff ∪ �pm comprises the free-flow region �ff and the porous-medium domain �pm, separated by a sharp fluid–porous interface � (Fig. 1, bottom). We consider isothermal single-phase steady-state flow at lowReynolds numbers. The samefluid occupies the free-flow domain and fully saturates the porous medium. This fluid is supposed to be incompressible and to have constant viscosity. We consider a non-deformable porous medium leading to constant porosity. Fig. 1 Geometrical setting at the pore scale (top) and REV scale (bottom) 3.1 Pore-scale resolvedmodel At the pore scale, fluid flow in the whole flow domain �flow is governed by the Stokes equations, ∇·v = 0, −∇·T(v, p) = 0 in �flow, (14) completedwith the no-slip condition on the boundary of solid inclusions v = 0 on ∂�flow \ ∂�, (15) and appropriate conditions on the external boundary ∂�. Here, v and p denote the fluid velocity and pressure, T(v, p) = μ∇v − pI the stress tensor, I the identity tensor and μ the dynamic viscosity. Resolving pore-scale information is computationally expensive for practical applications. Therefore, REV-scale model formulations, which accurately reflect the pore-scale flow processes, are often preferred and are studied in this manuscript. The pore-scale resolved model Eqs. 14–15 is used only as a reference for the model validation purposes. A finite-volume scheme on staggered grids, also known as MAC scheme [92], is used to discretize the pore-scale model Eqs. 14–15. 3.2 Subdomainmodels In this study, we consider two different types of coupled models, for which the Stokes equations are used in the free- flow region �ff but the porous domain �pm is treated by different modeling concepts. The first type of model relies on the REV-scale description of the porous-medium domain using Darcy’s law, whereas the second type of model follows 123 668 Computational Geosciences (2023) 27:663–686 a hybrid-dimensional approach, where a lower-dimensional pore-network model (PNM) is used to describe the fluid flow in the porous domain [68, 73]. 3.2.1 Free-flowmodel As a common feature, both coupled models (REV-scale model, pore-networkmodel) contain the incompressible, sta- tionary Stokes equations for the description of fluid flow in the free-flow domain ∇·vff = 0, −∇·T(vff , pff) = 0 in �ff , (16) where vff is the fluid velocity and pff is the fluid pressure. For discretization of the Stokes system Eq. 16, the same MAC scheme as for model Eqs. 14–15 is employed. 3.2.2 REV-scale porous-mediummodel At the REV-scale, fluid flow through the porous medium is described by the Darcy flow equations ∇·vpm = 0, vpm = −K μ ∇ ppm in �pm, (17) where vpm is theDarcyfluid velocity, ppm is the fluid pressure andK is the intrinsic permeability tensor,which is symmetric, positive definite, and bounded. Equation 17 are discretized with a vertex-centered finite-volume scheme, also known as boxmethod [93]. This scheme has the advantage that degrees of freedom are naturally located at the interface and there- fore directly allow the calculation of interfacial quantities (see [111] for more details) appearing in the coupling condi- tions Eqs. 23–25 below. 3.2.3 Pore-network porous-mediummodel Pore-network models [84] consider a simplified yet equiva- lent representation of the porous geometry by separating the void space into larger pore bodies connected by narrow pore throats. Despite their low computational demand, a rather high degree of pore-scale accuracy can be achieved [78]. PNMs can also be combined with modeling approaches on different scales [79], such as Darcy-type continuum models [80–82] or free-flowmodels [83].Weishaupt et al. [68] devel- oped a monolithic approach to couple a pore-network model with a (Navier–)Stokes model, which was later improved by considering pore-scale slip [73]. For the PNM,we require the conservation ofmass for each pore body i (the intersection of two or more pore throats): ∑ j Qi j = 0, Qi j = gi j (pi − p j ). (18) Fig. 2 Schematic contribution to total conduction for the PNM. Throat i j connects the pore bodies i and j at the centers of which the pressures pi and p j are defined. gt,i j is the throat conductance valid for the region marked in light red. gp,i and gp, j are the conductances defined for the pore body halves marked in teal Here, Qi j is the discrete volume flow rate in a throat con- necting the pore bodies i and j , and the pressures defined at the centers of the pore bodies i and j are given by pi and p j (Fig. 2). Equation 18 represents a finite-volume discretiza- tion scheme with a two-point flux approximation, see [73, 74] for further details. The total conductance gi j is deter- mined by the pore throat geometry and the fluid properties. Considering the pressure losses both within the pore bodies and throats, we use gi j = ( g−1 t,i j + g−1 p,i + g−1 p, j )−1 , (19) where gt,i j is the conductance of a throat i j while gp,i and gp, j are the conductances of the adjacent pore-body halves (Fig. 2). Simple analytical expressions for gi j are available in the literature [77] for certain geometries. Usually, we deter- mine gi j via numerical upscaling [75], whereas for this study, we consider it to be an additional uncertain parameter. In the following, we only refer to gp,i , as for the given geometry gp,i = gp, j for interior throats. At interface throats, one of the half-pore-body conductance is zero. 3.3 Coupling concepts A variety of REV-scale coupling concepts for the Stokes– Darcy system Eqs. 16–17 is available in the literature. In this paper, we consider the most widely used set of interface conditions, based on the Beavers–Joseph condition, and the recently developed generalized conditions [45]. If the PNM Eq. 18 is used in the porous medium, separate coupling con- ditions, suitable for the pore-scale description of interface exchange processes, must be considered. 123 669 Computational Geosciences (2023) 27:663–686 3.3.1 Classical coupling conditions (REV-scale model) The most commonly used interface conditions are the con- servation of mass vff ·n = vpm·n on �, (20) the balance of normal forces −n·T(vff , pff)n = ppm on �, (21) and the Beavers–Joseph condition [50] for the tangential component of velocity (vff − vpm)·τ − √ K αBJ τ ·∇vffn = 0 on �. (22) Here, αBJ > 0 is the Beavers–Joseph parameter, n is the normal unit vector on � pointing outward from the porous medium, τ is a tangential unit vector on� and √ K = √ τ ·Kτ (Fig. 1). The Beavers–Joseph interface condition Eq. 22 was pos- tulated for flows parallel to the interface [50]. In [45, 52] it is shown that this condition is unsuitable for arbitrary flow directions to the porous bed, however, it is routinely applied in the literature to multidimensional flows [64, 65]. The Stokes–Darcy problem Eqs. 16–17, 20–22 contains uncer- tain model parameters. In this paper, we focus on four uncertain parameters: the exact interface position γ , the Beavers–Joseph slip coeffi- cient αBJ, the permeability tensor K = kI and the maximum boundary velocity at the inflow boundary V top (Fig. 4). The exact location of the fluid–porous interface for REV-scale models is not known a priori. In the literature, there are sev- eral recommendations to impose the sharp interface directly on the top of solid inclusions in case of circular grains [50, 51, 62]. For other pore geometries, especially in the case of anisotropic media, the problem of optimal interface location is still open. However, the exact interface location is very important, especially for microfluidic models [76] which are routinely used in experiments. Another uncertain parameter is the Beavers–Joseph coefficient αBJ, which is supposed to contain the information on the surface roughness [49, 50]. An investigation to calibrate this parameter was recently car- ried out in [51], however, only for isotropic media. There was also an attempt to determine the Beavers–Joseph coef- ficient experimentally for flows parallel to the fluid–porous interface, isotropic and orthotropic porousmedia [67], where the Beavers–Joseph parameter was found to be αBJ < 1 and dependent on the intrinsic permeability. Finally, the perme- ability tensor appearing in the Beavers–Joseph condition Eq. 22 is not necessarily the permeability of the porous bulk, as in the standard models [53, 55], but could also be permeability of the near-interfacial region [54, 63]. 3.3.2 Generalized coupling conditions for arbitrary flows (REV-scale model) Analternative to the classical interface conditions for Stokes– Darcy problems are the generalized coupling conditions derived rigorously in [45] via homogenization and bound- ary layer theory. These conditions are valid for arbitrary flow directions to the fluid–porous interface and read vff ·n = vpm·n on �, (23) ppm = −n·T(vff , pff)n + μN bl s τ ·∇vffn on �, (24) vff ·τ = − N bl 1 τ ·∇vffn + μ−1 2 2∑ j=1 ∂ ppm ∂x j M j,bl·τ on �. (25) Here, M j,bl, N bl 1 and N bl s are boundary layer constants introduced in [45]. For the generalized coupling conditions the interface can be located at the distanceO( ) from the top of the first row of solid inclusions, where denotes the char- acteristic pore size. Based on the chosen interface position and the pore geometry, the effective coefficients appearing in conditions Eqs. 23–25 are computed numerically using the theory of homogenization and boundary layers [57–59, 61]. This is themain advantage of the generalized interface condi- tions, besides their suitability for arbitrary flows in coupled Stokes–Darcy systems. There are also other coupling con- cepts for Stokes–Darcy problems in the literature, e.g., [47, 48, 62, 66], which are beyond the scope of this study. 3.3.3 Coupling conditions for the pore-network model Each intersection of a pore body i with the free-flow domain boundary yields a pore-local discrete interface �i on which we formulate coupling conditions (Fig. 3). We assume no- flow/no-slip condition for the free flow at the location of solid grains (no intersecting pore throat). This results in the fol- lowing coupling conditions for the free-flow/pore-network model vff ·n = vpm·n on �i , (26) ppm = pff on �i , (27) vff ·τ = { vslip on �i , 0 else , (28) with vslip=− 1 βpore [ (∇v+∇vT )n·τ ] ff +[v·τ ]pm . (29) 123 670 Computational Geosciences (2023) 27:663–686 Fig. 3 Schematic representation of local interface for the free- flow/PNM We approximate the tangential component of the pore- body interface velocity as [v·τ ]pm = Qi j |�i | [ni j ·τ ]pm , (30) where ni j is a unit normal vector parallel to the throat’s cen- tral axis and pointing towards the interface �i . The volume flow through the pore throat i j is given by Qi j and |�i | is the area of the discrete coupling interface. Equations 28 and29 canbe seen as the pore-scale analog to Eq. 22 with a pore-local slip coefficient βpore, which is deter- mined numerically in a preprocessing step. We refer to [73] for more details. The three sets of coupling conditions Eqs. 20–22, 23–25 and 26–28 are discretized corresponding to the adjacent subdomain models’ discretizations, and the result- ing coupled discrete models are treated by a monolithic strategy, assembling all contributions in a single system of equations for each model. 4 Benchmark scenario Corresponding to Figs. 1 and 4, we investigate rectangular solid inclusions of size d and study a flow problemwhere the flow is arbitrary to the fluid–porous interface �. We consider laminar fluid flow through the coupled flow domain with viscosity μ = 10−3 Pa · s. We describe the geometrical configuration and the boundary conditions in Section 4.1, followed by a description of the uncertainties in Section 4.2 and the system response quantities in Section 4.3. Fig. 4 Schematic description of the coupled flow problem (top), unit cell and non-dimensional effective parameters for the interface location γ = 5.05mm (bottom) 4.1 Geometrical setting and boundary conditions We consider the free-flow region �ff = (0, L) × (γ, H) and the porous-medium domain �pm = (0, L)× (0, γ ) with L = 10.25mm and H = 6mm, separated by the sharp fluid–porous interface � = (0, L) × {γ }, where the value for γ is uncertain. The porous medium is isotropic, K = kI, and consists of 20 × 10 square solid inclusions of size d = 0.25mm (Fig. 4) leading to porosity φ = 0.75. The inclusions are positioned in such a way that the line tangent to the top of the upper row of solid inclusions is given by (0, L) × {5mm} and the characteristic pore size appearing in coupling condition Eq. 25 is = 0.5mm. For the classical interface conditions Eqs. 20–22 (Clas- sical IC) the Beavers–Joseph parameter is typically taken αBJ = 1 in the literature, although it is often not the optimal choice [51, 62, 67]. Here, we consider αBJ as an uncertain parameter, which is quantified in Sections 4.2 and 4.3. The boundary layer constants appearing in the generalized cou- pling conditions Eqs. 23–25 (Generalized IC) are computed numerically based on the geometrical configuration of the interfacial zone and are presented in Fig. 4 (bottom). For details on the computation of these effective parameters, we refer the reader to [45]. Note that the boundary layer constants N bl 1 and Mbl 1 (Fig. 4, bottom) are non-dimensional. For isotropic porous media, the constants M2,bl 1 = 0 and N bl s = 0, therefore, they do not appear in Fig. 4 (bottom). In order to obtain a closed formulation for the pore-scale problem Eqs. 14–15 we set the following boundary condi- 123 671 Computational Geosciences (2023) 27:663–686 Table 2 The uncertain parameters and their defined distributions for the classical coupled Stokes–Darcy model Parameter name Range Unit Distribution type Boundary velocity, V top [ 5 × 10−4, 1.5 × 10−3 ] m/s uniform Exact interface location, γ [4.9, 5.1] mm uniform Permeability, k [ 10−10, 10−8 ] m2 uniform Beavers–Joseph parameter, αBJ [0.1, 4] - uniform tions on the external boundary v = vin = (0, V top sin( 10003 πx)) on �in, (31) T(v, p)nff = 0 on �out, (32) v = 0 on �nf, (33) where the inflow boundary �in = (3mm, 6mm) × {H}, �out = {L} × (5.5mm, H), �nf = ∂� \ (�in ∪ �out) and nff is the outward unit normal vector on ∂�ff . The bound- ary conditions for the coupled flow problem are presented schematically in Fig. 4 (top). The corresponding boundary conditions for theREV-scale model formulation given by Eqs. 16–17 together with either the Classical IC 20–22 or the Generalized IC 23–25 read vff = (0, V top sin( 10003 πx)) on �in, (34) T(vff , pff)nff = 0 on �out, (35) vff = 0 on �nf,ff, (36) vpm·npm = 0 on �nf,pm, (37) where �nf,ff = ∂�ff \ (�in ∪ �out ∪ �), �nf,pm = ∂�pm \ � and npm denotes the unit normal vector on ∂�pm pointing outward the porous medium. The boundary conditions Eqs. 34–36 also hold for the hybrid-dimensional free-flow/pore-network model (Pore- Network model), such that no mass enters or leaves the domain through the pores on �nf,pm. The coupling condi- tions Eqs. 26–28 are set on � for PNM. 4.2 Uncertainties and errors So far, we have presented various coupling concepts for free flow and porous-medium flow. These concepts differ in the description of processes in the porous region and across the fluid–porous interface, where different mathematical mod- els are employed, whereas the free flow for all concepts is modeled based on the stationary Stokes equations. The uncertainty, due to the choice of adequate representation of the system of interest, is known as conceptual uncertainty. In addition to the conceptual uncertainty, each computa- tional model includes uncertain parameters, such as material parameters, or interface location, requiring a thorough inves- tigation. This type of uncertainty is known as parametric uncertainty. Uncertain model inputs, defined later, must be propagated through the model or simulation (also known as uncertainty propagation) to effectively assess compet- ing modeling concepts’ response quantities and validate the corresponding computational models against a reference solution. As described in Section 3.3.1, regarding the coupled Stokes–Darcy model with the Classical IC, the benchmark scenario contains four uncertain parameters: the maximum boundary velocity at the inflow boundary, the exact interface position, the Beavers–Joseph coefficient, and the permeabil- ity tensor. In contrast to the Classical IC, the Generalized IC do not contain the Beavers–Joseph coefficient. Further, the Generalized IC rely on the assumption that the interface location may not be below the top of the solid inclusions, hence, a narrower prior distribution for the interface location. Correspondingly, the parameters and their associated distri- butions as prior knowledge for the Stokes–Darcy model with theClassical IC and theGeneralized IC are listed in Tables 2 and 3, respectively. As for thePore-Networkmodel, we consider the total con- ductance gi j in Eq. 19 (see Fig. 2) as uncertain parameter to be inferred during the calibration phase. This parameter plays the role of permeability in the pore-network setting. Another uncertain input parameter is the pore-scale slip coefficient βpore. It can be determined numerically in a preprocessing step, in which it is approximated by solving a simplified, equivalent problem of free flow over a single pore throat intersecting with the lower boundary of the free-flow chan- nel [73]. The list of considered uncertain parameters and their Table 3 The uncertain parameters and their associated distributions for the Stokes–Darcy model with the generalized interface conditions Parameter name Range Unit Distribution type Boundary velocity, V top [ 5 × 10−4, 1.5 × 10−3 ] m/s uniform Exact interface location, γ [5.0, 5.1] mm uniform Permeability, k [ 10−10, 10−8 ] m2 uniform 123 672 Computational Geosciences (2023) 27:663–686 Table 4 The uncertain parameters and their specifications for the pore-network model Parameter name Range Unit Distribution type Boundary velocity, V top [ 5 × 10−4, 1.5 × 10−3 ] m/s uniform Total conductance, gi j [ 10−7, 10−5 ] m3/ (s · Pa) uniform Pore-local slip coefficient, βpore [103, 105] 1/m uniform associated distribution as prior knowledge for the PNM are presented in Table 4. As opposed to uncertainties, errors are defined as the difference between the true value and the predicted value, and have both a sign and a magnitude. We consider the errors associated with the model discrepancy error, numeri- cal approximation, and surrogate modeling in our analysis. These errors are aggregated and used as diagonal entries of the residual covariance matrix � in the likelihood function in Eq. 4. 4.2.1 Model discrepancy error In the current benchmark case study, we have a fully resolved pore-scale solution as a reference. However, in practice, the reference solution is not available, and instead, experimen- tal data is incorporated that includes an observation error. Nevertheless, the analyzed models in our benchmark study could never perfectly reproduce the ground truth, i.e., the reference solution. This difference can be attributed to the presence of model discrepancy. This discrepancy always exists for various reasons, such as simplified assumptions, missing physics, upscaling due to scale differences. Several methods have been proposed in the literature to incorporate the model discrepancy in a Bayesian setting. These meth- ods’ treatment of model discrepancy range from a constant bias to more sophisticated methods in which one forms a Bayesian hierarchical model to solve a joint parameter and model discrepancy inference problem [106–110]. Since the model discrepancy is not perfectly known, we parameterize it as �(θε) and treat its parameter θε as additional unknown parameter. Following [90], we infer these parameters jointly with model parameters θk in Eq. 4. We consider a diagonal covariance matrix as � = σ 2 INout with a scalar unknown parameter σ 2 for each system response quantity, i.e., veloc- ity and pressure (Section 4.3). Nout stands for the number of data points. 4.2.2 Numerical error The governing equations of the models under investigation in this study require approximation of numerical solutions. These approximations provide an additional source of error. There are five primary sources of errors in computational physics solutions, given that the numerical scheme is sta- ble, consistent, and robust. These sources are insufficient spatial discretization, insufficient temporal discretization for unsteady-state flow, insufficient iterative convergence, com- puter round-off, and computer programming [70]. Since quantifying errors from these sources is the main focus in the verification of numerical schemes, we only investigate the discretization error that originates from a certain choice of mesh size. Following [1], we take a heuristic approach to quantify this error, in that we fit generalized Richardson extrapolation to estimate the error by comparing three differ- ent mesh spacings. The Richardson extrapolation takes the following form fk = f̄ + eph p̂ k + O ( h p̂+1 k ) , (38) where fk denotes the exact solution to the discrete equation on a mesh with a known spacing hk , f̄ stands for the exact solution to the original PDE (unknown), ep is the error term coefficient, and p̂ indicates the observed order of accuracy. Here, we seek the first-order error. Thus, the unknowns f̄ and ep can be easily determined via the least square method for the numerical solutions obtained by varying the mesh spacing. 4.2.3 Surrogate error As previously mentioned, we substitute the computational models with the easy-to-evaluate surrogate models in the Bayesian analysis to offset the computational cost. This replacement also introduces a new source of error, known as a surrogate prediction error. Ignoring this error could result in a biased posterior distribution. As for prediction uncertainty, a mean squared error based on a testing set can provide a good estimate of the surrogate error variance. We incorporate the errors discussed above in the Markov Chain Monte Carlo (MCMC) simulation method to approxi- mate the posterior distribution [71, 72], used in the calibration stage.Wedirectly sumup all the covariancematrices of errors to obtain the likelihood calculations’ total covariance matrix. Here, we assume that all these errors follow a normal distri- bution and are independent of each other. 123 673 Computational Geosciences (2023) 27:663–686 4.3 System response quantities A fundamental ingredient of each benchmark is the def- inition of so-called system response quantities (SRQs). These quantities define the prescribed output from the ref- erence/experimental data as well as from the computational models that are compared in terms of the validation metric. The SRQs can be either local or global quantities. While the former can take quantities within the solution domain on the PDEs, such as dependent variables of the PDEs, the latter represents integral quantities or net flux out of a sys- tem. As part of model validation, we seek to compare system responses generated by different coupling concepts with the ones from the pore-scale resolvedmodel (Section 3.1). Fig. 5 shows the data extraction points for the velocity field (top) and the pressure (bottom). Since less variability is expected for pressure values, we have selected fewer extraction points for pressure responses. We train the surrogate models for all computational models based on the simulation results for the marked points. The points colored in blue and red provide the corresponding data for the calibration and validation steps, respectively. The pore-scale resolved simulation results contain both macroscopic and microscopic details of the flow field. The latter become visible as oscillations of the pore-scale solu- tions in the porous medium. To make numerical simulation results comparable we need to average them at the pore and REV scale. We consider volume averaging, where the Fig. 5 Data extraction points for the velocity field (top) and pressure (bottom) for the calibration and validation scenarios averaged velocity field at a given point x0 ∈ � is obtained as vavg(x0) = 1 |V (x0)| ∫ Vf(x0) v(x) dx , (39) where V (x0) is the representative elementary volume corre- sponding to x0 and Vf(x0) is its fluid part. The representative elementary volume V (x0) has the same size as the period- icity cell Y . Moreover, also the simulation results in the free-flow region need to be averaged correspondingly such that the interpretation of the SRQs is the same in all cases. 5 Application of the bayesian framework In this section, we compare the coupled models (using either REV-scale formulation or Pore-Network model in�pm) with the pore-scale resolvedmodel Eqs. 14–15. For the REV-scale model formulation, we consider the Stokes–Darcy problem with theClassical IC Eqs. 20–22 and theGeneralized IC Eqs. 23–25.We demonstrate that the latter ones are more accurate than the Classical IC in case of parallel flows to the inter- face and that they are suitable for arbitrary flow directions where theClassical IC fail. As shown in [68, 73], the hybrid- dimensional coupled model using a Pore-Network approach in�pm can be an efficient and accurate choice for simulating free flow over structured porous media. Here, our goal is to assess the coupled model’s accuracy compared to the above-mentionedREV-scale approaches and under the influence of pore-scale parameter uncertainty. As reference data, we use the fully resolved pore-scalemodel for the velocity and pressure (Fig. 6). However, it is worth men- tioning that the Stokes–Darcy model with Classical IC and Generalized IC can only offer predictions on the REV scale. Therefore, we average the values of SRQs obtained for the fully resolved pore-scale model as well as the Pore-Network model for consistency. The averaging is performed via a vol- ume averaging approach, discussed in Section 4.3 to make the REV-scale numerical simulation results comparable with that of the pore-scale resolved simulation. The pore-scale model and the coupled models have been implemented in the open-source simulatorDuMux [112]. The Bayesian analysis has been performed using a new open- source, object-oriented Python package BayesValidRox1. It provides an automated workflow for surrogate-based sen- sitivity analysis, Bayesian calibration, and validation of computational models with a modular structure. Replacing the models with their surrogates drastically reduces the total computational time of the analysis. This gain is essential in computationally demanding uncertainty 1 https://pypi.org/project/bayesvalidrox/ 123 674 https://pypi.org/project/bayesvalidrox/ Computational Geosciences (2023) 27:663–686 Fig. 6 Streamlines of the pore-scale velocity field (top) and the pressure (bottom) quantification tasks, such as propagation or inference. In this study, we observed that by using a well-trained surrogate model, we could speed up one simulation run from 10 ∼ 15 s to only 0.005 ∼ 0.007 s with acceptable accuracy. The current section offers insights into analysis of pre- dictive abilities in Section 5.2 and model comparison in Section 5.3 using the surrogate-based Bayesian validation framework introduced in Section 2. Additionally, we assess the influence of various modeling parameters onto the final model prediction, performing the global sensitivity analysis in Section 5.1. 5.1 Global sensitivity analysis In this section, we analyze how the variability of the model response quantities introduced in Section 4.3 at the selected data extraction points (Fig. 5) is affected by the variabil- ity of each input variable or combinations thereof. This is achieved via a sensitivity analysis. Various sensitivity anal- ysis approaches have been developed in recent years. For an extensive review of different techniques, we refer the reader to [101]. In the current paper, we explore the connection of polynomial representation to a global sensitivity measures [103] and use the so-called Sobol indices [102], derived from a variance decomposition of model outputs in terms of con- tributions of each input parameter or combinations thereof. Using the Sobol decomposition, one can describe the total variance of the model in terms of the sum of the summands’ variances. This variance decomposition is exten- sively explained in [104]. Leveraging the orthonormality of the polynomial chaos basis, the authors also derive the so- called PC-based Sobol indices. The idea behind these indices is as follows: once the PC representation of the model in Eq. 6 is available, the expansion coefficients cα are simply gath- ered according to the dependency of each basis polynomial, square-summed and normalized Si1,...,is = M∑ j=1 χ j c2j M∑ j=1 c2j , χ j = { 1, if αk j > 0, ∀ j ∈ (i1, . . . , is) 0, if αk j = 0, ∃ j ∈ (i1, . . . , is) } . (40) Here, Si1,...,is is the Sobol index that indicates what frac- tion of total variance of the response quantity can be traced back to the joint contributions of the parameters θi1 , . . . , θis . The index selection operator χ j indicates where the chosen parameters θ numbered as i1, . . . , is (i.e., θi1, . . . , θis ) have concurrent contributions to the variance within the overall expansion. Simply put, it selects all polynomial terms with the specified combination i1, . . . , is of model parameters. A complementing measure for sensitivity analysis is the Sobol Total Index. It expresses the total contribution to the variance of model output due to the uncertainty of an indi- vidual parameter θ j in all cross-combinations with other parameters STj = ∑ {i1,...,is }⊃ j Si1,...,is , (41) where STj is a summation of all Sobol indices in which the variable θ j appears as univariate as well as joint influences. The total Sobol index can take values larger than 1 when the impact of interactions among parameters on the total output variance is not negligible. This characteristic is particularly prominent in highly nonlinear problems like the ones inves- tigated in this study. In what fallows, we present the total Sobol indices for the SQRs and the data extraction points defined in Section 4, for all three models for the calibration scenario. 5.1.1 Classical IC In Fig. 7,we provide the total Sobol indices for the calibration points in blue (Fig. 5) for velocity and pressure for the Clas- sical IC Eqs. 20–22. We observe that the boundary velocity V top has themost contribution to the velocity variance (Fig. 7, left) in the free-flow region and pressure field (Fig. 7, right) for the analyzed points in the domain. Moreover, the exact interface location plays an important role for the velocity field, especially near the interface (Fig. 7, left) and influ- ences the pressure field as well (Fig. 7, right). The value of the Beavers–Joseph parameter has a higher impact on the 123 675 Computational Geosciences (2023) 27:663–686 Fig. 7 Total Sobol indices of the Stokes–Darcy model with the Classical IC for the velocity (left) and pressure (right) for the calibration (blue) points in Fig. 5 velocity near the interface than other parts of the domain. However, this parameter does not play an essential role for the pressure. The permeability k significantly affects the veloc- ity in the porous-medium domain, whereas its influence on the velocity in the free-flow region and the pressure field is small. 5.1.2 Generalized IC In Fig. 8, the total Sobol indices for velocity and pressure before calibration are presented for the selected blue points in Fig. 5. For the Generalized IC, the information about the exact interface location γ is included in the boundary layer constants N bl 1 and M1,bl 1 appearing in condition Eq. 25. Therefore, the exact position of the interface does not influence the overall system behavior in comparison to the Classical IC. The permeability k (in the porous-medium) and the inflow velocity V top have significant impact both on the velocity (in the free-flow region) and the pressure field, as in the case of the Classical IC. 5.1.3 Pore-network model Figure 9 shows the total Sobol indices for velocity (left) and pressure (right) for the blue points in Fig. 5. As for the REV-scale coupled models, we observe a dominant influ- ence of V top for all points. As expected, the influence of the total conductance is more prominent in the porous domain, which is comparable to the influence of permeability for the REV-scale coupled models. The influence of the pore-scale slip parameter βpore shows a relatively small influence on the variability of velocity at point 6 and is hardly visible at other locations in the free-flow region. This matter is most likely because the slip coefficient only affects the flow field in the free-flow domain �ff very locally, directly above the interface pore. The averaging volume used for the evalu- ation, however, takes into account a larger portion of the free-flow region, where the influence of βpore is fairly small. In analogy to the REV models, V top also has a dominat- ing influence on the pressure and velocity in the free-flow region. Fig. 8 Total Sobol indices of the Stokes–Darcy model with the Generalized IC for the velocity (left) and pressure (right) for the calibration (blue) points in Fig. 5 123 676 Computational Geosciences (2023) 27:663–686 Fig. 9 Total Sobol indices of the Pore-Network model for the velocity (left) and pressure (right) for the calibration (blue) points in Fig. 5 5.2 Analysis of predictive abilities In this section, we present the result of the analysis of the predictive ability of all three discussed conceptual models by showing their parametric posterior and the corresponding predictive distributions. These results are generated via the surrogate-based Bayesian procedure described in Section 2. In the calibra- tion phase, we update the prior knowledge on the uncertain model parameters according to Section 4.2.We condition the responses of all analyzed models on the velocity and pres- sure values extracted from the pore-scale simulations that are marked as a blue points in Fig. 5. Todo so,we employMCMC approach via emcee Python ensemble sampling toolkit [99] to perform Bayesian inference in Section 2.1 using BsaPCE surrogate representation in Section 2.2. We use an Affine Invariant Ensemble Sampler (AIES) to approximate the pos- terior distribution. For more details on MCMC, we refer the reader to [98]. To accelerate this Bayesian updating step, we train each surrogate model with the simulation outcomes of 300 runs of each numerical model. The AIES-MCMC sampler is run for an ensemble of 50 Markov chains on each surrogate. We monitor the convergence of the sampler using the inte- grated auto-correlation time, which estimates the number of evaluations of the posterior probability density function to draw independent samples from the target density [97]. The MCMC sampler is run until the convergence criterion of 1% for the difference in the auto-correlation time between two consequent monitoring steps is met. We retrain a new set of surrogate models in the validation stage based on the updated parameter distribution (posterior distribution) obtained after calibration. With these surrogate models, we propagate the posterior parametric uncertainty to estimate the posterior pre- dictive distribution of models to be passed to the Bayesian metric calculation step. As discussed in Section 4.2, using surrogates may intro- duce additional errors to the inference process. To include this error, we test the surrogate models with 150 simula- tion runs (test sets) which are different from the training sets. Comparing the surrogates’ prediction with the results from the test sets, we observed a considerably low valida- tion error between 10−8 and 10−11 for all models, indicating an acceptable prediction accuracy. Moreover, we estimated Mean Square Error (MSE) for each surrogate model that is a good estimate of the surrogate error variance [100]. When evaluating the likelihood p(Y|Mk, θk) in Eq. 4, we add a diagonal matrix �PCE with elements σ 2 PCE,i = MSEi , i = 1, 2, ..., Nout to �, assuming that the surrogate errors are independent and followanormal distributionwith zeromean. Moreover, following [21], we perturb the reference data with some additive noise to account for uncertainty associated with the BME values, the resulting Bayes factors, and pos- terior model weights. With this approach, we investigate the impact of other possible sources of errors on the validation metrics that are not considered in the calculations. As mentioned in Section 4.2, we jointly infer the uncer- tain parameters with the scalar unknown parameters σ 2 vel and σ 2 p for each system response quantity, i.e., velocity and pressure. These parameters are assummed to be uncorellated in this sutdy as defining the correct correlation representa- tion is difficult given the nonlinearities near the interface and the volume averaging applied to obtain the response quantities. For these parameters, we assume uniform distri- butions σ 2 vel ∼ U [ 0, 10−5 ] and σ 2 p ∼ U [ 0, 10−3 ] as priors, for velocity and pressure, respectively. In what follows, we present the updated (posterior) distribution of the parameters and model discrepancy errors after calibration obtained by the MCMC sampler for all three models. Afterward, a figure containing the posterior predictive of models next to each other versus the reference data is provided. 5.2.1 Classical IC Figure 10 presents the posterior distribution obtained via the Bayesian inference using the calibration (blue) points in 123 677 Computational Geosciences (2023) 27:663–686 Fig. 10 Posterior parameter distribution of the Stokes–Darcy model with the Classical IC after calibration to the reference data from the pore-scale model Fig. 5. The 50 percent quantiles, alongside the 15 and 85 percent quantiles, are displayed on top of the histograms shown in the diagonal plots. Most posterior distributions of the parameters follow a Gaussian distribution. How- ever, the distribution of the interface location γ and the Beavers–Joseph slip coefficient αBJ exhibits a long right tail. Moreover, a slight correlation between γ andαBJ is observed. 5.2.2 Generalized IC Similar to the procedure described above, the surrogate- based Bayesian calibration offers insight into the posterior distributions of modeling parameters for the Stokes–Darcy model with the Generalized IC (Fig. 11). As opposed to the Classical IC, the interface location γ for this coupling condition shows a slightly wider distribution. This observa- tion indicates that the exact position of interface does not influence the overall system behavior in comparison to the Classical IC. 5.2.3 Pore-network model For the Pore-Network model, we also have used the cal- ibration (blue) points (Fig. 5) to perform surrogate-based Bayesian inference. Figure 12 illustrates the posterior param- eter distribution of thePore-Networkmodel. The distribution of βpore covers a wider range. This issue can be attributed to insensitivity of the model results to this parameter, as pre- sented by the total Sobol indices in Fig. 9. To obtain the models’ posterior predictive distributions, we need to propagate the posterior parametric uncertainty presented so far through the models. The result offers a pos- sibility of analyzing how post-calibration uncertainty affects the SRQs. To perform the post-calibration uncertainty prop- agation, we have trained a new surrogate for each competing model using new training sample points drawn from the pos- terior parameter distribution. For better visual comparison, we plot the posterior predictive of models next to each other. Figures 13 and 14 illustrate the mean and standard devia- 123 678 Computational Geosciences (2023) 27:663–686 Fig. 11 Posterior parameter distribution of the Stokes–Darcy model with the Generalized IC after calibration to the reference data from the pore-scale model Fig. 12 Posterior parameter distribution of the Pore-Network model after calibration to the reference data from the pore-scale model 123 679 Computational Geosciences (2023) 27:663–686 Fig. 13 The velocity predictions of all models in the validation step against the reference data from the pore-scale model tions of the model predictive distributions in a bar chart for the velocity and pressure response quantities, respectively. In particular, Fig. 13 reveals that all analyzed models pro- vide accurate predictions at the points located in the deeper part of the porous medium (1 to 4). However, the predic- tions at the points near the interface (5 to 8) suggest that the Stokes–Darcy model with Classical IC and Generalized IC provide more accurate predictions than the Pore-Network model. TheREV-scalemodelwithGeneralized IC shows less uncertainty, i.e., lower standard deviation, in its prediction at the vicinity of the interface between the porous medium and the free-flow. Moreover, Fig. 14 confirms that all models are able to provide accurate pressure values. 5.3 Model comparison We performmodel comparison employing the so-called pos- terior model weights according to the Bayesian approach Fig. 14 The pressure predictions of all models in the validation step against the reference data from the pore-scale model explained in Section 2.1. Such an analysis offers an aggre- gated comparison of a model’s outputs to the validation set of reference data from the pore-scale model that are marked in red in Fig. 5. For model comparison analysis, we use the newly constructed surrogate representation during the valida- tion stage to compute the BME values in Eq. 3. These values are required to calculate the posterior model weights Eq. 1 and the Bayes factors in Eq. 5. Additionally, use of the advanced surrogate representation provides a possibility to assess uncertainty of the BME values and the corresponding model weights. Table 5 presents a detailed statistical sum- mary of the model weights and provides a ranking. It also reports the information regarding the post-calibration uncer- tainty with help of the deviation regarding 25% and 75% percentiles. The expected model weights under noisy pore-scale data assumption convey a relatively clear model ranking in favor ofGeneralized IC, withClassical IC as second and the Pore- Network model ranking last. It is worth mentioning that the model weights close to zero for the Classical IC and Pore-Network model can be attributed to the high predic- tion uncertainty of these models. This fact is represented by the error bars in Fig. 13. Moreover, a considerable mismatch can be detected between the expected velocity prediction of the Pore-Network model and the reference data at validation Table 5 The statistical summary of posterior model weights after val- idation Model Model weights Rank Classical IC 0.003+0.002 −0.001 2 Generalized IC 0.997+0.001 −0.002 1 Pore-network 0.000+0.000 −0.000 3 123 680 Computational Geosciences (2023) 27:663–686 points 5 and 6. As for the Classical IC, the velocity predic- tion uncertainty is higher than that of the Generalized IC. This difference is mainly for the points at the vicinity of the interface and in the free-flow region. Assessments of confidence in model ranking have been investigated by means of Bayes Factor Eq. 5 for pairwise comparison of models based on the validation scenario. In the introduced uncertainty-aware Bayesian validation framework, Bayes Factors provide an objective measure of significance that quantifies the evidence in favor of one model’s superiority against another. Figure 15 presents the probability density functions of log10(BF) over all perturbed velocity and pressure data sets in a three-by-three matrix. Here, we compute three Bayes Factors for each model against its counterpart. The significant levels in a log10- scale, introduced in [88] are marked with the vertical lines. Gray lines represent equally strong evidence for both mod- els. Orange and red lines indicate thresholds for strong and decisive evidence in favor of one model against the other, respectively. The first plot in the second row in Fig. 15, e.g., shows the distribution of log10(BF) in favor of Generalized IC against Classical IC. This plot reveals that for most of the perturbed data sets, the Bayes factor is in the region where a decisive evidence (log10(BF) greater than two) exists in favor ofGen- eralized IC to outperform Classical IC. Similarly, in all the analyzed cases (perturbed data sets), Classical IC could be clearly favored against Pore-Network model based on the decisive evidence (the plot in the first row, the last column). Moreover, the distribution in the second row, third column of Fig. 15 reveals that the Bayes factor distribution of Gen- eralized IC against Pore-Network model proves a decisive evidence in favor of Generalized IC. The results presented so far are based on a compari- son of the SRQs with averaged SRQs of the fully resolved Stokes simulation, as the Stokes–Darcy model with Clas- sical IC and Generalized IC could offer a prediction on the REV scale only. However, one could directly compare the Pore-Network model to the reference data at the pore scale without performing volume averaging by calculating the surface-averaged pore-scale velocity at the pore-throat cross-sections. We denote the Pore-Network model with pore-throat surface averaging model as the Pore-Network SA model and its Bayes factors distribution is shown in Fig. 16. The velocities of Pore-Network SA model are not defined within the pore bodies but only at the pore throats, which explains why the results of Fig. 16 show stronger evidence in favor of the Pore-Network SA model compared to the other concepts. Therefore, thePore-Network SAmodel avoid- ing additional averaging steps is a suitable approach when detailed pore-scale information is considered. Alternatively, the Stokes–Darcy model with Generalized IC adequately Fig. 15 Distributions of log10 (Bayes Factor) for the pairwise comparison of competing models based on the validation scenario 123 681 Computational Geosciences (2023) 27:663–686 Fig. 16 Distributions of log10 (Bayes factor) of the Pore-Network model, with the surface averaging against competing models based on the validation scenario represents the underlying physical processes once the REV- scale information is available only. In addition to the setup presented in Section 4.1, we also analyzed two other cases. Firstly, we considered a setup with the same geometrical configuration, but the inlet boundary was located at the left domain edge in the free-flow region with an opening of 1.5 mm from the top. This setup induces a flow profile parallel to the interface. Comparing Classi- cal IC with Generalized IC, we witnessed no substantial evidence in favor of any model. This observation is in line with the results from [45], where the authors showed that the Stokes–Darcy problem with Classical IC and Generalized IC provides similar simulation results for parallel flows to the porous layer. The second additional setup is based on the same flow models and boundary conditions as presented in Section 4.1, however, the solid inclusions are circular. We compared the Stokes–Darcy model with Classical IC and Generalized IC against the reference data. The model com- parison with Bayes factor suggests strong evidence in favor of the Generalized IC, as expected and similar to the rectan- gular inclusions. 6 Summary and conclusions We have proposed a surrogate-assisted uncertainty-aware Bayesian validation framework and applied it to a benchmark study that addresses not only these parametric uncertainties, but also conceptual modeling uncertainties due to different formulations of physical models. To do so, we have considered the Stokes equations cou- pled to differentmodels for the porous-mediumcompartment and corresponding coupling strategies: the standard REV- scale model using Darcy’s law with classical or generalized interface conditions as well as the pore-network model. The advantage of employing a surrogate modeling technique is that one can perform a sensitivity analysis without addi- tional costs. This analysis is achieved using the so-called Sobol indices that are derived analytically from the expansion coefficients. The application of the presented surrogate- assisted Bayesian uncertainty-aware framework is not lim- ited to the models considered in this manuscript, but can be applied to many other applications. Applying the suggested Bayesian validation framework, we have observed that there are matches between the pre- dictions related to the considered models and the reference data for the points in the deeper part of the porous medium for all coupled models. However, we have found differences in the predictive capabilities of the models in the vicinity of the interface and in the free-flow region. Moreover, we have propagated the post-calibration parametric uncertainty through each analyzed model to validate the different mod- els against reference data that have not been used during the calibration phase. This uncertainty-aware Bayesian valida- tion procedure has confirmed that the averaged pore-network model has the most difficulties representing the underlying physical process correctly. This issue is most likely due to the averaging approach used for the pore-network model, where velocities have to be calculated and interpolated from fluxes that are only given within pore throats. Moreover, addressing the differences in the predictions of the considered modeling concepts, we have performed a Bayesian model comparison. This comparison reveals that the Stokes–Darcy model with the generalized interface conditions represents processes on the REV scale best compared to the classical interface conditions and the correspondingly upscaled pore- network model. The pore-network model outperforms both Stokes–Darcy models with classical and generalized inter- face conditions only if the detailed pore-scale information is considered. We have also investigated two other cases: one with the opening boundary condition on the left side and another with circular inclusions in the porous medium. The analysis of the former setting, which induces parallel flow to the interface, reveals that the Stokes–Darcy models with the classical and the generalized interface conditions provide similar results. This observation was expected for flows parallel to the fluid–porous interface. The findings of the analysis for the setupwith circular inclusions confirm that 123 682 Computational Geosciences (2023) 27:663–686 there is decisive evidence in favor of the generalized interface condition being superior to the classical interface. Conclud- ing, we have observed that the suggested surrogate-assisted uncertainty-aware Bayesian validation framework helps to gain insight into underlying physical processes at consider- ably low computational costs. Acknowledgements The work is funded by the Deutsche Forschungs- gemeinschaft (DFG, German Research Foundation) – Project Number 327154368 – SFB 1313. Funding Open Access funding enabled and organized by Projekt DEAL. Data Availability The Bayesian framework and the models’ source codes, as well as the reference data used in this study, are available at https://git.iws.uni-stuttgart.de/dumux-pub/mohammadi2022a. Declarations Competing interests The authors have no competing interests to declare that are relevant to the content of this article. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adap- tation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indi- cate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, youwill need to obtain permission directly from the copy- right holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/. References 1. Oberkampf, W.L., Roy, C.J.: Verification and Validation in Scien- tific Computing. Cambridge University Press, Cambridge (2010) 2. Oberkampf, W.L., Trucano, T.G.: Verification and validation benchmarks. Nucl. Eng. Des. 238, 716–743 (2008) 3. Oberkampf, W.L., Barone, M.F.: Measures of agreement between computation and experiment: validationmetrics. J. Comput. Phys. 217(1), 5–36 (2006) 4. Ferson, S., Oberkampf, W.L., Ginzburg, L.: Model validation and predictive capability for the thermal challenge problem. Comput. Methods Appl. Mech. Eng. 197(29–32), 2408–2430 (2008) 5. Sankararaman, S., Ling, Y., Shantz, C., Mahadevan, S.: Inference of equivalent initial flaw size under multiple sources of uncer- tainty. Int. J. Fatigue 33(2), 75–89 (2011) 6. Hills, R.G., Leslie, I.H.: Statistical validation of engineering and scientificmodels: validation experiments to application.Technical report, Sandia National Lab, Albuquerque, NM (US) (2003) 7. Urbina, A., Paez, T.L., Urbina, A., Hasselman, T., Wathugala, W., Yap, K.: Assessment of model accuracy relative to stochastic sys- tem behavior. In: 44 Th AIAA/ASME/ASCE/AHS/ASC Struc- tures, Structural Dynamics, and Materials Conference (2003) 8. Gelfand, A.E., Dey, D.K.: Bayesian model choice: asymptotics and exact calculations. J. R. Stat. Soc. B 56(3), 501–514 (1994) 9. Zhang, R., Mahadevan, S.: Bayesian methodology for reliability model acceptance. Reliab. Eng. Syst. Saf. 80(1), 95–103 (2003) 10. Mahadevan, S., Rebba, R.: Validation of reliability computational models using bayes networks. Reliab. Eng. Syst. Saf. 87(2), 223– 232 (2005) 11. Geweke, J.: Bayesian model comparison and validation. Am. Econ. Rev. 97(2), 60–64 (2007) 12. Sankararaman, S., Mahadevan, S.: Model validation under epis- temic uncertainty. Reliab. Eng. Syst. Saf. 96(9), 1232–1241 (2011) 13. Rebba, R., Mahadevan, S.: Computational methods for model reliability assessment. Reliab. Eng. Syst. Saf. 93(8), 1197–1207 (2008) 14. Sankararaman, S., Mahadevan, S.: Assessing the reliabil- ity of computational models under uncertainty. In: 54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynam- ics, and Materials Conference, pp. 1873 (2013) 15. Thacker, B.H., Paez, T.L.: A simple probabilistic validation met- ric for the comparison of uncertain model and test results. In: 16th AIAANon-Deterministic Approaches Conference, pp. 0121 (2014) 16. Liu,Y., Chen,W.,Arendt, P., Huang,H.-Z.: Toward a better under- standing of model validationmetrics. J. Mech. Des. 133(7) (2011) 17. Ling, Y., Mahadevan, S.: Quantitative model validation tech- niques: New insights. Reliab. Eng. Syst. Saf. 111, 217–231 (2013) 18. Hills, R.G.: Model validation: model parameter and measurement uncertainty. J. Heat Transfer 128(4), 339–351 (2006) 19. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Comput. 4(1), 1–58 (1992) 20. Schöniger, A., Wöhling, T., Samaniego, L., Nowak, W.: Model selection on solid ground: rigorous comparison of nine ways to evaluate Bayesian model evidence. Water Resour. Res. 50(12), 9484–9513 (2014) 21. Schöniger, A., Wöhling, T., Nowak, W.: A statistical concept to assess the uncertainty in Bayesian model weights and its impact on model ranking. Water Resour. Res. 51(9), 7524–7546 (2015) 22. Draper, D.: Assessment and propagation of model uncertainty. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57(1), 45–70 (1995) 23. Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T.: Bayesian model averaging: a tutorial. Statist. Sci. 14(4), 382–417 (1999) 24. Trotta, R.: Bayes in the sky: Bayesian inference and model selec- tion in cosmology. Contemp. Phys. 49(2), 71–104 (2008) 25. Faust, J., Gilchrist, S., Wright, J.H., Zakrajšsek, E.: Credit spreads as predictors of real-time economic activity: a Bayesian model-averaging approach. Rev. Econ. Stat. 95(5), 1501–1519 (2013) 26. Hooten, M.B., Hobbs, N.T.: A guide to Bayesian model selection for ecologists. Ecol. Monogr. 85(1), 3–28 (2015) 27. Hoege, M., Guthke, A., Nowak, W.: The hydrologist’s guide to bayesian model selection, averaging and combination. J. Hydrol. 572, 96–107 (2019) 28. Schöniger, A., Illman, W.A., Wöhling, T., Nowak, W.: Finding the right balance between groundwater model complexity and experimental effort via Bayesian model selection. J. Hydrol. 531, 96–110 (2015) 29. Brunetti, C., Linde, N., Vrugt, J.A.: Bayesian model selection in hydrogeophysics: Application to conceptual subsurfacemodels of the south oyster bacterial transport site, virginia, usa. Adv. Water Resour. 102, 127–141 (2017) 30. Wöhling, T., Schöniger, A., Gayler, S., Nowak, W.: Bayesian model averaging to explore the worth of data for soilplant model selection and prediction. Water Resour. Res. 51(4), 2825–2846 (2015) 31. Schäfer Rodrigues Silva, A., Guthke, A., Höge, M., Cirpka, O.A., Nowak, W.: Strategies for simplifying reactive transport models: 123 683 https://git.iws.uni-stuttgart.de/dumux-pub/mohammadi2022a. http://creativecommons.org/licenses/by/4.0/ http://creativecommons.org/licenses/by/4.0/ Computational Geosciences (2023) 27:663–686 A bayesian model comparison. Water Resour. Res. 56(11), 2020– 028100 (2020) 32. Yoon, H.N., Marshall, L., Sharma, A., Kim, S.: Bayesian model calibration using surrogate streamflow in ungauged catchments. Water Resour. Res. 58(1), 2021–031287 (2022) 33. Mohammadi, F., Kopmann, R., Guthke, A., Oladyshkin, S., Nowak,W.: Bayesian selection of hydro-morphodynamic models under computational time constraints. Adv. Water Resour. 117, 53–64 (2018) 34. Beckers, F., Heredia, A., Noack, M., Nowak, W., Wieprecht, S., Oladyshkin, S.: Bayesian calibration and validation of a large-scale and time-demanding sediment transport model. Water Resour. Res. 56(7), 2019–026966 (2020) 35. Bazargan, H., Christie, M., Elsheikh, A.H., Ahmadi, M.: Sur- rogate accelerated sampling of reservoir models with complex structures using sparse polynomial chaos expansion. Adv. Water Res. 86, 385–399 (2015) 36. Bazargan, H., Christie, M.: Bayesian model selection for com- plex geological structures using polynomial chaos proxy.Comput. Geosci. 21(3), 533–551 (2017) 37. Scheurer, S., Schäfer Rodrigues Silva, A., Mohammadi, F., Hommel, J., Oladyshkin, S., Flemisch, B., Nowak, W.: Surro- gatebased Bayesian comparison of computationally expensive models: application to microbially induced calcite precipitation. Comput. Geosci. 25(6), 1899–1917 (2021) 38. Elsheikh,A.H., Hoteit, I.,Wheeler,M.F.: Efficient Bayesian infer- ence of subsurface flow models using nested sampling and sparse polynomial chaos surrogates. Comput. Methods Appl. Mech. Engrg. 269, 515–537 (2014) 39. Wiener, N.: The homogeneous chaos. Amer. J. Math. 60(4), 897– 936 (1938) 40. Oladyshkin, S., Nowak, W.: Data-driven uncertainty quantifica- tion using the arbitrary polynomial chaos expansion. Reliab. Eng. Syst. Saf. 106, 179–190 (2012) 41. Tipping, M.E.: Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1(3), 211–244 (2001) 42. Arjoune, Y., Kaabouch, N., El Ghazi, H., Tamtaoui, A.: Com- pressive sensing: performance comparison of sparse recovery algorithms. In: 2017 IEEE 7th Annual Computing and Communi- cationWorkshop and Conference (CCWC), pp. 1–7 (2017). IEEE 43. Lüthen, N., Marelli, S., Sudret, B.: Sparse polynomial chaos expansions: literature survey and benchmark. SIAM/ASA J. Uncertainty Quantif. 9(2), 593–649 (2021) 44. Tipping, M.E., Faul, A.: Fast marginal likelihood maximisation for sparse Bayesian models. In: Proceedings of the 9th Interna- tional Workshop on Artificial Intelligence and Statistics, pp. 3–6 (2003) 45. Eggenweiler, E., Rybak, I.: Effective coupling conditions for arbi- trary flows in Stokes-Darcy systems. Multiscale Model. Simul. 19(2), 731–757 (2021) 46. Goyeau, B., Lhuillier, D., Gobin, D., Velarde, M.: Momentum transport at a fluid-porous interface. Int. J. Heat Mass Tran. 46, 4071–4081 (2003) 47. Angot, P., Goyeau, B., Ochoa-Tapia, J.A.: Asymptotic model- ing of transport phenomena at the interface between a fluid and a porous layer: jump conditions. Phys. Rev. E 95, 063302 (2017) 48. Ochoa-Tapia, A.J., Whitaker, S.: Momentum transfer at the boundary between a porous medium and a homogeneous fluid. I: theoretical development. Int. J. Heat Mass Tran. 38, 2635–2646 (1995) 49. Le Bars, M., Worster, M.: Interfacial conditions between a pure fluid and a porous medium: implications for binary alloy solidifi- cation. J. Fluid Mech. 550, 149–173 (2006) 50. Beavers, G.S., Joseph, D.D.: Boundary conditions at a naturally permeable wall. J. Fluid Mech. 30, 197–207 (1967) 51. Rybak, I., Schwarzmeier, C., Eggenweiler, E., Rüde, U.: Vali- dation and calibration of coupled porous-medium and free-flow problems using pore-scale resolved models. Comput. Geosci. 25(2), 621–635 (2021) 52. Eggenweiler, E., Rybak, I.: Unsuitability of the Beavers-Joseph interface condition for filtration problems. J. Fluid Mech. 892, 10 (2020) 53. Discacciati, M., Miglio, E., Quarteroni, A.: Mathematical and numerical models for coupling surface and groundwater flows. Appl. Num. Math. 43, 57–74 (2002) 54. Zampogna, G.A., Bottaro, A.: Fluid flow over and through a reg- ular bundle of rigid fibres. J. Fluid Mech. 792, 5–35 (2016) 55. Discacciati, M., Quarteroni, A.: Navier-Stokes/Darcy coupling: modeling, analysis, and numerical approximation. Rev. Mat. Complut. 22, 315–426 (2009) 56. Saffman, P.G.: On the boundary condition at the surface of a porous medium. Stud. Appl. Math. 50, 93–101 (1971) 57. Hornung, U.: Homogenization and Porous Media. Springer, New York (1996) 58. Jäger, W., Mikelić, A.: Modeling effective interface laws for transport phenomena between an unconfined fluid and a porous mediumusing homogenization. Transp. PorousMed. 78, 489–508 (2009) 59. Jäger, W., Mikelić, A.: On the interface boundary conditions by Beavers, Joseph and Saffman. SIAM J. Appl. Math. 60, 1111– 1127 (2000) 60. Yang, G., Coltman, E., Weishaupt, K., Terzis, A., Helmig, R., Weigand, B.: On the Beavers–Joseph interface condition for non- parallel coupled channel flow over a porous structure at high Reynolds numbers. Transp. Porous Med. 128, 431–457 (2019) 61. Carraro, T., Goll, C., Marciniak-Czochra, A., Mikelić, A.: Effec- tive interface conditions for the forced infiltration of a viscousfluid into a porous medium using homogenization. Comput. Methods Appl. Mech. Engrg. 292, 195–220 (2015) 62. Lācis, U., Sudhakar, Y., Pasche, S., Bagheri, S.: Transfer of mass and momentum at rough and porous surfaces. J. Fluid Mech. 884, 21 (2020) 63. Lācis, U., Bagheri, S.: A framework for computing effective boundary conditions at the interface between free fluid and a porous medium. J. Fluid Mech. 812, 866–889 (2017) 64. Hanspal, N., Waghode, A., Nassehi, V., Wakeman, R.: Devel- opment of a predictive mathematical model for coupled Stokes/Darcyflows in cross-flowmembranefiltration.Chem.Eng. J. 149, 132–142 (2009) 65. Discacciati,M.,Gerardo-Giorda, L.:OptimizedSchwarzmethods for the Stokes-Darcy coupling. IMA J. Numer. Anal. 38, 1959– 1983 (2018) 66. Angot, P., Goyeau, B., Ochoa-Tapia, J.A.: A nonlinear asymptotic model for the inertial flow at a fluid-porous interface. Adv. Water Res. 149, 103798 (2021) 67. Mierzwiczak, M., Fraska, A., Grabski, J.K.: Determination of the slip constant in the Beavers–Joseph experiment for laminar fluid flow through porousmedia using ameshlessmethod.Math. Probl. Eng. 2019, 1494215 (2019) 68. Weishaupt, K., Joekar-Niasar, V., Helmig, R.: An efficient cou- pling of free flow and porous media flow using the pore-network modeling approach. J. Comput. Phys. X 1, 100011 (2019) 69. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis. CRC Press, London (2013) 70. Roy,C.J.: Errors and uncertainties: their sources and treatment. In: Computer Simulation Validation, pp. 119–141. Springer, Cham (2019) 71. Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (1999) 72. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, New York (2008) 123 684 Computational Geosciences (2023) 27:663–686 73. Weishaupt, K., Terzis, A., Zarikos, I., Yang, G., Flemisch, B., de Winter, D.A.M., Helmig, R.: A hybrid-dimensional coupled pore-network/free-flow model including pore-scale slip and its application to a micromodel experiment. Transp. Porous Med. 135(1), 243–270 (2020) 74. Koch, T., Weishaupt, K., Müller, J., Weigand, B., Helmig, R.: A (dual) network model for heat transfer in porous media. Transp. Porous Med. 140(1), 107–141 (2021) 75. Mehmani, Y., Tchelepi, H.A.: Minimum requirements for predic- tive pore-network modeling of solute transport in micromodels. Adv. Water Res. 108, 83–98 (2017) 76. Terzis, A., Zarikos, I., Weishaupt, K., Yang, G., Chu, X., Helmig, R., Weigand, B.: Microscopic velocity field measurements inside a regular porousmediumadjacent to a lowReynolds number chan- nel flow. Phys. Fluids 31, 042001 (2019) 77. Patzek, T.W., Silin, D.B.: Shape factor and hydraulic conductance in noncircular capillaries: I. One-phase creeping flow. J. Colloid Interface Sci. 236, 295–304 (2001) 78. Oostrom, M., Mehmani, Y., Romero-Gomez, P., Tang, Y., Liu, H., Yoon, H., Kang, Q., Joekar-Niasar, V., Balhoff, M., Dewers, T., Zhang, C.: Pore-scale and continuum simulations of solute transport micromodel benchmark experiments. Comput. Geosci. 20(4), 857–879 (2016) 79. Scheibe, T.D., Murphy, E.M., Chen, X., Rice, A.K., Carroll, K.C., Palmer, B.J., Tartakovsky, A.M., Battiato, I., Wood, B.D.: An analysis platform for multiscale hydrogeologic modeling with emphasis on hybrid multiscale methods. Groundwater 53(1), 38– 56 (2015) 80. Balhoff, M.T., Thompson, K.E., Hjortso,M.: Coupling pore-scale networks to continuum-scale models of porous media. Comput. Geosci. 33(3), 393–410 (2007) 81. Balhoff,M.T., Thomas, S.G.,Wheeler,M.F.:Mortar coupling and upscaling of pore-scale models. Comput. Geosci. 12(1), 15–27 (2007) 82. Mehmani, Y., Balhoff, M.T.: Bridging from pore to continuum: a hybrid mortar domain decomposition framework for subsur- face flow and transport. MultiscaleModel. Simul. 12(2), 667–693 (2014) 83. Beyhaghi, S., Xu, Z., Pillai, K.M.: Achieving the inside–outside coupling during network simulation of isothermal drying of a porous medium in a turbulent flow. Transp. Porous Med. 114(3), 823–842 (2016) 84. Blunt, M.J.: Multiphase Flow in Permeable Media: a Pore-scale Perspective. Cambridge University Press, Cambridge (2017) 85. Oladyshkin, S., Nowak, W.: The connection between Bayesian inference and information theory formodel selection, information gain and experimental design. Entropy 21(11), 1081 (2019) 86. Angluin, D., Smith, C.H.: Inductive inference: theory and meth- ods. ACM Comput. Surv. 15(3), 237–269 (1983) 87. Kass, R.E., Raftery, A.E.: Bayes factors. J. Amer. Statist. Assoc. 90(430), 773–795 (1995) 88. Jeffreys, H.: The Theory of Probability. Oxford University Press, Oxford (1961) 89. Marelli, S., Lüthen, N., Sudret, B.: UQLab user manual – Poly- nomial chaos expansions. Technical Report UQLab-V1.4-104, Chair ofRisk, Safety andUncertaintyQuantification,ETHZurich, Switzerland (2021) 90. Wagner, P.-R., Nagel, J., Marelli, S., Sudret, B.: UQLab user man- ual – Bayesian inversion for model calibration and validation. Technical Report UQLab-V1.4-113, Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich, Switzerland (2021) 91. Berger, J.O.: Statistical Decision Theory and Bayesian Analysis. Springer, New York (2013) 92. Harlow, F.H., Welch, J.E.: Numerical calculation of time- dependent viscous incompressible flow of fluid with free surface. Phys. Fluids 8(12), 2182–2189 (1965) 93. Hackbusch, W.: On first and second order box schemes. Comput- ing 41(4), 277–296 (1989) 94. Smith, A.F., Gelfand, A.E.: Bayesian statistics without tears: a sampling–resampling perspective. Amer. Statist. 46(2), 84–88 (1992) 95. Oladyshkin, S., Nowak, W.: Incomplete statistical information limits the utility of highorder polynomial chaos expansions. Reliab. Eng. Syst. Saf. 169, 137–148 (2018) 96. Mohammadi, F.: Development and Realization of Validation Benchmarks. arXiv:2011.13216 (2020) 97. Sokal, A.D.: Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms Note to the Reader. In: Func- tional Integration, pp. 131–192 (1996) 98. Goodman, J.,Weare, J.: Ensemble samplerswith affine invariance. Commun. Appl. Math. Comput. Sci. 5(1), 65–80 (2010) 99. Foreman-Mackey, D., Farr, W.M., Sinha, M., Archibald, A.M., Hogg, D.W., Sanders, J.S., Zuntz, J., g. Williams, P.K., j. Nelson, A.R., de Val-Borro, M., Erhardt, T., Pashchenko, I., Pla, O.A.: emcee v3: a Python ensemble sampling toolkit for affineinvariant MCMC. J. Open Res. Softw 4(43), 1864 (2019) 100. Xu, T., Valocchi, A.J.: A Bayesian approach to improved calibra- tion and prediction of groundwater models with structural error. Water Resour. Res. 51(11), 9290–9311 (2015) 101. Iooss, B., Lemaître, P.: In: Dellino, G.,Meloni, C. (eds.) AReview on Global Sensitivity Analysis Methods, pp. 101–122. Springer, Boston (2015) 102. Sobol’, I.: Sensitivity estimates for nonlinear mathematical mod- els. Math. Model. Comput. Exp 1(4), 407–414 (1993) 103. Oladyshkin, S., De Barros, F., Nowak,W.: Global sensitivity anal- ysis: a flexible and efficient framework with an example from stochastic hydrogeology. Adv. Water Res. 37, 10–22 (2012) 104. Sudret, B.: Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 93(7), 964–979 (2008) 105. Oladyshkin, S., Mohammadi, F., Kroeker, I., Nowak, W.: Bayesian3 active learning for theGaussian process emulator using information theory. Entropy 22(8), 890 (2020) 106. Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 63(3), 425–464 (2001) 107. Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.-H., Tu, J.: A framework for validation of computer models. Technometrics 49(2), 138–154 (2007) 108. Brynjarsdóttir, J., O’Hagan, A.: Learning about physical parame- ters: The importance of model discrepancy. Inverse Probl. 30(11), 114007 (2014) 109. Ling, Y., Mullins, J., Mahadevan, S.: Selection of model dis- crepancy priors in Bayesian calibration. J. Comput. Phys. 276, 665–680 (2014) 110. Gardner, P., Rogers, T., Lord, C., Barthorpe, R.: Learning model discrepancy: A Gaussian process and sampling-based approach. Mech. Syst. Sig. Process. 152, 107381 (2021) 111. Schneider, M., Gläser, D., Weishaupt, K., Coltman, E., Flemisch, B., Helmig, R.: Coupling staggered-grid and vertex-centered finite-volumemethods for coupled porousmedium free-flowprob- lems. arXiv:2112.11089 (2021) 112. Koch, T., Gläser, D., Weishaupt, K., Ackermann, S., Beck, M., Becker, B., Burbulla, S., Class, H., Coltman, E., Emmert, S., Fet- zer, T., Grüninger, C., Heck, K., Hommel, J., Kurz, T., Lipp, M., Mohammadi, F., Scherrer, S., Schneider,M., Seitz, G., Stadler, L., Utz, M., Weinhardt, F., Flemisch, B.: Dumux 3-an open-source simulator for solving flow and transport problems in porousmedia with a focus on model coupling. Comput. Math. Appl. 81, 423– 443 (2021) 123 685 http://arxiv.org/abs/2011.13216 http://arxiv.org/abs/2112.11089 Computational Geosciences (2023) 27:663–686 Publisher’s Note Springer Nature remains neutral with regard to juris- dictional claims in published maps and institutional affiliations. Authors and Affiliations Farid Mohammadi1 · Elissa Eggenweiler2 · Bernd Flemisch1 · Sergey Oladyshkin1 · Iryna Rybak2 · Martin Schneider1 · Kilian Weishaupt1 Elissa Eggenweiler elissa.eggenweiler@ians.uni-stuttgart.de Bernd Flemisch bernd.flemisch@iws.uni-stuttgart.de Sergey Oladyshkin sergey.oladyshkin@iws.uni-stuttgart.de Iryna Rybak iryna.rybak@ians.uni-stuttgart.de Martin Schneider martin.sc