Vol.:(0123456789) Flow, Turbulence and Combustion (2025) 115:1059–1094 https://doi.org/10.1007/s10494-024-00595-7 RESEARCH Generalization Limits of Data‑Driven Turbulence Models Hannes Mandler1 · Bernhard Weigand1 Received: 22 July 2024 / Accepted: 12 October 2024 / Published online: 9 November 2024 © The Author(s) 2024 Abstract Many industrial applications require turbulent closure models that yield accurate predic- tions across a wide spectrum of flow regimes. In this study, we investigate how data-driven augmentations of popular eddy viscosity models affect their generalization properties. We perform a systematic generalization study with a particular closure model that was trained for a single flow regime. We systematically increase the complexity of the test cases up to an industrial application governed by a multitude of flow patterns and thereby demonstrate that tailoring a model to a specific flow phenomenon decreases its generalization capa- bility. In fact, the accuracy gain in regions that the model was explicitly calibrated for is smaller than the loss elsewhere. We furthermore show that extrapolation or, generally, a lack of training samples with a similar feature vector is not the main reason for generaliza- tion errors. There is actually only a weak correlation. Accordingly, generalization errors are probably due to a data-mismatch, i.e., a systematic difference in the mappings from the model inputs to the required responses. More diverse training sets unlikely provide a remedy due to the strict stability requirements emerging from the ill-conditioned RANS equations. The universality of data-driven eddy viscosity models with variable coefficients is, therefore, inherently limited. Keywords Data-driven turbulence modeling · Machine learning · Extrapolation · Generalizability · NeuralSST · Ribbed two-pass cooling channel 1 Introduction Turbulent flows play an important role in many industrial applications. In order to speed- up the design cycles and reduce the development cost, CFD simulations become increas- ingly more important (Slotnick et al. 2014; Bush et al. 2019), especially those solving the Reynolds-averaged Navier–Stokes (RANS) equations (Slotnick et  al. 2014; Rumsey and Coleman 2022). For non-reacting, incompressible, single-phase flows they read * Hannes Mandler hannes.mandler@itlr.uni-stuttgart.de 1 Institute of Aerospace Thermodynamics, University of Stuttgart, Pfaffenwaldring 31, 70569 Stuttgart, Germany http://crossmark.crossref.org/dialog/?doi=10.1007/s10494-024-00595-7&domain=pdf 1060 Flow, Turbulence and Combustion (2025) 115:1059–1094 and describe the spatio-temporal evolution of the mean velocity u , the mean pressure p ∗ = p∕� and the mean temperature T . The operator Dt = �t + ( u ⋅ � ) denotes the mean substantial derivative. As the density � , the kinematic viscosity � and the Prandtl number Pr are assumed to be constant, the temperature can be treated as a passive scalar. The (specific) Reynolds stress tensor (RST) −u�u� and the (specific) scalar flux vector (SFV) −u�T � , that occur in Eq. (1b) and Eq. (1c), respectively, require a closure model. The most popular class of closures in industrial applications are linear eddy viscosity (LEVM) (Slotnick et al. 2014) and diffusivity models as they provide a reasonable bal- ance between numerical stability, computational efficiency and accuracy. The corre- sponding constitutive laws assume linear relationships between the anisotropic part of the RST and the mean strain rate tensor S (Boussinesq 1877) as well as the SFV and the mean temperature gradient, respec- tively. In Eqs. (2) and (3), we substituted the eddy viscosity �t by the product of the tur- bulent kinetic energy (TKE) k, the turbulent time scale � = �(k, � ) and a non-dimensional coefficient g1 , where � is an auxiliary scale-providing variable, for instance, the turbulent dissipation rate � or frequency � . The turbulent Prandtl number Prt in principle depends on the local flow regime and is a user-defined parameter. The turbulent scales ZC = [k, � ]T are obtained from the solution of usually two model-specific transport equations which in general also depend on the primary model variables ZM = [ u, p ]T . Like the con- stitutive law given by Eq.  (2), these transport equations contain some non-dimensional coefficients gi ∈ G which require calibration. Such linear eddy viscosity models (LEVMs) often lack the desired accuracy because they suffer from several structural deficiencies. Besides the locality of Eq.  (2) and its linearity in the mean strain rate tensor (Pope 1975; Rodi 1976; Schmitt 2007), its spe- cific parametrization by means of G can cause substantial errors. Traditionally, the model coefficients were treated as a set of universal constants which were calibrated for very simple flows as homogeneous, isotropic turbulence, zero-pressure gradient (ZPG) boundary or free-shear layers (Wilcox 2006). Yet, the optimal parameter values depend on the flow phenomena under investigation (Alfonsi 2009; Yarlanki et  al. 2012; Ray et al.; Li et al. 2023). Because the complex geometries in real-world applications often (1a)0 = � ⋅ u (1b)Dt ( u ) = � ⋅ ( 2𝜈S − u�u� ) − �p̄∗ (1c)Dt ( T ) = � ⋅ ( � Pr �T − u�T � ) (2)−u�u� = − 2 3 kI + g1k�S, (3)−u�T � = − g1k� 2Prt �T . (4)Rc ( ZM ,ZC ∣ G ) = 0, 1061Flow, Turbulence and Combustion (2025) 115:1059–1094 induce a plethora of different flow patterns, e.g., flow separation, strong streamline cur- vature or swirl, a set of constant coefficients is in general inappropriate. Hence, Glushko (1966); Pope (1975) and Launder et  al. (1977), among others, proposed models with variable coefficients depending on the mean flow field, i.e., G = G ( ZM ,ZC ) . The emerging discipline of data-driven closure modeling builds upon these early works leveraging modern machine learning techniques in order to exploit the continuously growing amount of (publicly) available calibration data. Weatheritt and Sandberg (2017); Wu et al. (2018); Schmelzer et al. (2020); Jiang et al. (2021); Ham- mond et al. (2022); Mandler and Weigand (2022b); Fiore et al. (2022); Yin et al. (2022); Miró et al. (2023) and many others demonstrated the superior accuracy of data-driven models when they are applied to the same class of flows that they have explicitly been trained for. But just as for the traditional closures, the same question of generalization to off-cal- ibration scenarios arises. Accurate predictions in flow regimes which the model was not optimized for would require the same mappings from the chosen feature space to the model coefficients (Tracey et al. 2013; Milani et al. 2017; Zhao et al. 2020; Srivastava and Duraisamy 2021; Rumsey et al. 2022), but according to Lav et al. (2019) and Fang et al. (2023) this does not even hold true for boundary and different kind of free-shear layers. Overcoming this multivalence (Zhang and Duraisamy 2015; Liu et  al. 2021) or non- uniqueness problem (Sotgiu et al. 2019; Jiang et al. 2021; Srivastava and Duraisamy 2021) and striving for universality is challenging. In fact, there is serious doubt that a universal model exists (Geneva and Zabaras 2019; Taghizadeh et al. 2020; Rumsey et al. 2022; Fang et  al. 2023; Lav et  al. 2023). Mixture models (Cherroud et  al. 2023; Lozano-Durán and Bae 2023), which are based on an ensemble of expert models each optimized for a specific regime, may thus offer an appealing alternative. This old idea of using a model only where it is supposed to work (Wu et al. 2018) has, for instance, led to the SST model (Menter 1994b). However, this raises the question of how many different phenomena can or should be distinguished. In this paper, we investigate whether LEVMs with variable coefficients can improve predictions in complex, multi-phenomena applications if they were optimized for only a single but prominent regime. As popular LEVMs with constant coefficients lack accuracy for many flow patterns, there is definitely a potential for local and thus global enhance- ments. The central question is, whether this potential can be exploited by increasing a baseline model’s complexity and thereby tailoring it to a more relevant phenomenon. We, therefore, address the trade-off between optimizing a model for a single flow regime and the generalization capability to be expected. To this end, we perform a systematic generalization study for a particular data-driven closure that was trained on the commonly used two-dimensional flow over periodic hills. After demonstrating that this model does indeed improve the flow field predictions for the training case, we first vary the Reynolds number and the geometry of the test case with- out qualitatively altering the ground truth flow field. Subsequently, we test the model on a three-dimensional channel with periodic ribs, which is phenomenologically similar to the training case. We then apply the model to off-calibration cases like a plane channel flow and an impinging jet. Finally, we assess the model’s accuracy on a ribbed, two-pass cool- ing channel, which resembles all of the aforementioned test cases in some regions of the domain. This careful analysis reveals that data-driven closures are applicable to different geom- etries and Reynolds number as long as the expected flow fields of the training and test case are qualitatively similar. Tailoring a model towards a specific flow regime, though, 1062 Flow, Turbulence and Combustion (2025) 115:1059–1094 seems to amplify its generalization error in other regimes. Despite local accuracy gains in regions of the ribbed two-pass channel that resemble the flow over periodic hills, we conse- quently observe a global accuracy loss compared with the SST model. We conclude that an increasing level of specificity adversely affects generality. Furthermore, we investigate the potential causes of the observed generalization errors, in particular the similarity of the model inputs in the training and test set. Extrapolation is a well-known special case of dissimilarity between test and training samples, i.e., lack- ing support of a test sample by similar training samples in the feature space, and widely accepted as a potential source of error. We thus quantify the local lack of training set sup- port for all test cases. However, our results suggest that there is no consistent correlation between the training set support in the feature space and the models’ predictive accuracy. We indeed observe regions of both high and low a posteriori accuracy which are altogether characterized by missing training set support. We, therefore, argue that the lack of training samples with similar feature vectors is no reliable indicator for generalization errors but only for the prediction confidence. Accordingly, generalization errors are most likely due to the fact that different flow regimes may be characterized by similar model inputs but never- theless require ambiguous responses. The remainder of this paper is structured as follows. In Sect. 2, we outline the structure of the data-driven closure under investigation including a description of the training set. In Sect. 3, we explain the difference between extrapolation and the lack of training set sup- port in the model’s feature space in general. We additionally provide an exact as well as an approximate quantification metric for the latter. Subsequently, we present the test cases and the corresponding results in Sects. 4 and 5, respectively. We close with a discussion how the stability requirements imposed by the RANS solver inherently limit the generalization capabilities of data-driven eddy viscosity models (EVMs) in Sect. 6 and a brief conclusion in Sect. 7. 2 Data‑Driven Turbulence Model The neuralSST model of Mandler and Weigand (2023a) serves as an example of a data- driven LEVM with variable closure coefficients. It was originally proposed by Mandler and Weigand (2022b) and investigated in greater detail by Mandler and Weigand (2022a) and Mandler and Weigand (2023b). In the following, we briefly recall the model structure, its training procedure and its implementation into the RANS solver. 2.1 Structure The neuralSST model is a data-augmented variation of the SST model (Menter et al. 2003). For convenience, the baseline model is described in Appendix A. The neuralSST model’s constitutive laws are given by Eqs. (2) and (3). Rather than adjusting all the coefficients occurring in the SST model’s scale equations (Liu et  al. 2024), we pursue the "k-corrective frozen-RANS" approach (Schmelzer et  al. 2020) and account for their collective error by introducing a correction term with a single variable and dimensionless coefficient g0. 1063Flow, Turbulence and Combustion (2025) 115:1059–1094 In Eq. (5), RSST c and �SST t denote the original SST model definitions of the scale equations and the eddy viscosity given by Eqs. (A3) and (A2), respectively. The remaining constitu- ents of the correction term apart from the coefficient g0 ensures the dimensional homoge- neity of Eq. (5). In order to avoid that the turbulent time scale falls below the Kolmogoroff scale in immediate vicinity to the wall, we adopt a limiter based on the turbulent Reynolds number Ret = k∕(��) (Durbin 1991; Menter et al. 2012) where �∗ = 0.09 is a constant. The coefficients gn = gn(Q) ∀n ∈ [0, 1] may be arbitrary functions of the flow field by virtue of the independent variables q = q ( ZM ,ZC ) ∈ Q . The feature set consists of the non-dimensional magnitudes of both the strain and rotation rate tensor, the ratio of the turbulent length scale and the gradient decay length of the TKE, a limited wall distance-based Reynolds number Red = √ kd∕� and one of the blending arguments of the original SST model arg2 as defined by Eq. (A7). In Eq. (7), we make use of the Euclidean vector norm |⋅| , the Frobenius norm ‖⋅‖ and the softsign function S(�) = �∕(|�| + 1) (Ling and Templeton 2015). 2.2 Training Methodology We adopt a supervised learning strategy opting for a priori accuracy. Thus, both the exem- plary in- and outputs are sampled from high-fidelity data. We obtain the optimal values of the k-correction coefficient for all grid points of the high-fidelity data by solving the fol- lowing inverse problem (Schmelzer et al. 2020). Variables that are directly extracted from the high-fidelity data are denoted by a tilde. Besides ĝ0 , Eq. (8) also yields the turbulent frequency corresponding to the high-fidelity dataset, i.e., �̃ ≡ �̂ . Subsequently, the optimal coefficient of the constitutive law results can be obtained from the projection (Mandler and Weigand 2024) (5)Rc = RSST c − g0 k � ( 1 �SST −1 t ) = 0 (6)� = (�∗�)−1 max ( 1.0, 1.8Re − 1 2 t ) , (7) Q = � S � � ���S ��� � S � −� ���� ��� � ,S � − ���k� √ k � , min � Red 50 , 2 � , tanh � min �arg2 2 , 5 ��� (8) �̂, ĝ0 =argmin �,g0 ||Rc || s.t. u = ũ, u′u′ = ̃ u′u′, k = k̃ (9)ĝ1 = ̃ u�u� ∶ ̃ S k̃�̃‖‖ ̃ S‖‖ 2 . 1064 Flow, Turbulence and Combustion (2025) 115:1059–1094 We standardize both model in- and outputs according to Z(�) = (� − �)∕� , where � is a vector of observations from a single feature or target. The mean and the standard deviation of � are denoted by � and � , respectively. This ensures that all features and targets have similar scales, which reduces the stiffness of the subsequent regression problem. In order to find the optimal parameters �̂ of the functions ĥn = hn ( Z(q) || �̂ ) that approximate the ground truth mappings h̃n ∶ ℝ #Q → ℝ;Z ( q̃ ) ↦ Z ( ĝn ) best, we solve two parametric regression problems. We use feed-forward neural networks (NNs) with six hidden layers and six neurons each as ansatz for the hypotheses hn = hn ( Z(q) || � ) , where � denotes the collection of their weights and bias terms. A schematic of such a NN is shown in Fig. 1. In Eq. (10), we sum over all observations in the active training set A , which we define in the next section. The constraint in Eq. (10) ensures an upper limit for the hypotheses’ Lipschitz constants with K0 = 15 and K1 = 1.5 . It, therefore, acts as a means of regularization. 2.3 Training Data The neuralSST model is informed with direct numerical simulation (DNS) data of a two- dimensional flow over periodic hills (PHs) for a Reynolds number based on the bulk veloc- ity ub and the hydraulic diameter Dh of Reb = ubDh∕� = 22,803 . The geometry is depicted in Fig. 2. The training configuration is characterized by a hill width scaling factor, a non- dimensional pitch of the ribs and a non-dimensional channel height of � = 1 , p∕H = 9.0 and Ly∕H = 3.036 , respectively. The repeating pattern of flow separation and reattachment poses a serious challenge for most existing EVMs as the SST model (Jakirlic 2012). The extension of the separation (10) �̂ =argmin � ∑ j∈A [ hn ( Z ( q̃ (j)) || � ) −Z ( ĝ(j) n )]2 s.t. |||| �hn �Z(q) |||| ≤ Kn Fig. 1 Schematic of the feed-forward NNs adopted in the neuralSST model to represent the closure coef- ficients gn ∀n ∈ [0, 1] as functions of the features qi ∈ Q with rectified linear unit (ReLU) activations � of the hidden neurons 1065Flow, Turbulence and Combustion (2025) 115:1059–1094 zone, which is often of practical interest, is mainly determined by the inclination of the hill flanks and the Reynolds number. Let D = {1, 2,… , #D} denote the set indices of all grid points at which DNS data are available. In this particular case, the DNS solution was previously interpolated onto a RANS grid, wherefore at most #D = 14,751 observations may be used to train the model. In order to prevent learning numerically unstable trends which are a conse- quence of the deficient Boussinesq hypothesis, however, we use only a subset thereof, i.e., the active training set using the filters and with ĝ0min = −10 and ĝ1min = 0. 2.4 Implementation We introduce a blending against the baseline model like Kaandorp and Dwight (2020). This is formally equivalent to a damping of the nonlinear terms if the baseline model is linear (Huijing et al. 2021; Tang et al. 2023; Ellis and Xia 2023). In general, we may write this as a mixture of a set of models M = {SST, nSST} , where nSST represents the neuralSST model. The blending or mixture parameters �m n fulfill the normalization constraint ∑ m∈M �m n = 1 . In the simplest case, they are constants but they may also be functions of the local flow field. The latter case leads to zonal models. While the blending and the model (11)An = ( D ∩ Fgn ) ⧵ FS ∀n ∈ [0, 1] (12)FS = { j ∈ D ||| �𝜏 (j)‖‖‖ � S (j)‖‖‖ < 0.5 } (13)Fgn = { j ∈ D ||| ĝnmin ≤ ĝ(j) n ≤ 10 } (14)gn = ∑ m∈M �m n (q)gm n (q) Fig. 2 The two-dimensional flow over periodic hills (PHs): geom- etry, boundary conditions and evaluation lines for the velocity profiles (red dashed) 1066 Flow, Turbulence and Combustion (2025) 115:1059–1094 coefficients may share the same feature space, this is not necessary. In this work, we set �nSST n = 1 ∀n ∈ {0, 1} exploiting the full data-driven correction. While the model coefficient gSST 0 = 0 , gSST 1 is given by Eq. (A10). For the neuralSST model, we, introduce limiters on the data-driven predictions of the coefficients (Weath- eritt and Sandberg 2017; Milani et al. 2017; Sotgiu et al. 2019; Hammond et al. 2022; Fiore et al. 2022; Rumsey et al. 2022; Miró et al. 2023). The corresponding limits are set such that g0 ∈ [−1.0, 1.5] and g1 ∈ [0.1, 0.4] . For consist- ency, the inverse standardization of the predicted coefficients Z−1 gn adopts the mean and the standard deviation from the training set. Rather than modifying the baseline model’s implementation of the constitutive law into Eq. (1b), we add an explicit source term to it that accounts for the difference of the original and the data-driven model (Basara and Jakirlic 2003). The realizability correction R , as described in greater detail by Mandler and Weigand (2022b), is not applied by default but only if it is required to ensure numerical stability of the solution. In comparison with alternative implementation strategies, we found this to be the most stability-promoting one, which agrees with the results of Amarloo et al. (2022). The outlined implementation guarantees that the corresponding effective diffusion coef- ficient is strictly positive and has a large magnitude, which enhances the diagonal domi- nance of the system. 3 Training Set Support in the Feature Space The training set support TSS(q) measures the similarity of a feature vector q that a data- driven model is queried at and the set of feature vectors { q(j) ∣ j ∈ A } that it has been trained on. By defining the distance between a point and a set as the distance between this point and its nearest neighbor within the set (Ling and Templeton 2015), we arrive at the following expression for the training set support.1 A typical choice for the distance metric d(⋅, ⋅) , which must be defined on the feature space Q , is the L2-norm. In Eq. (17), we relate the distance of the query point from the training samples to some critical distance d∗ , which we define as the average distance between the training samples using the expectation operator � . This allows us to interpret the TSS = 1 iso-surface in Q as the boundary �S of the region S ⊂ Q in which model queries are well (15)gnSST n = max [ gmin n , min ( Z −1 gn [ hn(q) ] , gmax n )] (16) Dt ( u ) = −�p̄∗ + � ⋅ ( 2 [ 𝜈 + 𝜈SST t ] S ) ����������������������� implicit diffusion term −� ⋅ ( 2𝜈SST t S +R [ −g1k𝜏S ]) ������������������������������������������� explicit source term (17)TSS(q) ∶= d∗ d ( q, { q(j) ∣ j ∈ A }) ∶= � ({ d ( q(i), q(k) ) ∣ i, k ∈ A }) min { d ( q, q(j) ) ∣ j ∈ A } 1 In this section, we neglect the standardization Z of the feature space for the sake of brevity. To properly account for them, all occurrences of q and Q need to be substituted by Z(q) and Z(Q) , respectively. 1067Flow, Turbulence and Combustion (2025) 115:1059–1094 supported by the training set. The well-supported region of the feature space is thus given by the union of the #A #Q-dimensional hyperspheres Sph ( q(j), r ) centered around each of the training samples q(j) with a radius r that equals the mean distance between the training samples. This potentially disjoint region is visualized in Fig. 3. In general, the confidence in a model’s prediction is negatively correlated with the training set support. Although the term extrapolation is frequently used as if it was defined as the lack of training set support, it is not. A model is actually said to extrapolate if its query point q cannot be written as a convex combination of the training samples { q(j) ∣ j ∈ A } (Illowsky and Dean 2018). The boundary �I between the region of the feature space in which the model interpolates I ⊂ Q and its complement, the region in which the model extrapolates, is thus given by the convex hull around the training samples. This is also shown in Fig. 3. Obviously, there may be large regions I ⧵ S in which the model interpolates but the query is not well supported and also smaller regions S ⧵ I which are well supported but cause extrapolation. The extrapolation extent is then proportional to the shortest distance from the query point q ∈ Q ⧵ I to the convex hull around the training samples in the model’s feature space. As the evaluation of Eq.  (17) requires to store the entire training set, which can be a huge amount of data, Ling and Templeton (2015) proposed to use a memory-efficient approximation that only needs access to the statistical moments of the training set, i.e., the mean feature vector � and the covariance matrix �. This approximation relies on the Mahalanobis distance between the query point and the training set distribution. The critical Mahalanobis distance d∗ M is chosen such that TSS < 1 holds true for 1 % of the training samples (Ling and Templeton 2015). This implies that the well supported region of the feature space is approximated as S ≈ Sph ( �, d∗ M ) , see Fig. 3. If the training samples were normally distributed according to N(�,�) , they were evenly spread within Sph ( �, d∗ M ) . In this case, the above approximation holds true and TSS can be interpreted as the probability that a test sample was drawn from the training distribution (Wu et al. 2017). As the training set is usually not normally distributed (Wu et al. 2017), (18)TSS(q) ≈ d∗ M√ (q − �)�−1(q − �)T Fig. 3 Illustrating the subspace S of a bi-variate model’s feature space Q which is well sup- ported by the training samples { q(j) ∣ j ∈ A } (blue dotted), its approximation by the Mahalano- bis distance (red solid) and the region of interpolation I (green) which is bounded by the convex hull around the training samples (green dashed) using the data described in Sec. 2.3 1068 Flow, Turbulence and Combustion (2025) 115:1059–1094 which can also be clearly seen in Fig. 3, Eq.  (18) is only a reasonable approximation of Eq. (17) in the limit of TSS << 1. Ideally, a model would only be evaluated within I ∩ S . Considering the inhomogene- ous distribution of the training samples in the feature space and that the optimal model response is inherently unknown in the region I ⧵ S , the training set support seems to be a more appropriate measure for the confidence in a model’s prediction than the extrapolation extent. In the subsequent section, we thus focus on quantifying the training set support. Finally, we want to emphasize that phenomenological extrapolation does neither imply extrapolation in its strict sense as explained above nor a lack of training set support. Phe- nomenological extrapolation or generalization is based on loosely defined categorical flow patterns like "free-shear flow" or "boundary layer". Whether feature vectors sampled from two different flow phenomena differ significantly depends on the choice of the feature space, i.e., Eq. (7), and the flow phenomena under consideration. 4 Test Cases In this section, we describe the different test cases and their numerical setups. A complete list including the bulk Reynolds numbers (based on the hydraulic diameter) is given in Table 1. For all cases, we employ a SIMPLEC solver (Doormaal and Raithby 1984) and a second order accurate spatial discretization. The computational grids are all structured and wall- resolved. Both their cell counts N and near-wall resolution in terms of the non-dimensional height of the first cell y+ 1 = ||u� ||y1∕� , where u� denotes the friction velocity and y1 the wall- distance, are listed in Table 1. Corresponding grid convergence studies proved that a fur- ther mesh refinement does not noticeably change the results shown in the present paper. For all non-isothermal cases, we adopt values for the molecular and turbulent Prandtl num- ber of Pr = 0.71 and Prt = 0.85 , respectively. The latter choice is common for channel flow applications (Dietz et al. 2007; Weihing et al. 2014; Ellis and Xia 2023). Flow over periodic hills Table 1 Test cases and their numerical setups: dimensionality D, bulk Reynolds number Reb , hill width- scaling factor � , grid creator, cell count N, maximum dimensionless first cell height throughout the entire domain ‖‖y + 1 ‖‖∞ , usage of the realizability correction R and source of the reference data Case D Reb � Grid from N ‖ ‖y + 1 ‖ ‖∞ R Reference data PC 1 2.8 ⋅ 104 – – 4.2 ⋅ 101 0.8 ✗ Abe et al. (2001) PH 2 2.9 ⋅ 103 1.0 Xiao et al. (2020) 1.5 ⋅ 104 0.2 ✗ Breuer et al. (2009) PH 2 2.3 ⋅ 104 0.5 Xiao et al. (2020) 1.5 ⋅ 104 0.9 ✗ Xiao et al. (2020) PH 2 2.3 ⋅ 104 1.0 Xiao et al. (2020) 1.5 ⋅ 104 0.9 ✗ Xiao et al. (2020) PH 2 2.3 ⋅ 104 1.5 Xiao et al. (2020) 1.5 ⋅ 104 0.8 ✗ Xiao et al. (2020) PH 2 1.5 ⋅ 105 1.0 Xiao et al. (2020) 1.5 ⋅ 104 4.0 ✗ Rapp and Manhart (2011) PR 3 3.0 ⋅ 104 – Sotgiu (2020) 3.1 ⋅ 105 2.0 ✓ Rau et al. (1998) IJ 2 2.3 ⋅ 104 – – 6.4 ⋅ 104 1.2 ✗ Cooper et al. (1993) Baughn and Shimizu (1989) RMPC 3 5.0 ⋅ 104 – Sotgiu (2020) 8.6 ⋅ 106 2.9 ✓ Schüler et al. (2009) 1069Flow, Turbulence and Combustion (2025) 115:1059–1094 As we pursue an a priori model development strategy, the training case is the most natu- ral a posteriori test case. Due to the resulting inconsistency of the learning and the injection environment (Sotgiu et al. 2018; Duraisamy 2021; Kurz and Beck 2021) and the potentially severe error accumulation over the course of the simulation (Poroseva et al. 2016; Thomp- son et al. 2016; Wu et al. 2019; Melchers et al. 2023), a posteriori accuracy for the training case needs to be demonstrated to ensure a meaningful generalization study. The geometry and the corresponding boundary conditions of the training case are depicted in Fig.  2. Besides the particular training configuration, we consider two varia- tions of the Reynolds number and the hill slope, respectively, as indicated in Table 1. These variations are chosen such that they alter the baseline solution in opposite ways. While one increases the size of the separation zone, the other one decreases it wrt. the training con- figuration. The non-dimensional pitch of the hills depends on the hill width scaling factor � according to p∕H = 9.0 + 3.858(� − 1) . As all PH cases are considered to be isothermal, Eq. (1c) and the Eq. (3) are obsolete. Flow over periodic ribs The flows over periodic ribs (PRs) and PHs are phenomenologically similar as both are governed by large separation zones between periodic obstacles. As for the training case, the obstacles’ non-dimensional pitch amounts to p∕H = 9 . However, there are four major differences in comparison with the training case that are apparent from Fig. 4. First, the obstacles are sharp ribs rather than smooth hills. The location at which the flow separates from the contour is thus determined by the geometry at x∕H = 0.5 . The upper boundary is, secondly, defined as a symmetry plane as there are ribs attached to both the upper and the lower wall. Thirdly, the ratio of the channel and the rib height Ly∕H = 10 is larger. Finally, this case is three-dimensional with lateral walls. These induce a Reynolds stress anisot- ropy-driven secondary flow which increases the momentum transfer normal to the bulk flow direction. Due to the linearity of Eq. (2) on the mean strain rate tensor, this second- ary flow cannot be accounted for and is not further investigated. We may thus not expect a perfect agreement in the primary flow solution. As the secondary motion tends to shift the reattachment point upstream, we rather have to expect that the overestimation of the sepa- ration zone’s extension is only reduced but not entirely compensated for. Nevertheless, this test case allows for an assessment of the models’ ability to generalize to cases in which the mean flow solution varies in more dimensions of the physical space and in which the flow separation is triggered by a different mechanism. Plane channel flow Fig. 4 The three-dimensional flow over periodic ribs (PRs): geometry, boundary conditions and evaluation lines for velocity and Nusselt number profiles (red) 1070 Flow, Turbulence and Combustion (2025) 115:1059–1094 The flow through a plane channel (PC) is one of the canonical calibration cases of tur- bulence models. Although it is thus hardly possible for a data-driven LEVM to achieve an improvement compared to the baseline prediction, a similar accuracy is typically expected (Srivastava and Duraisamy 2021; Rumsey and Coleman 2022). The computational setup is the same as for the flow over PR if H = 0 and periodic boundary conditions are applied in spanwise direction. In contrast to the training case, in which the boundary layers are sub- ject to adverse and favorable pressure gradients (APGs/ZPGs), the distinguishing phenom- enon of the PC flow is the ZPG boundary layer. As a consequence, the PC flow allows us to study how well the model generalizes to cases in which the mean flow solution varies in less physical dimensions and which are governed by a different flow phenomenon. Impinging Jet All aforementioned test cases are channel flows of varying complexity. In order to test the neuralSST model’s ability to handle stagnation point flows or flows with strong stream- line curvature, we apply it to an impinging jet (IJ). The computational setup is illustrated in Fig. 5. In relation to the nozzle diameter D = 2R , the height of the nozzle over the impinge- ment plate, the wall thickness of the pipe, the radial extension of the domain and the outlet plenum height amount to H∕D = 2 , t∕D = 0.0313 , Lr∕D = 15 and Ly∕D = 4 , respectively. At the inlet, we prescribe the solution uy(r) of a fully developed pipe flow for a Reynolds number of Reb = 23,000 and a uniform temperature Tin . The outlet boundary condition is chosen to allow for backflow into the domain. While the pipe wall is set adiabatic, the Fig. 5 A single impinging jet (IJ): geometry, boundary condi- tions and evaluation lines for velocity profiles (red, dashed) Fig. 6 The flow through a ribbed multi-pass channel: geometry, boundary conditions and streamwise num- bering of the segments (red). Geometry taken from Schüler et al. (2008) 1071Flow, Turbulence and Combustion (2025) 115:1059–1094 impingement plate is held at a constant temperature Tw ≠ Tin . Exploiting the radial symme- try of the round jet, we only compute a 5◦ wedge of the circular domain and apply periodic boundary conditions. Ribbed Two-Pass Cooling Channel Our final test case is a ribbed two-pass cooling channel as illustrated in Fig.  6. Such ribbed multi-pass channels (RMPCs) are commonly used for the internal cooling of gas turbine blades (Suo 1985; Weigand et al. 2001; Han 2004). The channel’s cross-section is thus mainly determined by the blade contour, wherefore the first pass is clearly not rectan- gular. To increase the mixing of the hot coolant near the walls with the colder coolant in the channel center, ribs are used as turbulators. Due to their inclination, they additionally induce a secondary flow towards the leading edge (LE), where the cross-section becomes narrow and the coolant heats up more rapidly. This secondary fluid motion helps to cool the LE, which faces the highest thermal loads of the blade. Each of the previous test cases only contains a very limited number of flow phenomena, where one usually dominates the others. On the contrary, the RMPC comprises a multitude of different flow phenomena, e.g., those present in our other test cases: flow separation, Reynolds stress anisotropy-induced secondary flows, strong streamline curvature as well as regions of stagnating flow. It may thus be seen as a superposition of the aforementioned test cases and thus allows us to quantify to which extent the interaction of different phe- nomena affects the model’s predictive accuracy. We investigate a generic geometry of a RMPC which has been subject to many experi- mental and numerical investigations (Schubert 2005; Schüler et  al. 2008, 2009, 2010; Elfert et al. 2011; Schroll et al. 2011; Schüler et al. 2011). The hydraulic diameter of the inlet and outlet pass amount to Din h = 0.6642ds and Dout h = 0.6642ds Despite the 45° incli- nation of the ribs, their geometry is very similar to those of the PR case. In fact, their non- dimensional height H∕Dh = 0.1 is the same and their non-dimensional pitch p∕H = 10 is only slightly larger. At the channel inlet, we prescribe a hydrodynamically and thermally fully developed flow for a bulk Reynolds number of Reb = 50,000 and a fixed bulk temperature Tb . At the outlet, we impose a zero-gradient condition for all variables. We furthermore enforce the wall temperature Tw ≠ T in b . 5 Results In this section, we present the results obtained when applying the neuralSST model to the test cases explained above. Besides discussing the flow field and the heat transfer, we also analyze the training set support. 5.1 Flow Over Periodic Hills As listed in Table 1, there are five different test configurations of the flow over PHs. At first, we present the results obtained on the training case and then continue by analyzing the consequences of changes in either the Reynolds number or the hill slope. 1072 Flow, Turbulence and Combustion (2025) 115:1059–1094 5.1.1 Training Case For this isothermal test case, we assess the neuralSST model’s local accuracy by virtue of mean velocity profiles sampled along the bulk flow direction at the locations indicated by the red dashed lines in Fig. 2. The corresponding velocity profiles are shown in Fig. 7a. Apparently, the neuralSST model consistently outperforms the SST model throughout the entire domain. This is particularly evident in the following three regions: in the vicinity and downstream of the true reattachment point, just above the hill crest and in the bulk flow above the ribs. These differences are mainly caused by a substantial increase of the eddy viscosity due to the data-driven correction, especially in the free shear layer emerging from the hill crest, see Fig. 8. As the lateral momentum transfer increases accordingly, the recirculation zone shrinks and the reattachment points shifts upstream. Due to continuity, the horizontal velocity component in the bulk flow above the hill crests must consequently decrease. This local improvement over the baseline model’s prediction also manifests glob- ally in a 7 % reduction of the relative mean absolute error (MAE) of the skin friction coef- ficient across the bottom wall as shown in Table 2. The relative MAE of some quantity of interest � on a boundary surface Ω is defined as Fig. 7 Mean streamwise velocity profiles in a flow through a two-dimensional channel with periodic hills for � = 1.0 , Reb = 22,803 (training) (a), � = 1.0 , Reb = 2,850 (b), � = 1.0 , Reb = 150,664 (c), � = 0.5 , Reb = 22,803 (d) and � = 1.5 , Reb = 22,803 (e) Fig. 8 Relative increase of eddy viscosity predicted by the neuralSST model wrt. the SST model for the flow through a two- dimensional channel with PHs at Reb = 22,803 and � = 1.0 1073Flow, Turbulence and Combustion (2025) 115:1059–1094 Although the model was trained purely a priori, it proves effective in making accu- rate a posteriori predictions. This is remarkable as there is no such guarantee (Meneveau 1994a; Liu et al. 1994). We found that this achievement was only possible with particu- lar feature sets and strong regularization, e.g., by enforcing small Lipschitz constants. While the former seems to be related to how well each feature is reproduced during the initial iterations of the solution process, the latter reduces the rate at which such initial errors are amplified (Melchers et al. 2023). In order to quantify the local training set support, we apply the two methods intro- duced in Sec. 3, which are based on the nearest neighbor and the Mahalanobis distance, respectively, to the converged flow solution. This comparison is illustrated in Fig. 9. At first, we focus on the nearest neighbor-based measure shown in Fig. 9a. Model queries are apparently well supported by training samples with a similar feature vector through- out large parts of the domain. However, there are three distinct regions where TSS−1 ≥ 1 and where the support is accordingly lacking: the low-strain band in the bulk flow, a (19)�� = � ( �, �̃ ) = ∫ Ω ||� − �̃||dA ∫ Ω ||�̃||dA . Table 2 Integral a posteriori training set support metrics and relative MAEs of the neuralSST model for the converged solution of all test cases, which are specified in terms of their dimensionality, bulk Reynolds number and hill width scaling factor (if applicable) Case D Reb � max(TSS−1) TSS−1 �nSST �nSST �SST − 1 Nu cf Nu cf PC 1 28,250 – 2.09 0.36 0.10 0.03 1.16 3.21 PH 2 2,850 1.0 3.30 0.56 – – – – PH 2 22,803 0.5 3.42 0.37 – 0.56 – 0.24 PH 2 22,803 1.0 3.49 0.35 – 0.83 – −0.07 PH 2 22,803 1.5 3.40 0.34 – 0.50 – −0.27 PH 2 150,664 1.0 3.42 0.33 – – – – PR 3 30,000 – 3.34 0.27 0.17 0.58 −0.55 −0.42 IJ 2 23,000 – 3.55 0.41 0.25 – 2.74 – RMPC 3 50,000 – 3.19 0.35 0.18 – 0.60 – Fig. 9 Training set support of the neuralSST model applied to the two-dimensional flow over PHs at Reb = 22,803 and � = 1.0 according to Eq.  (17) (a) and its approximation by the Mahalanobis distance given by Eq. (18) (b). The green lines indicate the boundary of the intersection of the active training sets of both model coefficients � ( A0 ∩A1 ) 1074 Flow, Turbulence and Combustion (2025) 115:1059–1094 circular region just above the hill crest and a shallow band along the bottom wall. All three regions have in common that they are characterized by a low non-dimensional strain rate, i.e., q1 < 0.5 . As explained in Sec.  2, we have removed samples with this characteristic from the training set. The boundaries of those regions from which the samples of the active training set were drawn are highlighted in green. In fact, they enclose all three poorly supported regions. This lack of support is thus a consequence of the chosen counteraction to prevent a higher-level model error, namely that the RST only depends on local velocity gradients, from distorting the model training. Despite this strong spatial variation of the training set support across the domain, the velocity profiles are accurate almost everywhere. In fact, the rank correlation between the local training set support and the local accuracy of the velocity prediction amounts to approximately 14 %. Our results thus demonstrate that the local training set support is no reliable indicator for the prediction accuracy. This makes intuitively sense because the training set support is only defined on the model’s feature space but does not take into account the required model response. Hence, the local training set support may only be used as a measure of confidence in the prediction. In Fig. 9b, we display the efficient TSS-approximation based on the Mahalanobis dis- tance. The regions of lacking support characterized by TSS−1 > 1 are accurately captured. Nevertheless, the support in the free-shear layer downstream of the hill crest is signifi- cantly underestimated, which ultimately results in a too pessimistic prediction confidence. As the Mahalanobis distance-based metric seems to be misleading, we only apply the near- est-neighbor method to the remaining test cases. 5.1.2 Variation of Reynolds Number Starting to explore the application limits of the neuralSST model, we begin with a variation in the bulk Reynolds number. We tested six different Reynolds numbers, but in this paper, we only report the results of the lowest and the highest one. The corresponding mean velocity profiles are depicted in Fig. 7b and c, respectively. For the low Reynolds number, we find a similar improvement over the baseline predic- tion as for the training case. Again, the data-driven model yields almost exactly the high- fidelity mean flow solution. We see only a slight overestimation of the size of the recircula- tion zone. For the highest Reynolds number, however, we observe a smaller accuracy gain. Despite still being more accurate than the SST model, the location of the reattachment point is located too far downstream. We, therefore, conclude that the turbulent mixing in the free shear layer tends to be underestimated for high Reynolds numbers but less than by the SST model. As the contour plots of the local training set support are qualitatively similar for all Reynolds numbers, we refrain from showing them. Instead, we compare two integral met- rics, namely the maximum and the mean value of the inverse of the local training set sup- port within the domain as listed in Table 2. We interestingly find that the minimum training set support slightly decreases towards both higher and lower Reynolds numbers. On aver- age, however, the training set support increases with the Reynolds number. This reveals that global TSS metrics may be treated with caution. 1075Flow, Turbulence and Combustion (2025) 115:1059–1094 5.1.3 Variation of Hill Slope The second variation of the training configuration concerns the hill slope. Again, we focus on two variations which oppositely influence the baseline solution, i.e., � = 0.5 and � = 1.5 . The corresponding mean velocity profiles are shown in Fig. 7d and e, respectively. We note that for steeper hills which yield a longer separation zone, the mixing in the shear layer is even overestimated, which ultimately leads to early reattachment. In spite of that, the velocity profiles still agree well with the reference data almost everywhere. This also manifests in a smaller relative MAE compared to the training case, which implies decent generalization capabilities. Compared to the baseline model, the relative MAE increases by 24 %, though, see Table 2. This is in fact due to the baseline model generalizing much worse to the training than to the test case. Thus, the accuracy gain of the SST model from training to test case is larger than that of the neuralSST model. Further, we conclude that the neuralSST model seems to affect the solution only when the baseline model is inac- curate, which is desirable according to Srivastava and Duraisamy (2021); Beck and Kurz (2021). A more gentle hill slope tends to cause a shorter recirculation zone. For this case, the augmented model slightly underestimates the turbulent mixing in the free shear layer resulting in a late reattachment. Nevertheless, we observe a significant accuracy gain over the baseline model both locally in the velocity profiles shown in Fig. 7 but also globally in the relative MAE of the skin friction coefficient given in Table 2. In fact, the skin friction prediction improves by 27 %, which is even more than for the training case. Varying the size of the recirculation zone by means of the hill slope affects the mini- mum and the mean training set support in the same way as a variation Reynolds num- ber does. The minimum decreases in both directions, whereas the mean strictly decreases towards shallower hills. 5.2 Flow Over Periodic Ribs Due to very limited reference data, we can only assess the predictive accuracy of the neu- ralSST by means of a profile of the main velocity component in the symmetry plane z = 0 sampled along the red dashed line in Fig. 4. The comparison of the numerical predictions and the experimental reference data is depicted in Fig. 10. Fig. 10 Mean axial velocity along y∕H = 0.1 and Nusselt number distribution along y∕H = 0 in the span- wise symmetry plane z = 0 of a flow through a three-dimensional channel with PRs for Reb = 30,000 1076 Flow, Turbulence and Combustion (2025) 115:1059–1094 The reference data suggest that there is a rib-induced separation zone which extends at least halfway towards the subsequent rib. While the SST model fails to predict any reat- tachment, the neuralSST captures the qualitative trend of the reference data much better and only slightly overestimates the size of the recirculation zone. As mentioned in Sec. 4, the linear constitutive law is not capable of reproducing the Reynolds stress-anisotropy driven secondary vortices. These would further enhance the momentum transport and reduce the extension of the separation zone. Nonetheless, the data-driven model leads to a relative MAE improvement of 42 % regarding ux(y∕H = 0.1, z = 0) . Since the results are sampled in close vicinity to the bottom wall, we may regard the curves ux∕ub in Fig. 10 as an approximation for the distribution of the skin friction coefficient2 and, therefore, listed �ux as an approximation for �cf in Table 2. In Fig. 10b, we furthermore display the distribution of the Nusselt number where the wall temperature Tw , the local bulk temperature Tb and the magnitude of the wall-normal temperature gradient at the wall �T∕�n|w = �T|w ⋅ n depend on the sampling location x/H, along the bottom wall. The wall-normal vector n is assumed to have a unit length. The Nusselt numbers are normalized by those for a smooth channel obtained from the Dittus-Boelter correlation (McAdams 1942). As the neuralSST model predicts the intensity of the recirculation more accurately than the SST model, also the Nusselt numbers are in much better agreement with the reference data (20)Nu = − Dh Tw − Tb �T �n |||w, (21)Nu0 = 0.023Re0.8 b Pr0.4 Fig. 11 Local training set support of the neuralSST applied to the flow through a three-dimensional channel with PRs at Reb = 30,000 plotted in the spanwise symmetry plane ( z∕H = 0) and a section plane normal to it ( x∕H = 3.125 ). The green lines indicate the contours q1 = 0.5 separating the regions of high and low non- dimensional strain 2 Due to the no-slip condition, we may approximate the wall-normal velocity gradient at the wall as fol- lows: �u x ∕�y ||w≈ u x ∕y . As this linearization is considered accurate for y+ ≤ 5 and the averaged wall-dis- tance across the sampling line amounts to y+ = 6.5 viscous units, this is an acceptable approximation. 1077Flow, Turbulence and Combustion (2025) 115:1059–1094 in the first half of the domain. Further downstream, the accuracy gain of the Nusselt num- ber prediction decreases, which is a consequence of the imperfect velocity field. Neverthe- less, this leads to a improvement of 55 % in the relative MAE of the Nusselt number across the bottom wall, see Table 2. This test case is similar to the training case regarding its phenomenology and the Reynolds number. Accordingly, we do not expect a significant decrease of the train- ing set support, which is visualized in Fig. 11, compared to the training case. We find virtually the same three regions of missing support as for the training case: the channel center, a region in proximity to the hills’ upper LE and a band along the bottom wall. Interestingly, this is the test case with the highest support on average and the minimum support is also larger than in the training case, see Table  2. The previously observed trend that steeper hill flanks tend to decrease the mean training set support cannot be confirmed. This is likely due to the three-dimensional domain, in which the fraction of poorly supported test samples is comparably lower. 5.3 Plane Channel Flow The accuracy of the PC flow simulation is assessed by means of the axial velocity pro- file along the wall-normal direction, which is depicted in Fig. 12. Note that this is one of the canonical calibration cases for turbulence closure models. One could say, that this is one of the training cases of the SST model. Although CFD-in-the-loop training (Hol- land et al. 2019; Zhao et al. 2020; Novati et al. 2021) was not feasible at the time the SST model was developed, the dynamical behavior of the closed RANS equations was taken into consideration by analytically analyzing and tuning their fixed point solution for equilibrium turbulence in the log-layer (Wilcox 2006). This explains the exception- ally good agreement of the SST model prediction with the reference data (Taghizadeh et al. 2020). The neuralSST model also produces accurate results in the viscous sublayer and the log-layer, but the profile exhibits visible deviations in the buffer layer. This is clearly because the fixed point solution has not been considered when re-calibrating the model for the flow over PHs. Nevertheless, the relative MAEs in both the skin friction coefficient and the Nusselt number are substantially smaller than for all other test cases, see Table 2. Fig. 12 Mean axial velocity (left) and local training set support (right) plotted the over wall normal distance in viscous units for a plane channel flow at Reb = 28,250 . The green line indicates the contour q1 = 0.5 separating the regions of high (left) and low (right) non-dimensional strain 1078 Flow, Turbulence and Combustion (2025) 115:1059–1094 The local training set support shown in Fig.  12 reveals that the model queries are very well supported almost everywhere. The only two exceptions are the viscous sub- layer and the symmetry plane, which is consistent with our observations for the training case. Even though the training case contains only boundary layers with non-vanishing pressure gradients, the ZPG phenomenology does not seem to alter the feature vectors significantly. In that sense, some robustness against phenomenological extrapolation is embedded into the model. As the PC flow may be seen as the limit of the flow over PHs for a vanishing hill slope, we would expect the mean training set support to be smaller than for the shallowest hill under consideration. This is not the case, though. The mini- mum training set support is, however, significantly smaller than for all other test cases. 5.4 Impinging Jet For the IJ, reference data for the mean flow field are available along different vertical lines sampled along the radial direction. These lines are highlighted by red dashed lines in Fig. 5. The corresponding profiles of the velocity magnitude along these lines are depicted in Fig. 13. The first sampling line follows the stagnation streamline and the results of both the SST and the neuralSST model agree well with the experimental data. The remaining lines provide insights on how accurate the wall jet is captured. We clearly see that the neu- ralSST model is more diffusive than its baseline model. However, both numerical predic- tions fall into the range of experimental uncertainty which is shown in gray. We can, there- fore, hardly assess which model is more accurate. In Fig.  13, we also display the radial Nusselt number evolution on the impingement plate. In contrast to the SST model, the data-driven model clearly overpredicts the heat transfer in the stagnation point, it does not Fig. 13 Magnitude of the mean velocity along different wall-normal lines (left) and radial distribution of the Nusselt number (right) for an IJ at Reb = 23,000 with the experimental uncertainty indicated by gray bands Fig. 14 Local training set sup- port of the neuralSST applied to impinging jet at Reb = 23,000 . The green lines indicate the contour q1 = 0.5 separating the regions of high and low non- dimensional strain 1079Flow, Turbulence and Combustion (2025) 115:1059–1094 capture the secondary peak in the wall-jet region and slightly underestimates the Nusselt numbers further downstream. Although the training case contains regions with an adverse pressure gradient (APG), we consider the stagnation point flow around the jet axis as a phenomenon which the model has not seen during its training. The SST model, on the con- trary, contains an eddy viscosity-limiter that was designed to reduce the turbulent momen- tum transfer in strong APG situations. Due to the simple scalar flux model assuming a constant turbulent Prandtl number, this directly translates into a reduction of the turbulent temperature transport. The absence of such an explicit limiter in the data-driven model, explains both observations described above: the deviations in the velocity profiles and the Nusselt numbers. The flow phenomena governing this test case noticeably affect the local training set support as indicated by Fig. 14. The model queries are poorly supported in all three regions of interest: the free shear layer emerging from the nozzle exit, the stagnation zone around the jet axis and the wall jet along the impingement plate. This also manifests in a decrease in the global sup- port metrics, which are given in Table 2, in comparison with channel flows at similar Reyn- olds numbers. These findings further suggest that the flow over PHs is no suitable training case for a model which is supposed to be applied to an IJ. 5.5 Ribbed Two‑Pass Cooling Channel As there are no velocity measurements for the generic two-pass channel available, we focus only on the heat transfer. Hammond et  al. (2022), Fischer et  al. (2023) and Ellis and Xia (2023), however, reported that an augmented momentum closure can significantly increase Fig. 15 Normalized, segment-averaged Nusselt numbers for a ribbed two-pass cooling channel (RMPC) along the suction side (SS), pressure side (PS) and leading/trailing edge (LE/TE) walls (a). Histogram of the relative increase of the absolute error of the area-weighted average of the segment-averaged Nusselt numbers over all walls for the present (orange dashed line) and other parametrizations of the neuralSST model (orange shaded area) as well as a hypothetical model just falling into the experimental uncertainty range (black dotted line) (b) 1080 Flow, Turbulence and Combustion (2025) 115:1059–1094 the accuracy of the heat transfer prediction despite using a very simple scalar flux model. We verified that on the PR test case. To simplify the analysis, we subdivide the walls of the channel into segments. These seg- ments are usually separated by two subsequent ribs. In Fig. 6, the subdivision is illustrated and the segments are assigned numerical labels which we use as a proxy of the streamwise coordinate in the following. The segmentation not only applies to the suction side (SS) which is explicitly shown, but also to the suction side (SS) as well as the walls facing the LE and the trailing edge (TE). In Fig.  15a, we compare the streamwise evolution of the segment-averaged Nusselt numbers with Ω being a segment’s surface and |Ω| the corresponding area, for the four channel walls. We again adopt Eq. (21) for the sake of normalization. As reference temperature for the Nusselt numbers of all segments, we use the bulk temperature at the inlet. In the first segment of the SS, the smooth inlet section, both closure models yield higher heat transfer rates than the experiment. There are two potential reasons for this. First, we simplified the problem by neglecting the upstream parts of the inlet section as implemented in the experimental facility. Although we carefully matched both the bulk Reynolds num- ber and the bulk temperature, the in-plane distributions of momentum and temperature may differ. Secondly, we previously observed that the SST and the neuralSST overestimate the heat transfer in a PC flow by 5 % and 10 %, respectively. As a consequence, the neuralSST model is slightly less accurate in segment 0 on the SS. Between segment 1 and 8, where the ribs are located in the first pass, the SST model underestimates the Nusselt numbers. The data-driven closure, that was trained for channel flows with periodic obstacles, leads to higher Nusselt numbers. Behind the first few ribs, where a quasi-periodic state is reached, the agreement with the reference data is excellent. In the bent region between segment 9 and 12, several phenomena which are not included in the training set play a critical role. In segment 9, the mean flow periodicity is broken as there is no further upstream rib. Then, in segment 10, the primary flow impinges onto the tip wall which leads to a deflection of the mean flow and strong streamline curvature. Further downstream, in segment 11, the deflected flow impinges onto the TE wall resulting in a second deflection into the second pass and towards the outlet. As seen in the previous section, the neuralSST significantly overestimates the Nusselt number in impingement scenarios. We find the same trend in this more complex configuration. Like in the first pass, the neuralSST model predicts higher heat transfer rates in the second pass than its baseline model which is in better agreement with the experimental results. The trends along the PS are very similar. We thus restrict ourselves to emphasizing two noteworthy differences. First, at the end of the first pass, both closure models overestimate the experimental Nusselt numbers by the same margin. In this region, the data-driven cor- rection does, therefore, not benefit from being trained for this particular flow regime. Sec- ondly, the neuralSST model correctly predicts lower heat transfer rates than the SST model in the second pass. This demonstrates that the data-driven augmentations do not yield sys- tematically higher heat transfer rates, but indeed correct for local deficiencies of the base- line model. Despite the promising results across the SS and the PS, the neuralSST model signifi- cantly overestimates the Nusselt numbers along the LE and TE walls. The inclined ribs (22)Nu = 1 |Ω| ∫Ω Nu dA, 1081Flow, Turbulence and Combustion (2025) 115:1059–1094 induce a strong secondary flow that impinges onto the LE wall in the first pass and onto the TE wall in the second pass. This similarity with a single IJ, that we investigated above, explains the difference between the two closure models’ predictions. We summarize these streamwise evolution of the segment-averaged Nusselt numbers along the three walls in a single integral metric: the relative increase of the absolute Nus- selt number error averaged over all segments of all walls and weighted by the correspond- ing surface area wrt. the SST model prediction. As listed in Table 2, the neuralSST leads to an approximately 60 % higher integral error. We also illustrated this in Fig. 15b. The local accuracy gains in the ribbed sections of the first and the second pass are thus overcompen- sated by the local accuracy losses in other parts of the channel. Especially the local effects along the LE and TE walls as well as those in the bend region have a detrimental effect on the integral model performance. In order to test whether this is a coincidence due to the particular model parameteriza- tion, we also tested other versions of the neuralSST model. We varied the training set (vari- ous combinations of PC, square duct flows and flows over PHs), the regression method, the hyperparameters of the regression algorithm and the feature set. We even included nonlinear terms into the momentum closure. For all models that led to convergence for all training cases involved as well as the RMPC, we computed the relative MAE increase in the segment-aver- aged Nusselt numbers and plotted their histogram in Fig. 15b. Even though some models’ integral accuracy is in close agreement with that of the SST model, apparently none of them leads to a net gain. These model versions loose their local advantages over the SST model due to strong regularization, which is either enforced explicitly, e.g., by the constraint in Eq. (10), or implicitly by virtue of a small feature space or competing training cases. However, the mod- els may become arbitrarily inaccurate as they become more complex as a consequence of lack- ing regularization. In fact, they often fail to converge to a stable solution at all. The model ver- sion we present in this paper seems to be an average representative of the class of neuralSST models trained on attached and separated channel flows. Fig. 16 Local training set support of the neuralSST model applied to a RMPC in various section planes per- pendicular to the bulk flow direction. The green line indicates the contour q1 = 0.5 separating the regions of high and low non-dimensional strain 1082 Flow, Turbulence and Combustion (2025) 115:1059–1094 As a final note on the model accuracy, we quantify the maximum integral accuracy gain possible. Taking the experimental uncertainty of � = 8% into account, we must accept any prediction that falls into the range ̃Nu ± � as equally accurate. Given these particular reference data and acknowledging that the SST model’s predictions fall into the experimental uncer- tainty range in most of the segments, we can expect a maximum accuracy gain of only Hence, it is very challenging to perform better than the SST model on this test case. Only detailed measurements of the flow field would help to differentiate potentially more accu- rate models. In Fig.  16, the local training set support is shown on six planes perpendicular to the main flow direction. Like for the other channel flows, we find a lack of support in the center of the smooth inlet section at z∕ds = 1 , where the strain rate vanishes. The first rib, which is cut by the plane z∕ds = 3 , causes a significant change in the model inputs compared to the training set near its leading edge. This is also consistent with the observations made in the PR case. In addition, the secondary flow towards the LE wall, which resembles an IJ, decreases the training set support. Another large region of lacking training set support manifests towards the inner walls of both the first and the second pass. At z∕ds = 9.95 , in close proximity to the tip wall, two prominent streaks along the SS and PS wall are visible. Just like in the wall jet forming in the IJ case, these regions are also characterized by a low non-dimensional strain rate. The same holds for the thin band along the TE wall in the sec- ond pass. Despite the interaction of the different flow phenomena, e.g., flow separation, rib- induced secondary flow, stagnation point regions, strong streamline curvature and wall jets, we find that most of the observations regarding the models predictive accuracy and training set support from the individual cases can be transferred. The mean support is surprisingly the same as for the training case and the the minimum support is even larger. Phenom- enological extrapolation is, therefore, not reliably detected by quantifying the training set support. 6 Discussion In the previous section, we have demonstrated that missing training set support does not imply a poor generalization of the model to different flow phenomena. Although it might contribute, the main driver of generalization errors must be a different one. In the follow- ing, we first identify the root cause of generalization errors and subsequently argue why the generalization capabilities of data-driven EVMs are inherently limited. Finally, we evaluate strategies to extend the spectrum of different phenomena that a model may be successfully applied to. 6.1 Root Cause of Generalization Errors For a given feature space of a model, the mapping between model in- and outputs may not be unique (Zhang and Duraisamy 2015; Sotgiu et  al. 2019; Geneva and Zabaras 2019). (23) �±� Nu �SST Nu − 1 = ∫ Ω ||Ñu(1 ± �) − Ñu||dA ∫ Ω ||Nu SST − Ñu||dA − 1 = � �SST Nu − 1 = −0.2954. 1083Flow, Turbulence and Combustion (2025) 115:1059–1094 Despite the similarity of two feature vectors, the corresponding model responses that are required for an accurate prediction may significantly differ. Liu et al. (2021) provide a thor- ough analysis of this multivalence problem which has two implications for the generaliza- tion capabilities of a model. On the one hand, if it was optimized for one of the two targets, it necessarily performs poorly on the other one. In machine learning, this is known as a data-mismatch problem. That means, the model has been informed with pairs of in- and outputs that do not repre- sent the required mapping for the application scenario. It may be detected by defining the training set support on both a model’s feature and target space. This provides an expla- nation for the observed trend that the generalization errors increase as a model is further tuned for a specific flow phenomenon. On the other hand, if the model was trained to accurately predict both competing targets, the best possible solution is the mean between the two ambiguous solutions. In order to overcome this regularization effect of a diverse training set and resolve all ambiguities, a higher dimensional feature space is required (Liu et  al. 2021; Srivastava and Duraisamy 2021). 6.2 Inherent Generalization Limitations The reasons why expanding the feature space significantly seems to be merely a theoreti- cal option, though, are threefold. First, it is not guaranteed that there are sufficiently many candidate features that comply with all requirements, e.g., Galilean invariance and non- dimensionality, in order to ensure a unique mapping. Although plenty of candidate features can be derived from tensor representation theory, these invariants are typically highly cor- related. For the popular two-dimensional training cases like the flow over PHs, as used in this work, there are indeed only two non-vanishing, independent invariants (Pope 1975). Geneva and Zabaras (2019) and Jiang et al. (2021) furthermore found that a feature set that only contains the invariants of the strain and rotation rate tensors do in general not suffice to ensure a unique mapping. Secondly, the number of training examples is fixed, which increases the chances of overfitting if further features are added to the model (Hastie et al. 2017). Thirdly, a high-dimensional feature space would substantially increase the model’s com- plexity and thus decrease its stability even if unlimited samples were available to prevent overfitting. Melchers et al. (2023) demonstrated that initially small errors grow exponen- tially over the course of a RANS simulation at a rate which is correlated with the model’s complexity. The stability requirements that emerge from the ill-conditioned RANS equa- tions (Wu et al. 2019) thus inherently limit the number of distinguishable flow phenomena and ultimately the generalization capability of data-driven EVM models. This is the reason that closure models which are evaluated many times during a simulation are typically char- acterized by a low-dimensional feature space (Srivastava and Duraisamy 2021). As another consequence, it becomes increasingly challenging to improve EVMs for a particular test case as the number of contained flow phenomena grows. 6.3 Potential Strategies to Improve Generalization Our empirical study has revealed that the linear neuralSST model does only generalize well within the class of flows that are phenomenologically similar to its training case. Although we theoretical derived that the generalization capabilities are inherently limited, we cannot 1084 Flow, Turbulence and Combustion (2025) 115:1059–1094 quantify this limit. Thus, there may be data-driven closure models with variable coeffi- cients that generalize well to a broader spectrum of flows comprising maybe two or a few different phenomena. In the following, we discuss some strategies and principles which may be incorporated for this purpose in future generalization studies. 6.3.1 Nonlinear Constitutive Laws The generalization capability of LEVMs are limited by their structure (Taghizadeh et al. 2020). Thus, they insufficiently describe, for instance, flows with Reynolds stress anisot- ropy-driven secondary flows (Speziale 1987; Pettersson Reif 2006), system rotation (Shih et al. 1995) or strong streamline curvature (Craft et al. 1996). While our study confirms the multivalence of the optimal closure coefficients across different regimes, whether the augmented model outperforms the baseline model also depends on the structure of the for- mer’s constitutive law. In fact, the reference model may perform well on test cases because of error cancellation effects, e.g., the linear closure was designed to compensate for the effects of the missing higher-order closure terms. Considering that nonlinear closure terms are also known to decrease the model’s stability (Apsley et  al. 1997; Bauer et  al. 2000; Weinmann and Sandberg 2009), however, their inclusion into the model may only come at the cost of stronger regularization of the closure coefficients. This may accordingly pre- vent learning the optimal coefficient distributions for the training case and lead to a loss of accuracy. We, therefore, recommend to pursue similar generalization studies for nonlinear closure models. 6.3.2 Interaction of the Closure Coefficients During the development of the neuralSST model, we used the two-step approach towards a priori closure modeling (Mandler and Weigand 2024), i.e., we first extracted the optimal coefficient distributions and then regressed them individually. Hence, it is not guaranteed that the learned coefficient functions interact favorably. While the optimal distributions of the coefficients in the constitutive law and those in the scale equations naturally match, errors in the regression step might lead to inconsistencies. These would, however, manifest themselves in erroneous predictions of flow field for the training case. If the interplay of the coefficients is not considered automatically, for instance, by regressing all coefficients simultaneously (Taghizadeh et al. 2020; Liu et al. 2024), this needs to be done manually. In either way, a meaningful generalization study requires a model that performs well on its training case which necessitates a favorable coefficient interaction. If that is ascertained, we do not expect any major differences in the generalization behavior depending on how this was achieved. 6.3.3 Dynamical System Behavior The present investigation is concerned with the generalization capabilities of models that were optimized in an a priori fashion, i.e., they are supposed to reproduce the high-fidelity RST. Although we expect such models to perform well when they are implemented into a RANS solver, there is no such guarantee (Meneveau 1994a). Even the slightest errors can cause substantial a posteriori errors (Poroseva et al. 2016; Thompson et al. 2016), which 1085Flow, Turbulence and Combustion (2025) 115:1059–1094 is irrespective of their origin, that is whether they stem from the data itself, the inver- sion procedure, or the regression step. So if the behavior of the solver is not automatically accounted for in the training, e.g., by CFD-integrated training (Holland et al. 2019; Zhao et  al. 2020; Novati et  al. 2021), this must be done manually in order to ensure that the model does indeed produce accurate results on the training case. Taking that for grated, though, we think that the same generalization limits apply to models regardless whether they were trained in an a priori or an a posteriori sense. The set of coefficient functions that ensures optimal a posteriori results on one case is most likely detrimental for phenomenologically different flows. This is exemplified by the lim- ited generalization capabilities of traditional closure models. Most of them were calibrated such that they accurately predict the decay rate of homogeneous, isotropic turbulence, the spreading rate of free shear layers, the log-law in equilibrium boundary layers and the propagation speed of the edge of the turbulent region (Pope 2000). However, many of those models fail, for instance, to accurately predict separating and reattaching boundary layers (Jakirlic 2012). So it appears that the coefficients required for partially separated boundary layers differ from those considered optimal for the calibration cases. 6.3.4 Enforcing Traditional Calibration Constraints Spalart (2023) demands that data-driven models must not perform worse than the afore- mentioned traditional calibration cases as the classical models do. Applying the same con- straints on the closure coefficients (Taghizadeh et al. 2020; Liu et al. 2024) would ensure this and, on the one hand, potentially enhance the model’s generalization capability. On the other hand, the effectively limited capacity of the model may not suffice to learn the opti- mal coefficient functions from the training case while adhering to the most likely conflict- ing constraints. This would be a consequence of the widely accepted optimal coefficients’ non-uniqueness. Consequently, such a procedure could lead to the same strong, unfavorable regularization effect that was observed in multicase training (Geneva and Zabaras 2019; Lav et  al. 2019; Hammond et  al. 2020; Waschkowski et  al. 2022; Lav et  al. 2023; Fang et al. 2023), which ultimately reduces the model’s accuracy on its actual training case. Since the closure coefficients are allowed to vary across the domain, it furthermore remains to be clarified, how it can be assured that those constraints are only applied in their respective regime of validity. In constrast to models with constant coefficients, the predicted coefficients in an equilibrium boundary layer, for example, may not necessar- ily comply with the requirements of homogeneous, isotropic turbulence and vice versa. Enforcing these constraints everywhere, though, would thus overly regularize the model. 6.3.5 Zonal and Mixture Models When we accept the non-existence of a single stable model that reproduces the optimal clo- sure coefficients for many different flow phenomena, we might use a combination of mul- tiple models where each of them is applied to its regime of validity. This idea of zonal and mixture models, see Eq.  (14), was first introduced in traditional closure modeling (Rodi 1991; Menter 1994b) and has recently been adopted to data-driven modeling. Each expert model can, therefore, be trained for one specific phenomenon, which has been shown to work very well. The compositional model does consequently not suffer from the regulari- zation effect of competing training objectives. Although promising results were achieved 1086 Flow, Turbulence and Combustion (2025) 115:1059–1094 on simple test cases, the demonstration of superior generalization capabilities of mixture models in complex applications is still pending. To this end, several questions are in order. First, based on which rationale should we split the regimes for which expert models are trained? As the different expert models should resolve the ambiguity of the optimal closure coefficients, the activations should be functions of both the model’s feature space and its corresponding target variable. As ground truth information for the latter are, however, not available for arbitrary test cases, this can only be approximated. Haghiri et al. (2020) and Zhao et al. (2020) thus propose to divide the domain in the physical space based on a priori knowledge about the occurring flow phenomena. Weatheritt and Sandberg (2016); Akole- kar et al. (2018); Lav et al. (2019); Geneva and Zabaras (2019); Zhao et al. (2020); Akole- kar et al. (2021); Frey Marioni et al. (2021) and Man et al. (2023) use marker functions for specific phenomena that automatically detect those regions. In contrast to that, Sriv- astava and Duraisamy (2021), apply a manual discretization to the model’s feature space. Likewise, Edeling et al. (2014); Lozano-Durán and Bae (2023); Cherroud et al. (2023) and de Zordo-Banliat et al. (2024) automatically discretize the feature space by computing the similarity of a test sample with the set of training samples. This is in close analogy to the proposal of Parish and Duraisamy (2016); Wu et al. (2018); Srivastava and Duraisamy (2021); Kurz and Beck (2021) and Frey Marioni et al. (2021) to make each model’s con- tribution to the prediction dependent on its local extrapolation extent, which is ultimately a result of the chosen feature space. Secondly, how many different expert models should the mixture model comprise? The answer to this question is clearly linked to how many different functions are required to resolve all or at least the majority of ambiguities in the optimal closure coefficient dis- tributions on the test case. While this may approximated by the number of different flow phenomena or regimes that we expect, it is questionable whether such a discretization is actually applicable as there continuous transitions from one into another regime. Thirdly, how does the potentially drastic variation of the closure model definition across the domain affects its stability? Ling (2015) and Lav et al. (2019) emphasize the impor- tance of smooth blending or activation functions. Loosely speaking, it remains question- able, though, how smooth these functions have to be. 7 Conclusion In the present paper, we explain and illustrate the inherently limited generalization capabili- ties of data-driven EVMs. To this end, we performed a systematic generalization study with the neuralSST model, a data-driven LEVM trained for two-dimensional channel flows with boundary layer separation and reattachment. This model outperforms the SST model on other phenomenologically similar flows irrespective of the Reynolds number and the specific shape of the separation-inducing obstacle. While it also works well for attached channel flows, it generalizes poorly to stagnation flows and flows with strong streamline curvature. Applying this model to a ribbed two-pass cooling channel, which is governed by many different flow regimes, reveals that the local accuracy gains in the regions which are similar to the training case are overcompensated by the local losses elsewhere. Generalization to different flow phe- nomena is apparently significantly more challenging than generalization to different Reynolds numbers or geometries. We demonstrated that this is not necessarily due to lacking support by training samples with a similar feature vector, but mainly a consequence of the different 1087Flow, Turbulence and Combustion (2025) 115:1059–1094 model response being required. This is known as a data-mismatch problem and arises from the specific choice of the feature set and the training case. Optimizing a model for a specific class of flows, therefore, decreases its generalization capabilities. Optimizing a model such that it accurately predicts a wide spectrum of flow regimes simultaneously, however, would require highly complex models suffering from severe instabilities. Our findings support existing con- cerns that a universal EVM exists (Durbin and Pettersson Reif 2011). One should rather opt for niche models tailored for specific phenomena (Frey Marioni et al. 2021; Rumsey and Cole- man 2022; Cherroud et  al. 2022). When optimizing models for complex multi-phenomena applications, this implies that the different flow regimes need to be prioritized and one should not expect the same level of global accuracy gains as often reported for academic test cases. We have demonstrated that such a prioritization may, on the one hand, be realized by virtue of selecting the training data only from a specific flow phenomenon. On the other hand, CFD- driven training allows to embed this prioritization by defining a specific quantity of interest in a particular region of the domain as optimization target. Aiming for data-driven models that perform well in a wide spectrum of applications, we believe that zonal and mixture models are the most promising approach. Appendix A SST model In this appendix, we briefly recall the formulation of the SST model (Menter et al. 2003). As a LEVM, the constitutive law for the specific RST reads where the eddy viscosity is defined as and k and � are determined by solving the subsequent set of scale equations. These contain the cross-diffusion term and a production limiter The blending functions F1 and F2 are defined as (A1)−u�u� = − 2 3 kI + 2�SST t S, (A2)�SST t = a1k max � a1�, √ 2F2 ���S ��� � (A3)RSST c = ( Dt(k) − P∗ k + �∗k� − � ⋅ [( � + �k� SST t ) �k ] Dt(�) − �P∗ k �SST t + ��2 − � ⋅ [( � + ��� SST t ) �� ] − (1 − F1)CD ) = 0 (A4)CD = 2�k−� � �k ⋅ �� � (A5)P∗ k = min ( 2�SST t ‖‖‖S ‖‖‖ 2 , 10�∗k� ) . (A6a)F1 = tanh ( arg4 1 ) 1088 Flow, Turbulence and Combustion (2025) 115:1059–1094 with their respective arguments Apart from the two constants �∗ = 0.09 and a1 = 0.31 , the remaining coefficients g ∈ { �, � , �k, �� } are continuously blended between their respective values in the k-� and k-� models, which are given in Table 3. We may thus interpret the SST model as a mixture model with continuous activation func- tions which were derived from physical reasoning or as a model with variable closure coef- ficients. By comparing Eqs. (A1) and (2), we find that the eddy viscosity limiter defined by Eq. (A2) in fact implies a variable closure coefficient using the feature set QSST = { Ret, arg2, � ‖‖‖S ‖‖‖ } . Acknowledgements The investigations were conducted as part of the joint research program Roboflex in the frame of AG Turbo. The work was supported by the Bundesministerium für Wirtschaft und Energie (BMWE) as per resolution of the German Federal Parliament under grant number 03EE5013C. The authors gratefully acknowledge AG Turbo and MTU Aero Engines AG for their support and the permission to pub- lish this paper. In particular, the authors would like to acknowledge the fruitful discussions with Mr. Pin- chaud (MTU). Additionally, the authors kindly acknowledge the cooperation with the Cluster of Excellence SimTech (project number: 390740016). The responsibility for the content lies solely with its authors. Author Contributions Conceptualization: [Hannes Mandler, Bernhard Weigand]; Methodology: [Hannes Mandler]; Formal analysis and investigation: [Hannes Mandler]; Writing - original draft preparation: [Hannes Mandler]; Writing - review and editing: [Bernhard Weigand]; Funding acquisition: [Bernhard Wei- gand]; Supervision: [Bernhard Weigand] Funding Open Access funding enabled and organized by Projekt DEAL. Open Access funding enabled and organized by Projekt DEAL. Data Availability The data that support the findings of this study are available from the corresponding author upon reasonable request (A6b)F2 = tanh ( arg2 2 ) (A7)arg1 = min � min � max � √ k �∗�y , 500� y2� � , 4�k−� � k max � CD, 10−10 � y2 � , 10 � (A8)arg2 = min � max � 2 √ k �∗�y , 500� y2� � , 100 � . (A9)g = F1g k-� + (1 − F1)g k-� (A10) gSST 1 = 2�∗ max � max � 1, 1.8Re − 1 2 t � , √ 2�∗ a1 tanh � arg2 2 � � ���S ��� � ≤ 2�∗ Table 3 Closure coefficients of the k-� and k-� models Model � � �k �� k-� 0.0750 0.5556 0.85 0.500 k-� 0.0828 0.44 1.00 0.856 1089Flow, Turbulence and Combustion (2025) 115:1059–1094 Declarations Conflict of interest The authors have no Conflict of interest to declare that are relevant to the content of this article. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com- mons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. References Apsley, D., Chen, W.-L., Leschziner, M., Lien, F.-S.: Non-linear eddy-viscosity modelling of separated flows. J. Hydraul. Res. 35(6), 723–748 (1997). https:// doi. org/ 10. 1080/ 00221 68970 94983 86 Amarloo, A., Forooghi, P., Abkar, M.: Frozen propagation of Reynolds force vector from high-fidelity data into Reynolds-averaged simulations of secondary flows. Phys. Fluids 34(11), 115102 (2022). https:// doi. org/ 10. 1063/5. 01232 31 Abe, H., Kawamura, H., Matsuo, Y.: Direct numerical simulation of a fully developed turbulent channel flow with respect to the Reynolds number dependence. J. Fluids Eng. 123(2), 382–393 (2001). https:// doi. org/ 10. 1115/1. 13666 80 Alfonsi, G.: Reynolds-averaged Navier-Stokes equations for turbulence modeling. Appl. Mech. Rev. 62(4), 040802 (2009). https:// doi. org/ 10. 1115/1. 31246 48 Akolekar, H.D., Weatheritt, J., Hutchins, N., Sandberg, R.D., Laskowski, G., Michelassi, V.: Development and use of machine-learnt algebraic reynolds stress models for enhanced prediction of wake mixing in LPTs. Turbo Expo: Power for Land, Sea, and Air, vol. Volume 2C: Turbomachinery, pp. 02–42009 (2018). https:// doi. org/ 10. 1115/ GT2018- 75447 Akolekar, H.D., Zhao, Y., Sandberg, R.D., Pacciani, R.: Integration of machine learning and computational fluid dynamics to develop turbulence models for improved low-pressure turbine wake mixing predic- tion. J. Turbomach. 143(12), 121001 (2021). https:// doi. org/ 10. 1115/1. 40514 17 Bush, R.H., Chyczewski, T.S., Duraisamy, K., Eisfeld, B., Rumsey, C.L., Smith, B.R.: Recommendations for future efforts in RANS modeling and simulation. In: AIAA Scitech 2019 Forum. https:// doi. org/ 10. 2514/6. 2019- 0317 Bauer, W., Haag, O., Hennecke, D.K.: Accuracy and robustness of nonlinear eddy viscosity models. Int. J. Heat Fluid Flow 21(3), 312–319 (2000). https:// doi. org/ 10. 1016/ S0142- 727X(00) 00015-1 Basara, B., Jakirlic, S.: A new hybrid turbulence modelling strategy for industrial CFD. Int. J. Numer. Meth- ods Fluids 42, 89–116 (2003) Beck, A., Kurz, M.: A perspective on machine learning methods in turbulence modeling. GAMM-Mitteilun- gen 44(1), 202100002 (2021). https:// doi. org/ 10. 1002/ gamm. 20210 0002 Boussinesq, J.: Essai sur la théorie des eaux courantes. Mémoires présentés par divers savants à l’Académie des Sciences XXIII, 1–680 (1877) Breuer, M., Peller, N., Rapp, C., Manhart, M.: Flow over periodic hills—numerical and experimental study in a wide range of Reynolds numbers. Comput. Fluids 238(2), 433–457 (2009). https:// doi. org/ 10. 1016/j. compfl uid. 2008. 05. 002 Baughn, J.W., Shimizu, S.: Heat transfer measurements from a surface with uniform heat flux and an impinging jet. J. Heat Transfer 111(4), 1096–1098 (1989). https:// doi. org/ 10. 1115/1. 32507 76 Cooper, D., Jackson, D.C., Launder, B.E., Liao, G.X.: Impinging jet studies for turbulence model assess- ment-I. Flow-field experiments. Int. J. Heat Mass Transfer 36(10), 2675–2684 (1993). https:// doi. org/ 10. 1016/ S0017- 9310(05) 80204-2 Craft, T.J., Launder, B.E., Suga, K.: Development and application of a cubic eddy-viscosity model of turbu- lence. Int. J. Heat Fluid Flow 17(2), 108–115 (1996). https:// doi. org/ 10. 1016/ 0142- 727X(95) 00079-6 http://creativecommons.org/licenses/by/4.0/ https://doi.org/10.1080/00221689709498386 https://doi.org/10.1063/5.0123231 https://doi.org/10.1063/5.0123231 https://doi.org/10.1115/1.1366680 https://doi.org/10.1115/1.1366680 https://doi.org/10.1115/1.3124648 https://doi.org/10.1115/GT2018-75447 https://doi.org/10.1115/1.4051417 https://doi.org/10.2514/6.2019-0317 https://doi.org/10.2514/6.2019-0317 https://doi.org/10.1016/S0142-727X(00)00015-1 https://doi.org/10.1002/gamm.202100002 https://doi.org/10.1016/j.compfluid.2008.05.002 https://doi.org/10.1016/j.compfluid.2008.05.002 https://doi.org/10.1115/1.3250776 https://doi.org/10.1016/S0017-9310(05)80204-2 https://doi.org/10.1016/S0017-9310(05)80204-2 https://doi.org/10.1016/0142-727X(95)00079-6 1090 Flow, Turbulence and Combustion (2025) 115:1059–1094 Cherroud, S., Merle, X., Cinnella, P., Gloerfelt, X.: Sparse Bayesian learning of explicit algebraic Reynolds- stress models for turbulent separated flows. Int. J. Heat Fluid Flow 98, 109047 (2022). https:// doi. org/ 10. 1016/j. ijhea tflui dflow. 2022. 109047 Cherroud, S., Merle, X., Cinnella, P., Gloerfelt, X.: Space-dependent aggregation of data-driven turbulence models (2023). arXiv: 2306. 16996 v1 de Zordo-Banliat, M., Dergham, G., Merle, X., Cinnella, P.: Space-dependent turbulence model aggrega- tion using machine learning. J. Comp. Phys. 497, 112628 (2024). https:// doi. org/ 10. 1016/j. jcp. 2023. 112628 Dietz, C.F., Neumann, S.O., Weigand, B.: A comparative study of the performance of explicit algebraic models for the turbulent heat flux. Numer. Heat Transf. Part A Appl. 52(2), 101–126 (2007). https:// doi. org/ 10. 1080/ 10407 78060 11150 46 Durbin, P.A., Pettersson Reif, B.A.: Statistical Theory and Modeling for Turbulent Flows, 2nd edn. John Wiley & Sons, Ltd, Chichester (2011). https:// doi. org/ 10. 1002/ 97804 70972 076 Doormaal, J.P.V., Raithby, G.D.: Enhancements of the simple method for predicting incompressible fluid flows. Numer. Heat Transf. 7(2), 147–163 (1984). https:// doi. org/ 10. 1080/ 01495 72840 89618 17 Durbin, P.A.: Near-wall turbulence closure modeling without “damping functions’’. Theor. Comput. Fluid Dyn. 3(1), 1–13 (1991). https:// doi. org/ 10. 1007/ BF002 71513 Duraisamy, K.: Perspectives on machine learning-augmented Reynolds-averaged and large eddy simulation models of turbulence. Phys. Rev. Fluids 6, 050504 (2021). https:// doi. org/ 10. 1103/ PhysR evFlu ids.6. 050504 Edeling, W.N., Cinnella, P., Dwight, R.P.: Predictive RANS simulations via Bayesian model-scenario aver- aging. J. Comp. Phys. 275, 65–91 (2014). https:// doi. org/ 10. 1016/j. jcp. 2014. 06. 052 Elfert, M., Schroll, M., Förster, W.: PIV measurement of secondary flow in a rotating two-pass cooling sys- tem with an improved sequencer technique. J. Turbomach. 134(3), 031001 (2011). https:// doi. org/ 10. 1115/1. 40032 22 Ellis, C.D., Xia, H.: Data-driven turbulence anisotropy in film and effusion cooling flows. Phys. Fluids 35(10), 105114 (2023). https:// doi. org/ 10. 1063/5. 01666 85 Fischer, L., James, D., Jeyaseelan, S., Pfitzner, M.: Optimizing the trench shaped film cooling design. Int J. Heat Mass Transf. 214, 124399 (2023). https:// doi. org/ 10. 1016/j. ijhea tmass trans fer. 2023. 124399 Fiore, M., Koloszar, L., Fare, C., Mendez, M.A., Duponcheel, M., Bartosiewicz, Y.: Physics-constrained machine learning for thermal turbulence modelling at low Prandtl numbers. Int. J. Heat Mass Transf. 194, 122998 (2022). https:// doi. org/ 10. 1016/j. ijhea tmass trans fer. 2022. 122998 Frey Marioni, Y., Toledo Ortiz, E.A., Cassinelli, A., Montomoli, F., Adami, P., Vazquez, R.: A machine learning approach to improve turbulence modelling from DNS data using neural networks. Int. J. Turbomach. Propul. Power 6(2), 17 (2021). https:// doi. org/ 10. 3390/ ijtpp 60200 17 Fang, Y., Zhao, Y., Waschkowski, F., Ooi, A.S.H., Sandberg, R.D.: Toward more general turbulence models via multicase computational-fluid-dynamics-driven training. AIAA J. 61(5), 2100–2115 (2023). https:// doi. org/ 10. 2514/1. J0625 72 Glushko, G.S.: Turbulent boundary layer on a flat plate in an incompressible fluid. NASA TT F-10080, translation of "Turbulentnyy pogranichnyy sloy na ploskoy plastine v neszhimayemoy zhidkosti". Izvestiya Akademii Nauk SSSR, Seriya Mekhanika 4, 13-23 (1965) Geneva, N., Zabaras, N.: Quantifying model form uncertainty in Reynolds-averaged turbulence models with Bayesian deep neural networks. J. Comput. Phys. 383, 125–147 (2019). https:// doi. org/ 10. 1016/j. jcp. 2019. 01. 021 Han, J.-C.: Recent studies in turbine blade cooling. Int. J. Rot. Mach. 10(6), 517231 (2004). https:// doi. org/ 10. 1155/ S1023 621X0 40004 42 Holland, J.R., Baeder, J.D., Duraisamy, K.: Towards integrated field inversion and machine learning with embedded neural networks for RANS modeling. AIAA Scitech 2019 Forum, San Diego, Cal- ifornia, United States (2019). https:// doi. org/ 10. 2514/6. 2019- 1884. https:// arc. aiaa. org/ doi/ abs/ 10. 2514/6. 2019- 1884 Huijing, J.P., Dwight, R.P., Schmelzer, M.: Data-driven RANS closures for three-dimensional flows around bluff bodies. Comput. Fluids 225, 104997 (2021). https:// doi. org/ 10. 1016/j. compfl uid. 2021. 104997 Hammond, J., Frey Marioni, Y., Montomoli, F.: Error quantification for the assessment of data- driven turbulence models. Flow Turbul. Combust. 109(1), 1–26 (2022). https:// doi. org/ 10. 1007/ s10494- 022- 00321-1 Haghiri, A., Lav, C., Sandberg, R.: Data-driven turbulence modelling for improved prediction of ship Airwakes. In: 33rd Symposium on Naval Hydrodynamics, Osaka, Japan (2020) Hammond, J., Montomoli, F., Pietropaoli, M., Sandberg, R.D., Michelassi, V.: Machine Learning for the Development of Data Driven Turbulence Closures in Coolant Systems. Turbo Expo: Power for https://doi.org/10.1016/j.ijheatfluidflow.2022.109047 https://doi.org/10.1016/j.ijheatfluidflow.2022.109047 http://arxiv.org/abs/2306.16996v1 https://doi.org/10.1016/j.jcp.2023.112628 https://doi.org/10.1016/j.jcp.2023.112628 https://doi.org/10.1080/10407780601115046 https://doi.org/10.1080/10407780601115046 https://doi.org/10.1002/9780470972076 https://doi.org/10.1080/01495728408961817 https://doi.org/10.1007/BF00271513 https://doi.org/10.1103/PhysRevFluids.6.050504 https://doi.org/10.1103/PhysRevFluids.6.050504 https://doi.org/10.1016/j.jcp.2014.06.052 https://doi.org/10.1115/1.4003222 https://doi.org/10.1115/1.4003222 https://doi.org/10.1063/5.0166685 https://doi.org/10.1016/j.ijheatmasstransfer.2023.124399 https://doi.org/10.1016/j.ijheatmasstransfer.2022.122998 https://doi.org/10.3390/ijtpp6020017 https://doi.org/10.2514/1.J062572 https://doi.org/10.1016/j.jcp.2019.01.021 https://doi.org/10.1016/j.jcp.2019.01.021 https://doi.org/10.1155/S1023621X04000442 https://doi.org/10.1155/S1023621X04000442 https://doi.org/10.2514/6.2019-1884 https://arc.aiaa.org/doi/abs/10.2514/6.2019-1884 https://arc.aiaa.org/doi/abs/10.2514/6.2019-1884 https://doi.org/10.1016/j.compfluid.2021.104997 https://doi.org/10.1016/j.compfluid.2021.104997 https://doi.org/10.1007/s10494-022-00321-1 https://doi.org/10.1007/s10494-022-00321-1 1091Flow, Turbulence and Combustion (2025) 115:1059–1094 Land, Sea, and Air, vol. Volume 7A: Heat Transfer, pp. 07–15026 (2020). https:// doi. org/ 10. 1115/ GT2020- 15928 Hammond, J., Pietropaoli, M., Montomoli, F.: Robust data-driven turbulence closures for improved heat transfer prediction in complex geometries. Int. J. Heat Fluid Flow 98, 109072 (2022). https:// doi. org/ 10. 1016/j. ijhea tflui dflow. 2022. 109072 Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2017). https:// doi. org/ 10. 1007/ 978-0- 387- 84858-7 Illowsky, B., Dean, S.: Introductory Statistics. OpenStax, Houston (2018) Jakirlic, S.: Extended excerpt related to the test case: Flow over a periodical arrangement of 2D hills. Technical Report ACP8-GA-2009-233710-ATAAC, TU Darmstadt. Final report on "Assessment of the RSM, URANS and hybrid models with respect to the different roadmaps including the industrial application challenges" (2012) Jiang, C., Vinuesa, R., Chen, R., Mi, J., Laima, S., Li, H.: An interpretable framework of data-driven turbulence modeling using deep neural networks. Phys. Fluids 33(5), 055133 (2021). https:// doi. org/ 10. 1063/5. 00489 09 Kurz, M., Beck, A.: Investigating model-data inconsistency in data-informed turbulence closure terms. WCCM-ECCOMAS2020, virtual (2021). https:// doi. org/ 10. 23967/ wccm- eccom as. 2020. 115 Kaandorp, M.L.A., Dwight, R.P.: Data-driven modelling of the Reynolds stress tensor using random forests with invariance. Comp. Fluids 202, 104497 (2020). https:// doi. org/ 10. 1016/j. compfl uid. 2020. 104497 Lav, C., Banko, A.J., Waschkowski, F., Zhao, Y., Elkins, C.J., Eaton, J.K., Sandberg, R.D.: A coupled framework for symbolic turbulence models from deep-learning. Int. J. Heat Fluid Flow 101, 109140 (2023). https:// doi. org/ 10. 1016/j. ijhea tflui dflow. 2023. 109140 Lozano-Durán, A., Bae, H.J.: Machine learning b