Vol.:(0123456789)

Flow, Turbulence and Combustion (2025) 115:1059–1094
https://doi.org/10.1007/s10494-024-00595-7

RESEARCH

Generalization Limits of Data‑Driven Turbulence Models

Hannes Mandler1 · Bernhard Weigand1

Received: 22 July 2024 / Accepted: 12 October 2024 / Published online: 9 November 2024 
© The Author(s) 2024

Abstract
Many industrial applications require turbulent closure models that yield accurate predic-
tions across a wide spectrum of flow regimes. In this study, we investigate how data-driven 
augmentations of popular eddy viscosity models affect their generalization properties. We 
perform a systematic generalization study with a particular closure model that was trained 
for a single flow regime. We systematically increase the complexity of the test cases up to 
an industrial application governed by a multitude of flow patterns and thereby demonstrate 
that tailoring a model to a specific flow phenomenon decreases its generalization capa-
bility. In fact, the accuracy gain in regions that the model was explicitly calibrated for is 
smaller than the loss elsewhere. We furthermore show that extrapolation or, generally, a 
lack of training samples with a similar feature vector is not the main reason for generaliza-
tion errors. There is actually only a weak correlation. Accordingly, generalization errors 
are probably due to a data-mismatch, i.e., a systematic difference in the mappings from 
the model inputs to the required responses. More diverse training sets unlikely provide a 
remedy due to the strict stability requirements emerging from the ill-conditioned RANS 
equations. The universality of data-driven eddy viscosity models with variable coefficients 
is, therefore, inherently limited.

Keywords Data-driven turbulence modeling · Machine learning · Extrapolation · 
Generalizability · NeuralSST · Ribbed two-pass cooling channel

1 Introduction

Turbulent flows play an important role in many industrial applications. In order to speed-
up the design cycles and reduce the development cost, CFD simulations become increas-
ingly more important (Slotnick et al. 2014; Bush et al. 2019), especially those solving the 
Reynolds-averaged Navier–Stokes (RANS) equations (Slotnick et  al. 2014; Rumsey and 
Coleman 2022). For non-reacting, incompressible, single-phase flows they read 

 * Hannes Mandler 
 hannes.mandler@itlr.uni-stuttgart.de

1 Institute of Aerospace Thermodynamics, University of Stuttgart, Pfaffenwaldring 31, 
70569 Stuttgart, Germany

http://crossmark.crossref.org/dialog/?doi=10.1007/s10494-024-00595-7&domain=pdf


1060 Flow, Turbulence and Combustion (2025) 115:1059–1094

 and describe the spatio-temporal evolution of the mean velocity u , the mean pressure 
p
∗
= p∕� and the mean temperature T  . The operator Dt = �t +

(
u ⋅ �

)
 denotes the mean 

substantial derivative. As the density � , the kinematic viscosity � and the Prandtl number 
Pr are assumed to be constant, the temperature can be treated as a passive scalar.

The (specific) Reynolds stress tensor (RST) −u�u� and the (specific) scalar flux vector 
(SFV) −u�T � , that occur in Eq. (1b) and Eq. (1c), respectively, require a closure model. 
The most popular class of closures in industrial applications are linear eddy viscosity 
(LEVM) (Slotnick et al. 2014) and diffusivity models as they provide a reasonable bal-
ance between numerical stability, computational efficiency and accuracy. The corre-
sponding constitutive laws

assume linear relationships between the anisotropic part of the RST and the mean strain rate 
tensor S (Boussinesq 1877) as well as the SFV and the mean temperature gradient, respec-
tively. In Eqs. (2) and (3), we substituted the eddy viscosity �t by the product of the tur-
bulent kinetic energy (TKE) k, the turbulent time scale � = �(k, � ) and a non-dimensional 
coefficient g1 , where � is an auxiliary scale-providing variable, for instance, the turbulent 
dissipation rate � or frequency � . The turbulent Prandtl number Prt in principle depends on 
the local flow regime and is a user-defined parameter. The turbulent scales ZC = [k, � ]T are 
obtained from the solution of usually two model-specific transport equations

which in general also depend on the primary model variables ZM =
[
u, p

]T . Like the con-
stitutive law given by Eq.  (2), these transport equations contain some non-dimensional 
coefficients gi ∈ G which require calibration.

Such linear eddy viscosity models (LEVMs) often lack the desired accuracy because 
they suffer from several structural deficiencies. Besides the locality of Eq.  (2) and its 
linearity in the mean strain rate tensor (Pope 1975; Rodi 1976; Schmitt 2007), its spe-
cific parametrization by means of G can cause substantial errors. Traditionally, the 
model coefficients were treated as a set of universal constants which were calibrated for 
very simple flows as homogeneous, isotropic turbulence, zero-pressure gradient (ZPG) 
boundary or free-shear layers (Wilcox 2006). Yet, the optimal parameter values depend 
on the flow phenomena under investigation (Alfonsi 2009; Yarlanki et  al. 2012; Ray 
et al.; Li et al. 2023). Because the complex geometries in real-world applications often 

(1a)0 = � ⋅ u

(1b)Dt

(
u
)
= � ⋅

(
2𝜈S − u�u�

)
− �p̄∗

(1c)Dt

(
T
)
= � ⋅

(
�
Pr

�T − u�T �

)

(2)−u�u� = −
2

3
kI + g1k�S,

(3)−u�T � = −
g1k�

2Prt
�T .

(4)Rc

(
ZM ,ZC ∣ G

)
= 0,


1061Flow, Turbulence and Combustion (2025) 115:1059–1094 

induce a plethora of different flow patterns, e.g., flow separation, strong streamline cur-
vature or swirl, a set of constant coefficients is in general inappropriate.

Hence, Glushko (1966); Pope (1975) and Launder et  al. (1977), among others, 
proposed models with variable coefficients depending on the mean flow field, i.e., 
G = G

(
ZM ,ZC

)
 . The emerging discipline of data-driven closure modeling builds upon 

these early works leveraging modern machine learning techniques in order to exploit the 
continuously growing amount of (publicly) available calibration data. Weatheritt and 
Sandberg (2017); Wu et al. (2018); Schmelzer et al. (2020); Jiang et al. (2021); Ham-
mond et al. (2022); Mandler and Weigand (2022b); Fiore et al. (2022); Yin et al. (2022); 
Miró et al. (2023) and many others demonstrated the superior accuracy of data-driven 
models when they are applied to the same class of flows that they have explicitly been 
trained for.

But just as for the traditional closures, the same question of generalization to off-cal-
ibration scenarios arises. Accurate predictions in flow regimes which the model was not 
optimized for would require the same mappings from the chosen feature space to the 
model coefficients (Tracey et al. 2013; Milani et al. 2017; Zhao et al. 2020; Srivastava and 
Duraisamy 2021; Rumsey et al. 2022), but according to Lav et al. (2019) and Fang et al. 
(2023) this does not even hold true for boundary and different kind of free-shear layers.

Overcoming this multivalence (Zhang and Duraisamy 2015; Liu et  al. 2021) or non-
uniqueness problem (Sotgiu et al. 2019; Jiang et al. 2021; Srivastava and Duraisamy 2021) 
and striving for universality is challenging. In fact, there is serious doubt that a universal 
model exists (Geneva and Zabaras 2019; Taghizadeh et al. 2020; Rumsey et al. 2022; Fang 
et  al. 2023; Lav et  al. 2023). Mixture models (Cherroud et  al. 2023; Lozano-Durán and 
Bae 2023), which are based on an ensemble of expert models each optimized for a specific 
regime, may thus offer an appealing alternative. This old idea of using a model only where 
it is supposed to work (Wu et al. 2018) has, for instance, led to the SST model (Menter 
1994b). However, this raises the question of how many different phenomena can or should 
be distinguished.

In this paper, we investigate whether LEVMs with variable coefficients can improve 
predictions in complex, multi-phenomena applications if they were optimized for only a 
single but prominent regime. As popular LEVMs with constant coefficients lack accuracy 
for many flow patterns, there is definitely a potential for local and thus global enhance-
ments. The central question is, whether this potential can be exploited by increasing a 
baseline model’s complexity and thereby tailoring it to a more relevant phenomenon. We, 
therefore, address the trade-off between optimizing a model for a single flow regime and 
the generalization capability to be expected.

To this end, we perform a systematic generalization study for a particular data-driven 
closure that was trained on the commonly used two-dimensional flow over periodic hills. 
After demonstrating that this model does indeed improve the flow field predictions for the 
training case, we first vary the Reynolds number and the geometry of the test case with-
out qualitatively altering the ground truth flow field. Subsequently, we test the model on a 
three-dimensional channel with periodic ribs, which is phenomenologically similar to the 
training case. We then apply the model to off-calibration cases like a plane channel flow 
and an impinging jet. Finally, we assess the model’s accuracy on a ribbed, two-pass cool-
ing channel, which resembles all of the aforementioned test cases in some regions of the 
domain.

This careful analysis reveals that data-driven closures are applicable to different geom-
etries and Reynolds number as long as the expected flow fields of the training and test 
case are qualitatively similar. Tailoring a model towards a specific flow regime, though, 


1062 Flow, Turbulence and Combustion (2025) 115:1059–1094

seems to amplify its generalization error in other regimes. Despite local accuracy gains in 
regions of the ribbed two-pass channel that resemble the flow over periodic hills, we conse-
quently observe a global accuracy loss compared with the SST model. We conclude that an 
increasing level of specificity adversely affects generality.

Furthermore, we investigate the potential causes of the observed generalization errors, 
in particular the similarity of the model inputs in the training and test set. Extrapolation 
is a well-known special case of dissimilarity between test and training samples, i.e., lack-
ing support of a test sample by similar training samples in the feature space, and widely 
accepted as a potential source of error. We thus quantify the local lack of training set sup-
port for all test cases. However, our results suggest that there is no consistent correlation 
between the training set support in the feature space and the models’ predictive accuracy. 
We indeed observe regions of both high and low a posteriori accuracy which are altogether 
characterized by missing training set support. We, therefore, argue that the lack of training 
samples with similar feature vectors is no reliable indicator for generalization errors but 
only for the prediction confidence. Accordingly, generalization errors are most likely due to 
the fact that different flow regimes may be characterized by similar model inputs but never-
theless require ambiguous responses.

The remainder of this paper is structured as follows. In Sect. 2, we outline the structure 
of the data-driven closure under investigation including a description of the training set. In 
Sect. 3, we explain the difference between extrapolation and the lack of training set sup-
port in the model’s feature space in general. We additionally provide an exact as well as an 
approximate quantification metric for the latter. Subsequently, we present the test cases and 
the corresponding results in Sects. 4 and 5, respectively. We close with a discussion how 
the stability requirements imposed by the RANS solver inherently limit the generalization 
capabilities of data-driven eddy viscosity models (EVMs) in Sect. 6 and a brief conclusion 
in Sect. 7.

2  Data‑Driven Turbulence Model

The neuralSST model of Mandler and Weigand (2023a) serves as an example of a data-
driven LEVM with variable closure coefficients. It was originally proposed by Mandler 
and Weigand (2022b) and investigated in greater detail by Mandler and Weigand (2022a) 
and Mandler and Weigand (2023b). In the following, we briefly recall the model structure, 
its training procedure and its implementation into the RANS solver.

2.1  Structure

The neuralSST model is a data-augmented variation of the SST model (Menter et al. 2003). 
For convenience, the baseline model is described in Appendix A. The neuralSST model’s 
constitutive laws are given by Eqs. (2) and (3).

Rather than adjusting all the coefficients occurring in the SST model’s scale equations 
(Liu et  al. 2024), we pursue the "k-corrective frozen-RANS" approach (Schmelzer et  al. 
2020) and account for their collective error by introducing a correction term with a single 
variable and dimensionless coefficient g0.


1063Flow, Turbulence and Combustion (2025) 115:1059–1094 

In Eq. (5), RSST
c

 and �SST
t

 denote the original SST model definitions of the scale equations 
and the eddy viscosity given by Eqs. (A3) and (A2), respectively. The remaining constitu-
ents of the correction term apart from the coefficient g0 ensures the dimensional homoge-
neity of Eq. (5). In order to avoid that the turbulent time scale falls below the Kolmogoroff 
scale in immediate vicinity to the wall, we adopt a limiter based on the turbulent Reynolds 
number Ret = k∕(��) (Durbin 1991; Menter et al. 2012)

where �∗ = 0.09 is a constant.
The coefficients gn = gn(Q) ∀n ∈ [0, 1] may be arbitrary functions of the flow field by 

virtue of the independent variables q = q
(
ZM ,ZC

)
∈ Q . The feature set

consists of the non-dimensional magnitudes of both the strain and rotation rate tensor, the 
ratio of the turbulent length scale and the gradient decay length of the TKE, a limited wall 
distance-based Reynolds number Red =

√
kd∕� and one of the blending arguments of the 

original SST model arg2 as defined by Eq. (A7). In Eq. (7), we make use of the Euclidean 
vector norm |⋅| , the Frobenius norm ‖⋅‖ and the softsign function S(�) = �∕(|�| + 1) (Ling 
and Templeton 2015).

2.2  Training Methodology

We adopt a supervised learning strategy opting for a priori accuracy. Thus, both the exem-
plary in- and outputs are sampled from high-fidelity data. We obtain the optimal values of 
the k-correction coefficient for all grid points of the high-fidelity data by solving the fol-
lowing inverse problem (Schmelzer et al. 2020).

Variables that are directly extracted from the high-fidelity data are denoted by a tilde. 
Besides ĝ0 , Eq. (8) also yields the turbulent frequency corresponding to the high-fidelity 
dataset, i.e., �̃ ≡ �̂ . Subsequently, the optimal coefficient of the constitutive law results can 
be obtained from the projection (Mandler and Weigand 2024)

(5)Rc = RSST
c

− g0
k

�

(
1

�SST
−1

t

)
= 0

(6)� = (�∗�)−1 max

(
1.0, 1.8Re

−
1

2

t

)
,

(7)

Q =

�
S

�
�
���S

���
�
S

�
−�

����
���
�
,S

�
−
���k�
√
k

�
, min

�
Red

50
, 2

�
, tanh

�
min

�arg2
2

, 5
���

(8)
�̂, ĝ0 =argmin

�,g0

||Rc
||

s.t. u = ũ, u′u′ =
̃
u′u′, k = k̃

(9)ĝ1 =
̃
u�u� ∶

̃
S

k̃�̃‖‖
̃
S‖‖

2
.


1064 Flow, Turbulence and Combustion (2025) 115:1059–1094

We standardize both model in- and outputs according to Z(�) = (� − �)∕� , where � is a 
vector of observations from a single feature or target. The mean and the standard deviation 
of � are denoted by � and � , respectively. This ensures that all features and targets have 
similar scales, which reduces the stiffness of the subsequent regression problem.

In order to find the optimal parameters �̂ of the functions ĥn = hn
(
Z(q) || �̂

)
 that 

approximate the ground truth mappings h̃n ∶ ℝ
#Q

→ ℝ;Z
(
q̃
)
↦ Z

(
ĝn
)
 best, we solve two 

parametric regression problems.

We use feed-forward neural networks (NNs) with six hidden layers and six neurons each 
as ansatz for the hypotheses hn = hn

(
Z(q) || �

)
 , where � denotes the collection of their 

weights and bias terms. A schematic of such a NN is shown in Fig. 1. In Eq. (10), we sum 
over all observations in the active training set A , which we define in the next section. The 
constraint in Eq. (10) ensures an upper limit for the hypotheses’ Lipschitz constants with 
K0 = 15 and K1 = 1.5 . It, therefore, acts as a means of regularization.

2.3  Training Data

The neuralSST model is informed with direct numerical simulation (DNS) data of a two-
dimensional flow over periodic hills (PHs) for a Reynolds number based on the bulk veloc-
ity ub and the hydraulic diameter Dh of Reb = ubDh∕� = 22,803 . The geometry is depicted 
in Fig. 2. The training configuration is characterized by a hill width scaling factor, a non-
dimensional pitch of the ribs and a non-dimensional channel height of � = 1 , p∕H = 9.0 
and Ly∕H = 3.036 , respectively.

The repeating pattern of flow separation and reattachment poses a serious challenge for 
most existing EVMs as the SST model (Jakirlic 2012). The extension of the separation 

(10)

�̂ =argmin
�

∑

j∈A

[
hn
(
Z
(
q̃
(j)) || �

)
−Z

(
ĝ(j)
n

)]2

s.t.
||||
�hn

�Z(q)

||||
≤ Kn

Fig. 1  Schematic of the feed-forward NNs adopted in the neuralSST model to represent the closure coef-
ficients gn ∀n ∈ [0, 1] as functions of the features qi ∈ Q with rectified linear unit (ReLU) activations � of 
the hidden neurons


1065Flow, Turbulence and Combustion (2025) 115:1059–1094 

zone, which is often of practical interest, is mainly determined by the inclination of the hill 
flanks and the Reynolds number.

Let D = {1, 2,… , #D} denote the set indices of all grid points at which DNS data 
are available. In this particular case, the DNS solution was previously interpolated 
onto a RANS grid, wherefore at most #D = 14,751 observations may be used to train 
the model. In order to prevent learning numerically unstable trends which are a conse-
quence of the deficient Boussinesq hypothesis, however, we use only a subset thereof, 
i.e., the active training set

using the filters

and

with ĝ0min
= −10 and ĝ1min

= 0.

2.4  Implementation

We introduce a blending against the baseline model like Kaandorp and Dwight (2020). 
This is formally equivalent to a damping of the nonlinear terms if the baseline model 
is linear (Huijing et al. 2021; Tang et al. 2023; Ellis and Xia 2023). In general, we may 
write this as a mixture of a set of models M = {SST, nSST} , where nSST represents the 
neuralSST model.

The blending or mixture parameters �m
n

 fulfill the normalization constraint 
∑

m∈M �m
n
= 1 . 

In the simplest case, they are constants but they may also be functions of the local 
flow field. The latter case leads to zonal models. While the blending and the model 

(11)An =
(
D ∩ Fgn

)
⧵ FS ∀n ∈ [0, 1]

(12)FS =
{
j ∈ D

||| �𝜏
(j)‖‖‖

�
S
(j)‖‖‖ < 0.5

}

(13)Fgn
=
{
j ∈ D

||| ĝnmin
≤ ĝ(j)

n
≤ 10

}

(14)gn =
∑

m∈M

�m
n
(q)gm

n
(q)

Fig. 2  The two-dimensional flow 
over periodic hills (PHs): geom-
etry, boundary conditions and 
evaluation lines for the velocity 
profiles (red dashed)


1066 Flow, Turbulence and Combustion (2025) 115:1059–1094

coefficients may share the same feature space, this is not necessary. In this work, we set 
�nSST
n

= 1 ∀n ∈ {0, 1} exploiting the full data-driven correction.
While the model coefficient gSST

0
= 0 , gSST

1
 is given by Eq. (A10). For the neuralSST 

model, we, introduce limiters on the data-driven predictions of the coefficients (Weath-
eritt and Sandberg 2017; Milani et al. 2017; Sotgiu et al. 2019; Hammond et al. 2022; 
Fiore et al. 2022; Rumsey et al. 2022; Miró et al. 2023).

The corresponding limits are set such that g0 ∈ [−1.0, 1.5] and g1 ∈ [0.1, 0.4] . For consist-
ency, the inverse standardization of the predicted coefficients Z−1

gn
 adopts the mean and the 

standard deviation from the training set.
Rather than modifying the baseline model’s implementation of the constitutive law into 

Eq. (1b), we add an explicit source term to it that accounts for the difference of the original 
and the data-driven model (Basara and Jakirlic 2003).

The realizability correction R , as described in greater detail by Mandler and Weigand 
(2022b), is not applied by default but only if it is required to ensure numerical stability of 
the solution. In comparison with alternative implementation strategies, we found this to be 
the most stability-promoting one, which agrees with the results of Amarloo et al. (2022). 
The outlined implementation guarantees that the corresponding effective diffusion coef-
ficient is strictly positive and has a large magnitude, which enhances the diagonal domi-
nance of the system.

3  Training Set Support in the Feature Space

The training set support TSS(q) measures the similarity of a feature vector q that a data-
driven model is queried at and the set of feature vectors 

{
q(j) ∣ j ∈ A

}
 that it has been 

trained on. By defining the distance between a point and a set as the distance between this 
point and its nearest neighbor within the set (Ling and Templeton 2015), we arrive at the 
following expression for the training set support.1

A typical choice for the distance metric d(⋅, ⋅) , which must be defined on the feature space 
Q , is the L2-norm. In Eq. (17), we relate the distance of the query point from the training 
samples to some critical distance d∗ , which we define as the average distance between the 
training samples using the expectation operator � . This allows us to interpret the TSS = 1 
iso-surface in Q as the boundary �S of the region S ⊂ Q in which model queries are well 

(15)gnSST
n

= max
[
gmin
n

, min
(
Z

−1
gn

[
hn(q)

]
, gmax

n

)]

(16)
Dt

(
u
)
= −�p̄∗ + � ⋅

(
2
[
𝜈 + 𝜈SST

t

]
S
)

�����������������������
implicit diffusion term

−� ⋅

(
2𝜈SST

t
S +R

[
−g1k𝜏S

])

�������������������������������������������
explicit source term

(17)TSS(q) ∶=
d∗

d
(
q,
{
q(j) ∣ j ∈ A

}) ∶=
�

({
d
(
q(i), q(k)

)
∣ i, k ∈ A

})

min
{
d
(
q, q(j)

)
∣ j ∈ A

}

1 In this section, we neglect the standardization Z  of the feature space for the sake of brevity. To properly 
account for them, all occurrences of q and Q need to be substituted by Z(q) and Z(Q) , respectively.


1067Flow, Turbulence and Combustion (2025) 115:1059–1094 

supported by the training set. The well-supported region of the feature space is thus given 
by the union of the #A #Q-dimensional hyperspheres Sph

(
q(j), r

)
 centered around each of 

the training samples q(j) with a radius r that equals the mean distance between the training 
samples. This potentially disjoint region is visualized in Fig. 3. In general, the confidence 
in a model’s prediction is negatively correlated with the training set support.

Although the term extrapolation is frequently used as if it was defined as the lack of 
training set support, it is not. A model is actually said to extrapolate if its query point q 
cannot be written as a convex combination of the training samples 

{
q(j) ∣ j ∈ A

}
 (Illowsky 

and Dean 2018). The boundary �I  between the region of the feature space in which the 
model interpolates I ⊂ Q and its complement, the region in which the model extrapolates, 
is thus given by the convex hull around the training samples. This is also shown in Fig. 3. 
Obviously, there may be large regions I ⧵ S in which the model interpolates but the query 
is not well supported and also smaller regions S ⧵ I  which are well supported but cause 
extrapolation. The extrapolation extent is then proportional to the shortest distance from 
the query point q ∈ Q ⧵ I  to the convex hull around the training samples in the model’s 
feature space.

As the evaluation of Eq.  (17) requires to store the entire training set, which can be a 
huge amount of data, Ling and Templeton (2015) proposed to use a memory-efficient 
approximation that only needs access to the statistical moments of the training set, i.e., the 
mean feature vector � and the covariance matrix �.

This approximation relies on the Mahalanobis distance between the query point and the 
training set distribution. The critical Mahalanobis distance d∗

M
 is chosen such that TSS < 1 

holds true for 1 % of the training samples (Ling and Templeton 2015). This implies that the 
well supported region of the feature space is approximated as S ≈ Sph

(
�, d∗

M

)
 , see Fig. 3. 

If the training samples were normally distributed according to N(�,�) , they were evenly 
spread within Sph

(
�, d∗

M

)
 . In this case, the above approximation holds true and TSS can be 

interpreted as the probability that a test sample was drawn from the training distribution 
(Wu et al. 2017). As the training set is usually not normally distributed (Wu et al. 2017), 

(18)TSS(q) ≈
d∗
M√

(q − �)�−1(q − �)T

Fig. 3  Illustrating the subspace 
S of a bi-variate model’s feature 
space Q which is well sup-
ported by the training samples {
q(j) ∣ j ∈ A

}
 (blue dotted), its 

approximation by the Mahalano-
bis distance (red solid) and the 
region of interpolation I  (green) 
which is bounded by the convex 
hull around the training samples 
(green dashed) using the data 
described in Sec. 2.3


1068 Flow, Turbulence and Combustion (2025) 115:1059–1094

which can also be clearly seen in Fig. 3, Eq.  (18) is only a reasonable approximation of 
Eq. (17) in the limit of TSS << 1.

Ideally, a model would only be evaluated within I ∩ S . Considering the inhomogene-
ous distribution of the training samples in the feature space and that the optimal model 
response is inherently unknown in the region I ⧵ S , the training set support seems to be a 
more appropriate measure for the confidence in a model’s prediction than the extrapolation 
extent. In the subsequent section, we thus focus on quantifying the training set support.

Finally, we want to emphasize that phenomenological extrapolation does neither imply 
extrapolation in its strict sense as explained above nor a lack of training set support. Phe-
nomenological extrapolation or generalization is based on loosely defined categorical flow 
patterns like "free-shear flow" or "boundary layer". Whether feature vectors sampled from 
two different flow phenomena differ significantly depends on the choice of the feature 
space, i.e., Eq. (7), and the flow phenomena under consideration.

4  Test Cases

In this section, we describe the different test cases and their numerical setups. A complete 
list including the bulk Reynolds numbers (based on the hydraulic diameter) is given in 
Table 1.

For all cases, we employ a SIMPLEC solver (Doormaal and Raithby 1984) and a second 
order accurate spatial discretization. The computational grids are all structured and wall-
resolved. Both their cell counts N and near-wall resolution in terms of the non-dimensional 
height of the first cell y+

1
= ||u� ||y1∕� , where u� denotes the friction velocity and y1 the wall-

distance, are listed in Table 1. Corresponding grid convergence studies proved that a fur-
ther mesh refinement does not noticeably change the results shown in the present paper. 
For all non-isothermal cases, we adopt values for the molecular and turbulent Prandtl num-
ber of Pr = 0.71 and Prt = 0.85 , respectively. The latter choice is common for channel flow 
applications (Dietz et al. 2007; Weihing et al. 2014; Ellis and Xia 2023).

Flow over periodic hills

Table 1  Test cases and their numerical setups: dimensionality D, bulk Reynolds number Reb , hill width-
scaling factor � , grid creator, cell count N, maximum dimensionless first cell height throughout the entire 
domain ‖‖y

+
1
‖‖∞ , usage of the realizability correction R and source of the reference data

Case D Reb � Grid from N ‖
‖y

+

1

‖
‖∞

R Reference data

PC 1 2.8 ⋅ 104 – – 4.2 ⋅ 101 0.8 ✗ Abe et al. (2001)
PH 2 2.9 ⋅ 103 1.0 Xiao et al. (2020) 1.5 ⋅ 104 0.2 ✗ Breuer et al. (2009)
PH 2 2.3 ⋅ 104 0.5 Xiao et al. (2020) 1.5 ⋅ 104 0.9 ✗ Xiao et al. (2020)
PH 2 2.3 ⋅ 104 1.0 Xiao et al. (2020) 1.5 ⋅ 104 0.9 ✗ Xiao et al. (2020)
PH 2 2.3 ⋅ 104 1.5 Xiao et al. (2020) 1.5 ⋅ 104 0.8 ✗ Xiao et al. (2020)
PH 2 1.5 ⋅ 105 1.0 Xiao et al. (2020) 1.5 ⋅ 104 4.0 ✗ Rapp and Manhart (2011)
PR 3 3.0 ⋅ 104 – Sotgiu (2020) 3.1 ⋅ 105 2.0 ✓ Rau et al. (1998)
IJ 2 2.3 ⋅ 104 – – 6.4 ⋅ 104 1.2 ✗ Cooper et al. (1993)

Baughn and Shimizu (1989)
RMPC 3 5.0 ⋅ 104 – Sotgiu (2020) 8.6 ⋅ 106 2.9 ✓ Schüler et al. (2009)


1069Flow, Turbulence and Combustion (2025) 115:1059–1094 

As we pursue an a priori model development strategy, the training case is the most natu-
ral a posteriori test case. Due to the resulting inconsistency of the learning and the injection 
environment (Sotgiu et al. 2018; Duraisamy 2021; Kurz and Beck 2021) and the potentially 
severe error accumulation over the course of the simulation (Poroseva et al. 2016; Thomp-
son et al. 2016; Wu et al. 2019; Melchers et al. 2023), a posteriori accuracy for the training 
case needs to be demonstrated to ensure a meaningful generalization study.

The geometry and the corresponding boundary conditions of the training case are 
depicted in Fig.  2. Besides the particular training configuration, we consider two varia-
tions of the Reynolds number and the hill slope, respectively, as indicated in Table 1. These 
variations are chosen such that they alter the baseline solution in opposite ways. While one 
increases the size of the separation zone, the other one decreases it wrt. the training con-
figuration. The non-dimensional pitch of the hills depends on the hill width scaling factor 
� according to p∕H = 9.0 + 3.858(� − 1) . As all PH cases are considered to be isothermal, 
Eq. (1c) and the Eq. (3) are obsolete.

Flow over periodic ribs
The flows over periodic ribs (PRs) and PHs are phenomenologically similar as both are 

governed by large separation zones between periodic obstacles. As for the training case, 
the obstacles’ non-dimensional pitch amounts to p∕H = 9 . However, there are four major 
differences in comparison with the training case that are apparent from Fig. 4. First, the 
obstacles are sharp ribs rather than smooth hills. The location at which the flow separates 
from the contour is thus determined by the geometry at x∕H = 0.5 . The upper boundary is, 
secondly, defined as a symmetry plane as there are ribs attached to both the upper and the 
lower wall. Thirdly, the ratio of the channel and the rib height Ly∕H = 10 is larger. Finally, 
this case is three-dimensional with lateral walls. These induce a Reynolds stress anisot-
ropy-driven secondary flow which increases the momentum transfer normal to the bulk 
flow direction. Due to the linearity of Eq. (2) on the mean strain rate tensor, this second-
ary flow cannot be accounted for and is not further investigated. We may thus not expect a 
perfect agreement in the primary flow solution. As the secondary motion tends to shift the 
reattachment point upstream, we rather have to expect that the overestimation of the sepa-
ration zone’s extension is only reduced but not entirely compensated for. Nevertheless, this 
test case allows for an assessment of the models’ ability to generalize to cases in which the 
mean flow solution varies in more dimensions of the physical space and in which the flow 
separation is triggered by a different mechanism.

Plane channel flow

Fig. 4  The three-dimensional flow over periodic ribs (PRs): geometry, boundary conditions and evaluation 
lines for velocity and Nusselt number profiles (red)


1070 Flow, Turbulence and Combustion (2025) 115:1059–1094

The flow through a plane channel (PC) is one of the canonical calibration cases of tur-
bulence models. Although it is thus hardly possible for a data-driven LEVM to achieve an 
improvement compared to the baseline prediction, a similar accuracy is typically expected 
(Srivastava and Duraisamy 2021; Rumsey and Coleman 2022). The computational setup is 
the same as for the flow over PR if H = 0 and periodic boundary conditions are applied in 
spanwise direction. In contrast to the training case, in which the boundary layers are sub-
ject to adverse and favorable pressure gradients (APGs/ZPGs), the distinguishing phenom-
enon of the PC flow is the ZPG boundary layer. As a consequence, the PC flow allows us 
to study how well the model generalizes to cases in which the mean flow solution varies in 
less physical dimensions and which are governed by a different flow phenomenon.

Impinging Jet
All aforementioned test cases are channel flows of varying complexity. In order to test 

the neuralSST model’s ability to handle stagnation point flows or flows with strong stream-
line curvature, we apply it to an impinging jet (IJ). The computational setup is illustrated in 
Fig. 5. In relation to the nozzle diameter D = 2R , the height of the nozzle over the impinge-
ment plate, the wall thickness of the pipe, the radial extension of the domain and the outlet 
plenum height amount to H∕D = 2 , t∕D = 0.0313 , Lr∕D = 15 and Ly∕D = 4 , respectively. 
At the inlet, we prescribe the solution uy(r) of a fully developed pipe flow for a Reynolds 
number of Reb = 23,000 and a uniform temperature Tin . The outlet boundary condition 
is chosen to allow for backflow into the domain. While the pipe wall is set adiabatic, the 

Fig. 5  A single impinging jet 
(IJ): geometry, boundary condi-
tions and evaluation lines for 
velocity profiles (red, dashed)

Fig. 6  The flow through a ribbed multi-pass channel: geometry, boundary conditions and streamwise num-
bering of the segments (red). Geometry taken from Schüler et al. (2008)


1071Flow, Turbulence and Combustion (2025) 115:1059–1094 

impingement plate is held at a constant temperature Tw ≠ Tin . Exploiting the radial symme-
try of the round jet, we only compute a 5◦ wedge of the circular domain and apply periodic 
boundary conditions.

Ribbed Two-Pass Cooling Channel
Our final test case is a ribbed two-pass cooling channel as illustrated in Fig.  6. Such 

ribbed multi-pass channels (RMPCs) are commonly used for the internal cooling of gas 
turbine blades (Suo 1985; Weigand et al. 2001; Han 2004). The channel’s cross-section is 
thus mainly determined by the blade contour, wherefore the first pass is clearly not rectan-
gular. To increase the mixing of the hot coolant near the walls with the colder coolant in 
the channel center, ribs are used as turbulators. Due to their inclination, they additionally 
induce a secondary flow towards the leading edge (LE), where the cross-section becomes 
narrow and the coolant heats up more rapidly. This secondary fluid motion helps to cool 
the LE, which faces the highest thermal loads of the blade.

Each of the previous test cases only contains a very limited number of flow phenomena, 
where one usually dominates the others. On the contrary, the RMPC comprises a multitude 
of different flow phenomena, e.g., those present in our other test cases: flow separation, 
Reynolds stress anisotropy-induced secondary flows, strong streamline curvature as well 
as regions of stagnating flow. It may thus be seen as a superposition of the aforementioned 
test cases and thus allows us to quantify to which extent the interaction of different phe-
nomena affects the model’s predictive accuracy.

We investigate a generic geometry of a RMPC which has been subject to many experi-
mental and numerical investigations (Schubert 2005; Schüler et  al. 2008, 2009, 2010; 
Elfert et al. 2011; Schroll et al. 2011; Schüler et al. 2011). The hydraulic diameter of the 
inlet and outlet pass amount to Din

h
= 0.6642ds and Dout

h
= 0.6642ds Despite the 45° incli-

nation of the ribs, their geometry is very similar to those of the PR case. In fact, their non-
dimensional height H∕Dh = 0.1 is the same and their non-dimensional pitch p∕H = 10 is 
only slightly larger.

At the channel inlet, we prescribe a hydrodynamically and thermally fully developed 
flow for a bulk Reynolds number of Reb = 50,000 and a fixed bulk temperature Tb . At the 
outlet, we impose a zero-gradient condition for all variables. We furthermore enforce the 
wall temperature Tw ≠ T

in

b
.

5  Results

In this section, we present the results obtained when applying the neuralSST model to the 
test cases explained above. Besides discussing the flow field and the heat transfer, we also 
analyze the training set support.

5.1  Flow Over Periodic Hills

As listed in Table 1, there are five different test configurations of the flow over PHs. At 
first, we present the results obtained on the training case and then continue by analyzing 
the consequences of changes in either the Reynolds number or the hill slope.


1072 Flow, Turbulence and Combustion (2025) 115:1059–1094

5.1.1  Training Case

For this isothermal test case, we assess the neuralSST model’s local accuracy by virtue of 
mean velocity profiles sampled along the bulk flow direction at the locations indicated by 
the red dashed lines in Fig. 2. The corresponding velocity profiles are shown in Fig. 7a. 
Apparently, the neuralSST model consistently outperforms the SST model throughout the 
entire domain. This is particularly evident in the following three regions: in the vicinity 
and downstream of the true reattachment point, just above the hill crest and in the bulk 
flow above the ribs. These differences are mainly caused by a substantial increase of the 
eddy viscosity due to the data-driven correction, especially in the free shear layer emerging 
from the hill crest, see Fig. 8. As the lateral momentum transfer increases accordingly, the 
recirculation zone shrinks and the reattachment points shifts upstream. Due to continuity, 
the horizontal velocity component in the bulk flow above the hill crests must consequently 
decrease. This local improvement over the baseline model’s prediction also manifests glob-
ally in a 7 % reduction of the relative mean absolute error (MAE) of the skin friction coef-
ficient across the bottom wall as shown in Table 2. The relative MAE of some quantity of 
interest � on a boundary surface Ω is defined as

Fig. 7  Mean streamwise velocity profiles in a flow through a two-dimensional channel with periodic hills 
for � = 1.0 , Reb = 22,803 (training) (a), � = 1.0 , Reb = 2,850 (b), � = 1.0 , Reb = 150,664 (c), � = 0.5 , 
Reb = 22,803 (d) and � = 1.5 , Reb = 22,803 (e)

Fig. 8  Relative increase of 
eddy viscosity predicted by the 
neuralSST model wrt. the SST 
model for the flow through a two-
dimensional channel with PHs at 
Reb = 22,803 and � = 1.0


1073Flow, Turbulence and Combustion (2025) 115:1059–1094 

Although the model was trained purely a priori, it proves effective in making accu-
rate a posteriori predictions. This is remarkable as there is no such guarantee (Meneveau 
1994a; Liu et al. 1994). We found that this achievement was only possible with particu-
lar feature sets and strong regularization, e.g., by enforcing small Lipschitz constants. 
While the former seems to be related to how well each feature is reproduced during the 
initial iterations of the solution process, the latter reduces the rate at which such initial 
errors are amplified (Melchers et al. 2023).

In order to quantify the local training set support, we apply the two methods intro-
duced in Sec. 3, which are based on the nearest neighbor and the Mahalanobis distance, 
respectively, to the converged flow solution. This comparison is illustrated in Fig. 9. At 
first, we focus on the nearest neighbor-based measure shown in Fig. 9a. Model queries 
are apparently well supported by training samples with a similar feature vector through-
out large parts of the domain. However, there are three distinct regions where TSS−1 ≥ 1 
and where the support is accordingly lacking: the low-strain band in the bulk flow, a 

(19)�� = �
(
�, �̃

)
=

∫
Ω
||� − �̃||dA
∫
Ω
||�̃||dA

.

Table 2  Integral a posteriori training set support metrics and relative MAEs of the neuralSST model for 
the converged solution of all test cases, which are specified in terms of their dimensionality, bulk Reynolds 
number and hill width scaling factor (if applicable)

Case D Reb � max(TSS−1) TSS−1 �nSST �nSST

�SST
− 1

Nu cf Nu cf

PC 1 28,250 – 2.09 0.36 0.10 0.03 1.16 3.21
PH 2 2,850 1.0 3.30 0.56 – – – –
PH 2 22,803 0.5 3.42 0.37 – 0.56 – 0.24
PH 2 22,803 1.0 3.49 0.35 – 0.83 – −0.07

PH 2 22,803 1.5 3.40 0.34 – 0.50 – −0.27

PH 2 150,664 1.0 3.42 0.33 – – – –
PR 3 30,000 – 3.34 0.27 0.17 0.58 −0.55 −0.42

IJ 2 23,000 – 3.55 0.41 0.25 – 2.74 –
RMPC 3 50,000 – 3.19 0.35 0.18 – 0.60 –

Fig. 9  Training set support of the neuralSST model applied to the two-dimensional flow over PHs at 
Reb = 22,803 and � = 1.0 according to Eq.  (17) (a) and its approximation by the Mahalanobis distance 
given by Eq. (18) (b). The green lines indicate the boundary of the intersection of the active training sets of 
both model coefficients �

(
A0 ∩A1

)


1074 Flow, Turbulence and Combustion (2025) 115:1059–1094

circular region just above the hill crest and a shallow band along the bottom wall. All 
three regions have in common that they are characterized by a low non-dimensional 
strain rate, i.e., q1 < 0.5 . As explained in Sec.  2, we have removed samples with this 
characteristic from the training set. The boundaries of those regions from which the 
samples of the active training set were drawn are highlighted in green. In fact, they 
enclose all three poorly supported regions. This lack of support is thus a consequence 
of the chosen counteraction to prevent a higher-level model error, namely that the RST 
only depends on local velocity gradients, from distorting the model training.

Despite this strong spatial variation of the training set support across the domain, the 
velocity profiles are accurate almost everywhere. In fact, the rank correlation between 
the local training set support and the local accuracy of the velocity prediction amounts 
to approximately 14 %. Our results thus demonstrate that the local training set support is 
no reliable indicator for the prediction accuracy. This makes intuitively sense because the 
training set support is only defined on the model’s feature space but does not take into 
account the required model response. Hence, the local training set support may only be 
used as a measure of confidence in the prediction.

In Fig. 9b, we display the efficient TSS-approximation based on the Mahalanobis dis-
tance. The regions of lacking support characterized by TSS−1 > 1 are accurately captured. 
Nevertheless, the support in the free-shear layer downstream of the hill crest is signifi-
cantly underestimated, which ultimately results in a too pessimistic prediction confidence. 
As the Mahalanobis distance-based metric seems to be misleading, we only apply the near-
est-neighbor method to the remaining test cases.

5.1.2  Variation of Reynolds Number

Starting to explore the application limits of the neuralSST model, we begin with a variation 
in the bulk Reynolds number. We tested six different Reynolds numbers, but in this paper, 
we only report the results of the lowest and the highest one. The corresponding mean 
velocity profiles are depicted in Fig. 7b and c, respectively.

For the low Reynolds number, we find a similar improvement over the baseline predic-
tion as for the training case. Again, the data-driven model yields almost exactly the high-
fidelity mean flow solution. We see only a slight overestimation of the size of the recircula-
tion zone. For the highest Reynolds number, however, we observe a smaller accuracy gain. 
Despite still being more accurate than the SST model, the location of the reattachment 
point is located too far downstream. We, therefore, conclude that the turbulent mixing in 
the free shear layer tends to be underestimated for high Reynolds numbers but less than by 
the SST model.

As the contour plots of the local training set support are qualitatively similar for all 
Reynolds numbers, we refrain from showing them. Instead, we compare two integral met-
rics, namely the maximum and the mean value of the inverse of the local training set sup-
port within the domain as listed in Table 2. We interestingly find that the minimum training 
set support slightly decreases towards both higher and lower Reynolds numbers. On aver-
age, however, the training set support increases with the Reynolds number. This reveals 
that global TSS metrics may be treated with caution.


1075Flow, Turbulence and Combustion (2025) 115:1059–1094 

5.1.3  Variation of Hill Slope

The second variation of the training configuration concerns the hill slope. Again, we 
focus on two variations which oppositely influence the baseline solution, i.e., � = 0.5 and 
� = 1.5 . The corresponding mean velocity profiles are shown in Fig. 7d and e, respectively. 
We note that for steeper hills which yield a longer separation zone, the mixing in the shear 
layer is even overestimated, which ultimately leads to early reattachment. In spite of that, 
the velocity profiles still agree well with the reference data almost everywhere. This also 
manifests in a smaller relative MAE compared to the training case, which implies decent 
generalization capabilities. Compared to the baseline model, the relative MAE increases 
by 24 %, though, see Table 2. This is in fact due to the baseline model generalizing much 
worse to the training than to the test case. Thus, the accuracy gain of the SST model from 
training to test case is larger than that of the neuralSST model. Further, we conclude that 
the neuralSST model seems to affect the solution only when the baseline model is inac-
curate, which is desirable according to Srivastava and Duraisamy (2021); Beck and Kurz 
(2021). A more gentle hill slope tends to cause a shorter recirculation zone. For this case, 
the augmented model slightly underestimates the turbulent mixing in the free shear layer 
resulting in a late reattachment. Nevertheless, we observe a significant accuracy gain over 
the baseline model both locally in the velocity profiles shown in Fig. 7 but also globally in 
the relative MAE of the skin friction coefficient given in Table 2. In fact, the skin friction 
prediction improves by 27 %, which is even more than for the training case.

Varying the size of the recirculation zone by means of the hill slope affects the mini-
mum and the mean training set support in the same way as a variation Reynolds num-
ber does. The minimum decreases in both directions, whereas the mean strictly decreases 
towards shallower hills.

5.2  Flow Over Periodic Ribs

Due to very limited reference data, we can only assess the predictive accuracy of the neu-
ralSST by means of a profile of the main velocity component in the symmetry plane z = 0 
sampled along the red dashed line in Fig. 4. The comparison of the numerical predictions 
and the experimental reference data is depicted in Fig. 10.

Fig. 10  Mean axial velocity along y∕H = 0.1 and Nusselt number distribution along y∕H = 0 in the span-
wise symmetry plane z = 0 of a flow through a three-dimensional channel with PRs for Reb = 30,000


1076 Flow, Turbulence and Combustion (2025) 115:1059–1094

The reference data suggest that there is a rib-induced separation zone which extends at 
least halfway towards the subsequent rib. While the SST model fails to predict any reat-
tachment, the neuralSST captures the qualitative trend of the reference data much better 
and only slightly overestimates the size of the recirculation zone. As mentioned in Sec. 4, 
the linear constitutive law is not capable of reproducing the Reynolds stress-anisotropy 
driven secondary vortices. These would further enhance the momentum transport and 
reduce the extension of the separation zone. Nonetheless, the data-driven model leads to a 
relative MAE improvement of 42 % regarding ux(y∕H = 0.1, z = 0) . Since the results are 
sampled in close vicinity to the bottom wall, we may regard the curves ux∕ub in Fig. 10 as 
an approximation for the distribution of the skin friction coefficient2 and, therefore, listed 
�ux as an approximation for �cf in Table 2.

In Fig. 10b, we furthermore display the distribution of the Nusselt number

where the wall temperature Tw , the local bulk temperature Tb and the magnitude of the 
wall-normal temperature gradient at the wall �T∕�n|w = �T|w ⋅ n depend on the sampling 
location x/H, along the bottom wall. The wall-normal vector n is assumed to have a unit 
length. The Nusselt numbers are normalized by those for a smooth channel obtained from 
the Dittus-Boelter correlation (McAdams 1942).

As the neuralSST model predicts the intensity of the recirculation more accurately than the 
SST model, also the Nusselt numbers are in much better agreement with the reference data 

(20)Nu = −
Dh

Tw − Tb

�T
�n

|||w,

(21)Nu0 = 0.023Re0.8
b
Pr0.4

Fig. 11  Local training set support of the neuralSST applied to the flow through a three-dimensional channel 
with PRs at Reb = 30,000 plotted in the spanwise symmetry plane ( z∕H = 0) and a section plane normal to 
it ( x∕H = 3.125 ). The green lines indicate the contours q1 = 0.5 separating the regions of high and low non-
dimensional strain

2 Due to the no-slip condition, we may approximate the wall-normal velocity gradient at the wall as fol-
lows: �u

x
∕�y ||w≈ u

x
∕y . As this linearization is considered accurate for y+ ≤ 5 and the averaged wall-dis-

tance across the sampling line amounts to y+ = 6.5 viscous units, this is an acceptable approximation.


1077Flow, Turbulence and Combustion (2025) 115:1059–1094 

in the first half of the domain. Further downstream, the accuracy gain of the Nusselt num-
ber prediction decreases, which is a consequence of the imperfect velocity field. Neverthe-
less, this leads to a improvement of 55 % in the relative MAE of the Nusselt number across 
the bottom wall, see Table 2.

This test case is similar to the training case regarding its phenomenology and the 
Reynolds number. Accordingly, we do not expect a significant decrease of the train-
ing set support, which is visualized in Fig. 11, compared to the training case. We find 
virtually the same three regions of missing support as for the training case: the channel 
center, a region in proximity to the hills’ upper LE and a band along the bottom wall. 
Interestingly, this is the test case with the highest support on average and the minimum 
support is also larger than in the training case, see Table  2. The previously observed 
trend that steeper hill flanks tend to decrease the mean training set support cannot be 
confirmed. This is likely due to the three-dimensional domain, in which the fraction of 
poorly supported test samples is comparably lower.

5.3  Plane Channel Flow

The accuracy of the PC flow simulation is assessed by means of the axial velocity pro-
file along the wall-normal direction, which is depicted in Fig. 12. Note that this is one 
of the canonical calibration cases for turbulence closure models. One could say, that this 
is one of the training cases of the SST model. Although CFD-in-the-loop training (Hol-
land et al. 2019; Zhao et al. 2020; Novati et al. 2021) was not feasible at the time the 
SST model was developed, the dynamical behavior of the closed RANS equations was 
taken into consideration by analytically analyzing and tuning their fixed point solution 
for equilibrium turbulence in the log-layer (Wilcox 2006). This explains the exception-
ally good agreement of the SST model prediction with the reference data (Taghizadeh 
et al. 2020). The neuralSST model also produces accurate results in the viscous sublayer 
and the log-layer, but the profile exhibits visible deviations in the buffer layer. This is 
clearly because the fixed point solution has not been considered when re-calibrating the 
model for the flow over PHs. Nevertheless, the relative MAEs in both the skin friction 
coefficient and the Nusselt number are substantially smaller than for all other test cases, 
see Table 2.

Fig. 12  Mean axial velocity (left) and local training set support (right) plotted the over wall normal distance 
in viscous units for a plane channel flow at Reb = 28,250 . The green line indicates the contour q1 = 0.5 
separating the regions of high (left) and low (right) non-dimensional strain


1078 Flow, Turbulence and Combustion (2025) 115:1059–1094

The local training set support shown in Fig.  12 reveals that the model queries are 
very well supported almost everywhere. The only two exceptions are the viscous sub-
layer and the symmetry plane, which is consistent with our observations for the training 
case. Even though the training case contains only boundary layers with non-vanishing 
pressure gradients, the ZPG phenomenology does not seem to alter the feature vectors 
significantly. In that sense, some robustness against phenomenological extrapolation is 
embedded into the model. As the PC flow may be seen as the limit of the flow over PHs 
for a vanishing hill slope, we would expect the mean training set support to be smaller 
than for the shallowest hill under consideration. This is not the case, though. The mini-
mum training set support is, however, significantly smaller than for all other test cases.

5.4  Impinging Jet

For the IJ, reference data for the mean flow field are available along different vertical 
lines sampled along the radial direction. These lines are highlighted by red dashed lines in 
Fig. 5. The corresponding profiles of the velocity magnitude along these lines are depicted 
in Fig. 13. The first sampling line follows the stagnation streamline and the results of both 
the SST and the neuralSST model agree well with the experimental data. The remaining 
lines provide insights on how accurate the wall jet is captured. We clearly see that the neu-
ralSST model is more diffusive than its baseline model. However, both numerical predic-
tions fall into the range of experimental uncertainty which is shown in gray. We can, there-
fore, hardly assess which model is more accurate. In Fig.  13, we also display the radial 
Nusselt number evolution on the impingement plate. In contrast to the SST model, the 
data-driven model clearly overpredicts the heat transfer in the stagnation point, it does not 

Fig. 13  Magnitude of the mean velocity along different wall-normal lines (left) and radial distribution of 
the Nusselt number (right) for an IJ at Reb = 23,000 with the experimental uncertainty indicated by gray 
bands

Fig. 14  Local training set sup-
port of the neuralSST applied to 
impinging jet at Reb = 23,000 . 
The green lines indicate the 
contour q1 = 0.5 separating the 
regions of high and low non-
dimensional strain


1079Flow, Turbulence and Combustion (2025) 115:1059–1094 

capture the secondary peak in the wall-jet region and slightly underestimates the Nusselt 
numbers further downstream. Although the training case contains regions with an adverse 
pressure gradient (APG), we consider the stagnation point flow around the jet axis as a 
phenomenon which the model has not seen during its training. The SST model, on the con-
trary, contains an eddy viscosity-limiter that was designed to reduce the turbulent momen-
tum transfer in strong APG situations. Due to the simple scalar flux model assuming a 
constant turbulent Prandtl number, this directly translates into a reduction of the turbulent 
temperature transport. The absence of such an explicit limiter in the data-driven model, 
explains both observations described above: the deviations in the velocity profiles and the 
Nusselt numbers.

The flow phenomena governing this test case noticeably affect the local training set support 
as indicated by Fig. 14. The model queries are poorly supported in all three regions of interest: 
the free shear layer emerging from the nozzle exit, the stagnation zone around the jet axis and 
the wall jet along the impingement plate. This also manifests in a decrease in the global sup-
port metrics, which are given in Table 2, in comparison with channel flows at similar Reyn-
olds numbers. These findings further suggest that the flow over PHs is no suitable training 
case for a model which is supposed to be applied to an IJ.

5.5  Ribbed Two‑Pass Cooling Channel

As there are no velocity measurements for the generic two-pass channel available, we focus 
only on the heat transfer. Hammond et  al. (2022), Fischer et  al. (2023) and Ellis and Xia 
(2023), however, reported that an augmented momentum closure can significantly increase 

Fig. 15  Normalized, segment-averaged Nusselt numbers for a ribbed two-pass cooling channel (RMPC) 
along the suction side (SS), pressure side (PS) and leading/trailing edge (LE/TE) walls (a). Histogram of 
the relative increase of the absolute error of the area-weighted average of the segment-averaged Nusselt 
numbers over all walls for the present (orange dashed line) and other parametrizations of the neuralSST 
model (orange shaded area) as well as a hypothetical model just falling into the experimental uncertainty 
range (black dotted line) (b)


1080 Flow, Turbulence and Combustion (2025) 115:1059–1094

the accuracy of the heat transfer prediction despite using a very simple scalar flux model. We 
verified that on the PR test case.

To simplify the analysis, we subdivide the walls of the channel into segments. These seg-
ments are usually separated by two subsequent ribs. In Fig. 6, the subdivision is illustrated 
and the segments are assigned numerical labels which we use as a proxy of the streamwise 
coordinate in the following. The segmentation not only applies to the suction side (SS) which 
is explicitly shown, but also to the suction side (SS) as well as the walls facing the LE and the 
trailing edge (TE).

In Fig.  15a, we compare the streamwise evolution of the segment-averaged Nusselt 
numbers

with Ω being a segment’s surface and |Ω| the corresponding area, for the four channel 
walls. We again adopt Eq. (21) for the sake of normalization. As reference temperature for 
the Nusselt numbers of all segments, we use the bulk temperature at the inlet.

In the first segment of the SS, the smooth inlet section, both closure models yield higher 
heat transfer rates than the experiment. There are two potential reasons for this. First, we 
simplified the problem by neglecting the upstream parts of the inlet section as implemented 
in the experimental facility. Although we carefully matched both the bulk Reynolds num-
ber and the bulk temperature, the in-plane distributions of momentum and temperature may 
differ. Secondly, we previously observed that the SST and the neuralSST overestimate the 
heat transfer in a PC flow by 5 % and 10 %, respectively. As a consequence, the neuralSST 
model is slightly less accurate in segment 0 on the SS. Between segment 1 and 8, where 
the ribs are located in the first pass, the SST model underestimates the Nusselt numbers. 
The data-driven closure, that was trained for channel flows with periodic obstacles, leads to 
higher Nusselt numbers. Behind the first few ribs, where a quasi-periodic state is reached, 
the agreement with the reference data is excellent. In the bent region between segment 9 
and 12, several phenomena which are not included in the training set play a critical role. In 
segment 9, the mean flow periodicity is broken as there is no further upstream rib. Then, 
in segment 10, the primary flow impinges onto the tip wall which leads to a deflection of 
the mean flow and strong streamline curvature. Further downstream, in segment 11, the 
deflected flow impinges onto the TE wall resulting in a second deflection into the second 
pass and towards the outlet. As seen in the previous section, the neuralSST significantly 
overestimates the Nusselt number in impingement scenarios. We find the same trend in this 
more complex configuration. Like in the first pass, the neuralSST model predicts higher 
heat transfer rates in the second pass than its baseline model which is in better agreement 
with the experimental results.

The trends along the PS are very similar. We thus restrict ourselves to emphasizing two 
noteworthy differences. First, at the end of the first pass, both closure models overestimate 
the experimental Nusselt numbers by the same margin. In this region, the data-driven cor-
rection does, therefore, not benefit from being trained for this particular flow regime. Sec-
ondly, the neuralSST model correctly predicts lower heat transfer rates than the SST model 
in the second pass. This demonstrates that the data-driven augmentations do not yield sys-
tematically higher heat transfer rates, but indeed correct for local deficiencies of the base-
line model.

Despite the promising results across the SS and the PS, the neuralSST model signifi-
cantly overestimates the Nusselt numbers along the LE and TE walls. The inclined ribs 

(22)Nu =
1

|Ω| ∫Ω

Nu dA,


1081Flow, Turbulence and Combustion (2025) 115:1059–1094 

induce a strong secondary flow that impinges onto the LE wall in the first pass and onto 
the TE wall in the second pass. This similarity with a single IJ, that we investigated above, 
explains the difference between the two closure models’ predictions.

We summarize these streamwise evolution of the segment-averaged Nusselt numbers 
along the three walls in a single integral metric: the relative increase of the absolute Nus-
selt number error averaged over all segments of all walls and weighted by the correspond-
ing surface area wrt. the SST model prediction. As listed in Table 2, the neuralSST leads to 
an approximately 60 % higher integral error. We also illustrated this in Fig. 15b. The local 
accuracy gains in the ribbed sections of the first and the second pass are thus overcompen-
sated by the local accuracy losses in other parts of the channel. Especially the local effects 
along the LE and TE walls as well as those in the bend region have a detrimental effect on 
the integral model performance.

In order to test whether this is a coincidence due to the particular model parameteriza-
tion, we also tested other versions of the neuralSST model. We varied the training set (vari-
ous combinations of PC, square duct flows and flows over PHs), the regression method, the 
hyperparameters of the regression algorithm and the feature set. We even included nonlinear 
terms into the momentum closure. For all models that led to convergence for all training cases 
involved as well as the RMPC, we computed the relative MAE increase in the segment-aver-
aged Nusselt numbers and plotted their histogram in Fig. 15b. Even though some models’ 
integral accuracy is in close agreement with that of the SST model, apparently none of them 
leads to a net gain. These model versions loose their local advantages over the SST model due 
to strong regularization, which is either enforced explicitly, e.g., by the constraint in Eq. (10), 
or implicitly by virtue of a small feature space or competing training cases. However, the mod-
els may become arbitrarily inaccurate as they become more complex as a consequence of lack-
ing regularization. In fact, they often fail to converge to a stable solution at all. The model ver-
sion we present in this paper seems to be an average representative of the class of neuralSST 
models trained on attached and separated channel flows.

Fig. 16  Local training set support of the neuralSST model applied to a RMPC in various section planes per-
pendicular to the bulk flow direction. The green line indicates the contour q1 = 0.5 separating the regions of 
high and low non-dimensional strain


1082 Flow, Turbulence and Combustion (2025) 115:1059–1094

As a final note on the model accuracy, we quantify the maximum integral accuracy gain 
possible. Taking the experimental uncertainty of � = 8% into account, we must accept any 
prediction that falls into the range ̃Nu ± � as equally accurate. Given these particular reference 
data and acknowledging that the SST model’s predictions fall into the experimental uncer-
tainty range in most of the segments, we can expect a maximum accuracy gain of only

Hence, it is very challenging to perform better than the SST model on this test case. Only 
detailed measurements of the flow field would help to differentiate potentially more accu-
rate models.

In Fig.  16, the local training set support is shown on six planes perpendicular to the 
main flow direction. Like for the other channel flows, we find a lack of support in the center 
of the smooth inlet section at z∕ds = 1 , where the strain rate vanishes. The first rib, which 
is cut by the plane z∕ds = 3 , causes a significant change in the model inputs compared to 
the training set near its leading edge. This is also consistent with the observations made 
in the PR case. In addition, the secondary flow towards the LE wall, which resembles an 
IJ, decreases the training set support. Another large region of lacking training set support 
manifests towards the inner walls of both the first and the second pass. At z∕ds = 9.95 , in 
close proximity to the tip wall, two prominent streaks along the SS and PS wall are visible. 
Just like in the wall jet forming in the IJ case, these regions are also characterized by a low 
non-dimensional strain rate. The same holds for the thin band along the TE wall in the sec-
ond pass.

Despite the interaction of the different flow phenomena, e.g., flow separation, rib-
induced secondary flow, stagnation point regions, strong streamline curvature and wall jets, 
we find that most of the observations regarding the models predictive accuracy and training 
set support from the individual cases can be transferred. The mean support is surprisingly 
the same as for the training case and the the minimum support is even larger. Phenom-
enological extrapolation is, therefore, not reliably detected by quantifying the training set 
support.

6  Discussion

In the previous section, we have demonstrated that missing training set support does not 
imply a poor generalization of the model to different flow phenomena. Although it might 
contribute, the main driver of generalization errors must be a different one. In the follow-
ing, we first identify the root cause of generalization errors and subsequently argue why the 
generalization capabilities of data-driven EVMs are inherently limited. Finally, we evaluate 
strategies to extend the spectrum of different phenomena that a model may be successfully 
applied to.

6.1  Root Cause of Generalization Errors

For a given feature space of a model, the mapping between model in- and outputs may not 
be unique (Zhang and Duraisamy 2015; Sotgiu et  al. 2019; Geneva and Zabaras 2019). 

(23)
�±�
Nu

�SST
Nu

− 1 =
∫
Ω
||Ñu(1 ± �) − Ñu||dA

∫
Ω
||Nu

SST − Ñu||dA
− 1 =

�

�SST
Nu

− 1 = −0.2954.


1083Flow, Turbulence and Combustion (2025) 115:1059–1094 

Despite the similarity of two feature vectors, the corresponding model responses that are 
required for an accurate prediction may significantly differ. Liu et al. (2021) provide a thor-
ough analysis of this multivalence problem which has two implications for the generaliza-
tion capabilities of a model.

On the one hand, if it was optimized for one of the two targets, it necessarily performs 
poorly on the other one. In machine learning, this is known as a data-mismatch problem. 
That means, the model has been informed with pairs of in- and outputs that do not repre-
sent the required mapping for the application scenario. It may be detected by defining the 
training set support on both a model’s feature and target space. This provides an expla-
nation for the observed trend that the generalization errors increase as a model is further 
tuned for a specific flow phenomenon.

On the other hand, if the model was trained to accurately predict both competing targets, 
the best possible solution is the mean between the two ambiguous solutions. In order to 
overcome this regularization effect of a diverse training set and resolve all ambiguities, a 
higher dimensional feature space is required (Liu et  al. 2021; Srivastava and Duraisamy 
2021).

6.2  Inherent Generalization Limitations

The reasons why expanding the feature space significantly seems to be merely a theoreti-
cal option, though, are threefold. First, it is not guaranteed that there are sufficiently many 
candidate features that comply with all requirements, e.g., Galilean invariance and non-
dimensionality, in order to ensure a unique mapping. Although plenty of candidate features 
can be derived from tensor representation theory, these invariants are typically highly cor-
related. For the popular two-dimensional training cases like the flow over PHs, as used in 
this work, there are indeed only two non-vanishing, independent invariants (Pope 1975). 
Geneva and Zabaras (2019) and Jiang et al. (2021) furthermore found that a feature set that 
only contains the invariants of the strain and rotation rate tensors do in general not suffice 
to ensure a unique mapping.

Secondly, the number of training examples is fixed, which increases the chances of 
overfitting if further features are added to the model (Hastie et al. 2017).

Thirdly, a high-dimensional feature space would substantially increase the model’s com-
plexity and thus decrease its stability even if unlimited samples were available to prevent 
overfitting. Melchers et al. (2023) demonstrated that initially small errors grow exponen-
tially over the course of a RANS simulation at a rate which is correlated with the model’s 
complexity. The stability requirements that emerge from the ill-conditioned RANS equa-
tions (Wu et al. 2019) thus inherently limit the number of distinguishable flow phenomena 
and ultimately the generalization capability of data-driven EVM models. This is the reason 
that closure models which are evaluated many times during a simulation are typically char-
acterized by a low-dimensional feature space (Srivastava and Duraisamy 2021). As another 
consequence, it becomes increasingly challenging to improve EVMs for a particular test 
case as the number of contained flow phenomena grows.

6.3  Potential Strategies to Improve Generalization

Our empirical study has revealed that the linear neuralSST model does only generalize well 
within the class of flows that are phenomenologically similar to its training case. Although 
we theoretical derived that the generalization capabilities are inherently limited, we cannot 


1084 Flow, Turbulence and Combustion (2025) 115:1059–1094

quantify this limit. Thus, there may be data-driven closure models with variable coeffi-
cients that generalize well to a broader spectrum of flows comprising maybe two or a few 
different phenomena. In the following, we discuss some strategies and principles which 
may be incorporated for this purpose in future generalization studies.

6.3.1  Nonlinear Constitutive Laws

The generalization capability of LEVMs are limited by their structure (Taghizadeh et al. 
2020). Thus, they insufficiently describe, for instance, flows with Reynolds stress anisot-
ropy-driven secondary flows (Speziale 1987; Pettersson Reif 2006), system rotation (Shih 
et al. 1995) or strong streamline curvature (Craft et al. 1996). While our study confirms 
the multivalence of the optimal closure coefficients across different regimes, whether the 
augmented model outperforms the baseline model also depends on the structure of the for-
mer’s constitutive law. In fact, the reference model may perform well on test cases because 
of error cancellation effects, e.g., the linear closure was designed to compensate for the 
effects of the missing higher-order closure terms. Considering that nonlinear closure terms 
are also known to decrease the model’s stability (Apsley et  al. 1997; Bauer et  al. 2000; 
Weinmann and Sandberg 2009), however, their inclusion into the model may only come 
at the cost of stronger regularization of the closure coefficients. This may accordingly pre-
vent learning the optimal coefficient distributions for the training case and lead to a loss of 
accuracy. We, therefore, recommend to pursue similar generalization studies for nonlinear 
closure models.

6.3.2  Interaction of the Closure Coefficients

During the development of the neuralSST model, we used the two-step approach towards 
a priori closure modeling (Mandler and Weigand 2024), i.e., we first extracted the optimal 
coefficient distributions and then regressed them individually. Hence, it is not guaranteed 
that the learned coefficient functions interact favorably. While the optimal distributions of 
the coefficients in the constitutive law and those in the scale equations naturally match, 
errors in the regression step might lead to inconsistencies. These would, however, manifest 
themselves in erroneous predictions of flow field for the training case. If the interplay of 
the coefficients is not considered automatically, for instance, by regressing all coefficients 
simultaneously (Taghizadeh et al. 2020; Liu et al. 2024), this needs to be done manually. 
In either way, a meaningful generalization study requires a model that performs well on its 
training case which necessitates a favorable coefficient interaction. If that is ascertained, 
we do not expect any major differences in the generalization behavior depending on how 
this was achieved.

6.3.3  Dynamical System Behavior

The present investigation is concerned with the generalization capabilities of models that 
were optimized in an a priori fashion, i.e., they are supposed to reproduce the high-fidelity 
RST. Although we expect such models to perform well when they are implemented into a 
RANS solver, there is no such guarantee (Meneveau 1994a). Even the slightest errors can 
cause substantial a posteriori errors (Poroseva et al. 2016; Thompson et al. 2016), which 


1085Flow, Turbulence and Combustion (2025) 115:1059–1094 

is irrespective of their origin, that is whether they stem from the data itself, the inver-
sion procedure, or the regression step. So if the behavior of the solver is not automatically 
accounted for in the training, e.g., by CFD-integrated training (Holland et al. 2019; Zhao 
et  al. 2020; Novati et  al. 2021), this must be done manually in order to ensure that the 
model does indeed produce accurate results on the training case.

Taking that for grated, though, we think that the same generalization limits apply to 
models regardless whether they were trained in an a priori or an a posteriori sense. The 
set of coefficient functions that ensures optimal a posteriori results on one case is most 
likely detrimental for phenomenologically different flows. This is exemplified by the lim-
ited generalization capabilities of traditional closure models. Most of them were calibrated 
such that they accurately predict the decay rate of homogeneous, isotropic turbulence, the 
spreading rate of free shear layers, the log-law in equilibrium boundary layers and the 
propagation speed of the edge of the turbulent region (Pope 2000). However, many of those 
models fail, for instance, to accurately predict separating and reattaching boundary layers 
(Jakirlic 2012). So it appears that the coefficients required for partially separated boundary 
layers differ from those considered optimal for the calibration cases.

6.3.4  Enforcing Traditional Calibration Constraints

Spalart (2023) demands that data-driven models must not perform worse than the afore-
mentioned traditional calibration cases as the classical models do. Applying the same con-
straints on the closure coefficients (Taghizadeh et al. 2020; Liu et al. 2024) would ensure 
this and, on the one hand, potentially enhance the model’s generalization capability. On the 
other hand, the effectively limited capacity of the model may not suffice to learn the opti-
mal coefficient functions from the training case while adhering to the most likely conflict-
ing constraints. This would be a consequence of the widely accepted optimal coefficients’ 
non-uniqueness. Consequently, such a procedure could lead to the same strong, unfavorable 
regularization effect that was observed in multicase training (Geneva and Zabaras 2019; 
Lav et  al. 2019; Hammond et  al. 2020; Waschkowski et  al. 2022; Lav et  al. 2023; Fang 
et al. 2023), which ultimately reduces the model’s accuracy on its actual training case.

Since the closure coefficients are allowed to vary across the domain, it furthermore 
remains to be clarified, how it can be assured that those constraints are only applied in 
their respective regime of validity. In constrast to models with constant coefficients, the 
predicted coefficients in an equilibrium boundary layer, for example, may not necessar-
ily comply with the requirements of homogeneous, isotropic turbulence and vice versa. 
Enforcing these constraints everywhere, though, would thus overly regularize the model.

6.3.5  Zonal and Mixture Models

When we accept the non-existence of a single stable model that reproduces the optimal clo-
sure coefficients for many different flow phenomena, we might use a combination of mul-
tiple models where each of them is applied to its regime of validity. This idea of zonal and 
mixture models, see Eq.  (14), was first introduced in traditional closure modeling (Rodi 
1991; Menter 1994b) and has recently been adopted to data-driven modeling. Each expert 
model can, therefore, be trained for one specific phenomenon, which has been shown to 
work very well. The compositional model does consequently not suffer from the regulari-
zation effect of competing training objectives. Although promising results were achieved 


1086 Flow, Turbulence and Combustion (2025) 115:1059–1094

on simple test cases, the demonstration of superior generalization capabilities of mixture 
models in complex applications is still pending. To this end, several questions are in order.

First, based on which rationale should we split the regimes for which expert models are 
trained? As the different expert models should resolve the ambiguity of the optimal closure 
coefficients, the activations should be functions of both the model’s feature space and its 
corresponding target variable. As ground truth information for the latter are, however, not 
available for arbitrary test cases, this can only be approximated. Haghiri et al. (2020) and 
Zhao et al. (2020) thus propose to divide the domain in the physical space based on a priori 
knowledge about the occurring flow phenomena. Weatheritt and Sandberg (2016); Akole-
kar et al. (2018); Lav et al. (2019); Geneva and Zabaras (2019); Zhao et al. (2020); Akole-
kar et al. (2021); Frey Marioni et al. (2021) and Man et al. (2023) use marker functions 
for specific phenomena that automatically detect those regions. In contrast to that, Sriv-
astava and Duraisamy (2021), apply a manual discretization to the model’s feature space. 
Likewise, Edeling et al. (2014); Lozano-Durán and Bae (2023); Cherroud et al. (2023) and 
de Zordo-Banliat et al. (2024) automatically discretize the feature space by computing the 
similarity of a test sample with the set of training samples. This is in close analogy to 
the proposal of Parish and Duraisamy (2016); Wu et al. (2018); Srivastava and Duraisamy 
(2021); Kurz and Beck (2021) and Frey Marioni et al. (2021) to make each model’s con-
tribution to the prediction dependent on its local extrapolation extent, which is ultimately a 
result of the chosen feature space.

Secondly, how many different expert models should the mixture model comprise? The 
answer to this question is clearly linked to how many different functions are required to 
resolve all or at least the majority of ambiguities in the optimal closure coefficient dis-
tributions on the test case. While this may approximated by the number of different flow 
phenomena or regimes that we expect, it is questionable whether such a discretization is 
actually applicable as there continuous transitions from one into another regime.

Thirdly, how does the potentially drastic variation of the closure model definition across 
the domain affects its stability? Ling (2015) and Lav et al. (2019) emphasize the impor-
tance of smooth blending or activation functions. Loosely speaking, it remains question-
able, though, how smooth these functions have to be.

7  Conclusion

In the present paper, we explain and illustrate the inherently limited generalization capabili-
ties of data-driven EVMs. To this end, we performed a systematic generalization study with 
the neuralSST model, a data-driven LEVM trained for two-dimensional channel flows with 
boundary layer separation and reattachment. This model outperforms the SST model on other 
phenomenologically similar flows irrespective of the Reynolds number and the specific shape 
of the separation-inducing obstacle. While it also works well for attached channel flows, it 
generalizes poorly to stagnation flows and flows with strong streamline curvature. Applying 
this model to a ribbed two-pass cooling channel, which is governed by many different flow 
regimes, reveals that the local accuracy gains in the regions which are similar to the training 
case are overcompensated by the local losses elsewhere. Generalization to different flow phe-
nomena is apparently significantly more challenging than generalization to different Reynolds 
numbers or geometries. We demonstrated that this is not necessarily due to lacking support 
by training samples with a similar feature vector, but mainly a consequence of the different 


1087Flow, Turbulence and Combustion (2025) 115:1059–1094 

model response being required. This is known as a data-mismatch problem and arises from the 
specific choice of the feature set and the training case. Optimizing a model for a specific class 
of flows, therefore, decreases its generalization capabilities. Optimizing a model such that it 
accurately predicts a wide spectrum of flow regimes simultaneously, however, would require 
highly complex models suffering from severe instabilities. Our findings support existing con-
cerns that a universal EVM exists (Durbin and Pettersson Reif 2011). One should rather opt 
for niche models tailored for specific phenomena (Frey Marioni et al. 2021; Rumsey and Cole-
man 2022; Cherroud et  al. 2022). When optimizing models for complex multi-phenomena 
applications, this implies that the different flow regimes need to be prioritized and one should 
not expect the same level of global accuracy gains as often reported for academic test cases. 
We have demonstrated that such a prioritization may, on the one hand, be realized by virtue 
of selecting the training data only from a specific flow phenomenon. On the other hand, CFD-
driven training allows to embed this prioritization by defining a specific quantity of interest in 
a particular region of the domain as optimization target. Aiming for data-driven models that 
perform well in a wide spectrum of applications, we believe that zonal and mixture models are 
the most promising approach.

Appendix A SST model

In this appendix, we briefly recall the formulation of the SST model (Menter et al. 2003). As a 
LEVM, the constitutive law for the specific RST reads

where the eddy viscosity is defined as

and k and � are determined by solving the subsequent set of scale equations.

These contain the cross-diffusion term

and a production limiter

The blending functions F1 and F2 are defined as 

(A1)−u�u� = −
2

3
kI + 2�SST

t
S,

(A2)�SST
t

=
a1k

max
�
a1�,

√
2F2

���S
���
�

(A3)RSST
c

=

(
Dt(k) − P∗

k
+ �∗k� − � ⋅

[(
� + �k�

SST
t

)
�k

]

Dt(�) −
�P∗

k

�SST
t

+ ��2 − � ⋅

[(
� + ���

SST
t

)
��

]
− (1 − F1)CD

)
= 0

(A4)CD = 2�k−�
�

�k ⋅ ��
�

(A5)P∗
k
= min

(
2�SST

t

‖‖‖S
‖‖‖
2

, 10�∗k�

)
.

(A6a)F1 = tanh
(
arg4

1

)


1088 Flow, Turbulence and Combustion (2025) 115:1059–1094

 with their respective arguments

Apart from the two constants �∗ = 0.09 and a1 = 0.31 , the remaining coefficients 
g ∈

{
�, � , �k, ��

}
 are continuously blended between their respective values in the k-� and 

k-� models, which are given in Table 3.

We may thus interpret the SST model as a mixture model with continuous activation func-
tions which were derived from physical reasoning or as a model with variable closure coef-
ficients. By comparing Eqs. (A1) and (2), we find that the eddy viscosity limiter defined by 
Eq. (A2) in fact implies a variable closure coefficient

using the feature set QSST =
{
Ret, arg2, �

‖‖‖S
‖‖‖
}

.

Acknowledgements The investigations were conducted as part of the joint research program Roboflex in 
the frame of AG Turbo. The work was supported by the Bundesministerium für Wirtschaft und Energie 
(BMWE) as per resolution of the German Federal Parliament under grant number 03EE5013C. The authors 
gratefully acknowledge AG Turbo and MTU Aero Engines AG for their support and the permission to pub-
lish this paper. In particular, the authors would like to acknowledge the fruitful discussions with Mr. Pin-
chaud (MTU). Additionally, the authors kindly acknowledge the cooperation with the Cluster of Excellence 
SimTech (project number: 390740016). The responsibility for the content lies solely with its authors.

Author Contributions Conceptualization: [Hannes Mandler, Bernhard Weigand]; Methodology: [Hannes 
Mandler]; Formal analysis and investigation: [Hannes Mandler]; Writing - original draft preparation: 
[Hannes Mandler]; Writing - review and editing: [Bernhard Weigand]; Funding acquisition: [Bernhard Wei-
gand]; Supervision: [Bernhard Weigand]

Funding Open Access funding enabled and organized by Projekt DEAL. Open Access funding enabled and 
organized by Projekt DEAL.

Data Availability The data that support the findings of this study are available from the corresponding 
author upon reasonable request

(A6b)F2 = tanh
(
arg2

2

)

(A7)arg1 = min

�
min

�
max

� √
k

�∗�y
,
500�

y2�

�
,

4�k−�
� k

max
�
CD, 10−10

�
y2

�
, 10

�

(A8)arg2 = min

�
max

�
2
√
k

�∗�y
,
500�

y2�

�
, 100

�
.

(A9)g = F1g
k-� + (1 − F1)g

k-�

(A10)
gSST
1

=
2�∗

max

�
max

�
1, 1.8Re

−
1

2

t

�
,

√
2�∗

a1
tanh

�
arg2

2

�
�
���S

���

� ≤ 2�∗

Table 3  Closure coefficients of 
the k-� and k-� models

Model � � �k ��

k-� 0.0750 0.5556 0.85 0.500
k-� 0.0828 0.44 1.00 0.856


1089Flow, Turbulence and Combustion (2025) 115:1059–1094 

Declarations 

Conflict of interest The authors have no Conflict of interest to declare that are relevant to the content of this 
article.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, 
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long 
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com-
mons licence, and indicate if changes were made. The images or other third party material in this article 
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the 
material. If material is not included in the article’s Creative Commons licence and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly 
from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Apsley, D., Chen, W.-L., Leschziner, M., Lien, F.-S.: Non-linear eddy-viscosity modelling of separated 
flows. J. Hydraul. Res. 35(6), 723–748 (1997). https:// doi. org/ 10. 1080/ 00221 68970 94983 86

Amarloo, A., Forooghi, P., Abkar, M.: Frozen propagation of Reynolds force vector from high-fidelity data 
into Reynolds-averaged simulations of secondary flows. Phys. Fluids 34(11), 115102 (2022). https:// 
doi. org/ 10. 1063/5. 01232 31

Abe, H., Kawamura, H., Matsuo, Y.: Direct numerical simulation of a fully developed turbulent channel 
flow with respect to the Reynolds number dependence. J. Fluids Eng. 123(2), 382–393 (2001). https:// 
doi. org/ 10. 1115/1. 13666 80

Alfonsi, G.: Reynolds-averaged Navier-Stokes equations for turbulence modeling. Appl. Mech. Rev. 62(4), 
040802 (2009). https:// doi. org/ 10. 1115/1. 31246 48

Akolekar, H.D., Weatheritt, J., Hutchins, N., Sandberg, R.D., Laskowski, G., Michelassi, V.: Development 
and use of machine-learnt algebraic reynolds stress models for enhanced prediction of wake mixing 
in LPTs. Turbo Expo: Power for Land, Sea, and Air, vol. Volume 2C: Turbomachinery, pp. 02–42009 
(2018). https:// doi. org/ 10. 1115/ GT2018- 75447

Akolekar, H.D., Zhao, Y., Sandberg, R.D., Pacciani, R.: Integration of machine learning and computational 
fluid dynamics to develop turbulence models for improved low-pressure turbine wake mixing predic-
tion. J. Turbomach. 143(12), 121001 (2021). https:// doi. org/ 10. 1115/1. 40514 17

Bush, R.H., Chyczewski, T.S., Duraisamy, K., Eisfeld, B., Rumsey, C.L., Smith, B.R.: Recommendations 
for future efforts in RANS modeling and simulation. In: AIAA Scitech 2019 Forum. https:// doi. org/ 
10. 2514/6. 2019- 0317

Bauer, W., Haag, O., Hennecke, D.K.: Accuracy and robustness of nonlinear eddy viscosity models. Int. J. 
Heat Fluid Flow 21(3), 312–319 (2000). https:// doi. org/ 10. 1016/ S0142- 727X(00) 00015-1

Basara, B., Jakirlic, S.: A new hybrid turbulence modelling strategy for industrial CFD. Int. J. Numer. Meth-
ods Fluids 42, 89–116 (2003)

Beck, A., Kurz, M.: A perspective on machine learning methods in turbulence modeling. GAMM-Mitteilun-
gen 44(1), 202100002 (2021). https:// doi. org/ 10. 1002/ gamm. 20210 0002

Boussinesq, J.: Essai sur la théorie des eaux courantes. Mémoires présentés par divers savants à l’Académie 
des Sciences XXIII, 1–680 (1877)

Breuer, M., Peller, N., Rapp, C., Manhart, M.: Flow over periodic hills—numerical and experimental study 
in a wide range of Reynolds numbers. Comput. Fluids 238(2), 433–457 (2009). https:// doi. org/ 10. 
1016/j. compfl uid. 2008. 05. 002

Baughn, J.W., Shimizu, S.: Heat transfer measurements from a surface with uniform heat flux and an 
impinging jet. J. Heat Transfer 111(4), 1096–1098 (1989). https:// doi. org/ 10. 1115/1. 32507 76

Cooper, D., Jackson, D.C., Launder, B.E., Liao, G.X.: Impinging jet studies for turbulence model assess-
ment-I. Flow-field experiments. Int. J. Heat Mass Transfer 36(10), 2675–2684 (1993). https:// doi. org/ 
10. 1016/ S0017- 9310(05) 80204-2

Craft, T.J., Launder, B.E., Suga, K.: Development and application of a cubic eddy-viscosity model of turbu-
lence. Int. J. Heat Fluid Flow 17(2), 108–115 (1996). https:// doi. org/ 10. 1016/ 0142- 727X(95) 00079-6

http://creativecommons.org/licenses/by/4.0/
https://doi.org/10.1080/00221689709498386
https://doi.org/10.1063/5.0123231
https://doi.org/10.1063/5.0123231
https://doi.org/10.1115/1.1366680
https://doi.org/10.1115/1.1366680
https://doi.org/10.1115/1.3124648
https://doi.org/10.1115/GT2018-75447
https://doi.org/10.1115/1.4051417
https://doi.org/10.2514/6.2019-0317
https://doi.org/10.2514/6.2019-0317
https://doi.org/10.1016/S0142-727X(00)00015-1
https://doi.org/10.1002/gamm.202100002
https://doi.org/10.1016/j.compfluid.2008.05.002
https://doi.org/10.1016/j.compfluid.2008.05.002
https://doi.org/10.1115/1.3250776
https://doi.org/10.1016/S0017-9310(05)80204-2
https://doi.org/10.1016/S0017-9310(05)80204-2
https://doi.org/10.1016/0142-727X(95)00079-6


1090 Flow, Turbulence and Combustion (2025) 115:1059–1094

Cherroud, S., Merle, X., Cinnella, P., Gloerfelt, X.: Sparse Bayesian learning of explicit algebraic Reynolds-
stress models for turbulent separated flows. Int. J. Heat Fluid Flow 98, 109047 (2022). https:// doi. org/ 
10. 1016/j. ijhea tflui dflow. 2022. 109047

Cherroud, S., Merle, X., Cinnella, P., Gloerfelt, X.: Space-dependent aggregation of data-driven turbulence 
models (2023). arXiv: 2306. 16996 v1

de Zordo-Banliat, M., Dergham, G., Merle, X., Cinnella, P.: Space-dependent turbulence model aggrega-
tion using machine learning. J. Comp. Phys. 497, 112628 (2024). https:// doi. org/ 10. 1016/j. jcp. 2023. 
112628

Dietz, C.F., Neumann, S.O., Weigand, B.: A comparative study of the performance of explicit algebraic 
models for the turbulent heat flux. Numer. Heat Transf. Part A Appl. 52(2), 101–126 (2007). https:// 
doi. org/ 10. 1080/ 10407 78060 11150 46

Durbin, P.A., Pettersson Reif, B.A.: Statistical Theory and Modeling for Turbulent Flows, 2nd edn. John 
Wiley & Sons, Ltd, Chichester (2011). https:// doi. org/ 10. 1002/ 97804 70972 076

Doormaal, J.P.V., Raithby, G.D.: Enhancements of the simple method for predicting incompressible fluid 
flows. Numer. Heat Transf. 7(2), 147–163 (1984). https:// doi. org/ 10. 1080/ 01495 72840 89618 17

Durbin, P.A.: Near-wall turbulence closure modeling without “damping functions’’. Theor. Comput. Fluid 
Dyn. 3(1), 1–13 (1991). https:// doi. org/ 10. 1007/ BF002 71513

Duraisamy, K.: Perspectives on machine learning-augmented Reynolds-averaged and large eddy simulation 
models of turbulence. Phys. Rev. Fluids 6, 050504 (2021). https:// doi. org/ 10. 1103/ PhysR evFlu ids.6. 
050504

Edeling, W.N., Cinnella, P., Dwight, R.P.: Predictive RANS simulations via Bayesian model-scenario aver-
aging. J. Comp. Phys. 275, 65–91 (2014). https:// doi. org/ 10. 1016/j. jcp. 2014. 06. 052

Elfert, M., Schroll, M., Förster, W.: PIV measurement of secondary flow in a rotating two-pass cooling sys-
tem with an improved sequencer technique. J. Turbomach. 134(3), 031001 (2011). https:// doi. org/ 10. 
1115/1. 40032 22

Ellis, C.D., Xia, H.: Data-driven turbulence anisotropy in film and effusion cooling flows. Phys. Fluids 
35(10), 105114 (2023). https:// doi. org/ 10. 1063/5. 01666 85

Fischer, L., James, D., Jeyaseelan, S., Pfitzner, M.: Optimizing the trench shaped film cooling design. Int J. 
Heat Mass Transf. 214, 124399 (2023). https:// doi. org/ 10. 1016/j. ijhea tmass trans fer. 2023. 124399

Fiore, M., Koloszar, L., Fare, C., Mendez, M.A., Duponcheel, M., Bartosiewicz, Y.: Physics-constrained 
machine learning for thermal turbulence modelling at low Prandtl numbers. Int. J. Heat Mass 
Transf. 194, 122998 (2022). https:// doi. org/ 10. 1016/j. ijhea tmass trans fer. 2022. 122998

Frey Marioni, Y., Toledo Ortiz, E.A., Cassinelli, A., Montomoli, F., Adami, P., Vazquez, R.: A machine 
learning approach to improve turbulence modelling from DNS data using neural networks. Int. J. 
Turbomach. Propul. Power 6(2), 17 (2021). https:// doi. org/ 10. 3390/ ijtpp 60200 17

Fang, Y., Zhao, Y., Waschkowski, F., Ooi, A.S.H., Sandberg, R.D.: Toward more general turbulence 
models via multicase computational-fluid-dynamics-driven training. AIAA J. 61(5), 2100–2115 
(2023). https:// doi. org/ 10. 2514/1. J0625 72

Glushko, G.S.: Turbulent boundary layer on a flat plate in an incompressible fluid. NASA TT F-10080, 
translation of "Turbulentnyy pogranichnyy sloy na ploskoy plastine v neszhimayemoy zhidkosti". 
Izvestiya Akademii Nauk SSSR, Seriya Mekhanika 4, 13-23 (1965)

Geneva, N., Zabaras, N.: Quantifying model form uncertainty in Reynolds-averaged turbulence models 
with Bayesian deep neural networks. J. Comput. Phys. 383, 125–147 (2019). https:// doi. org/ 10. 
1016/j. jcp. 2019. 01. 021

Han, J.-C.: Recent studies in turbine blade cooling. Int. J. Rot. Mach. 10(6), 517231 (2004). https:// doi. 
org/ 10. 1155/ S1023 621X0 40004 42

Holland, J.R., Baeder, J.D., Duraisamy, K.: Towards integrated field inversion and machine learning 
with embedded neural networks for RANS modeling. AIAA Scitech 2019 Forum, San Diego, Cal-
ifornia, United States (2019). https:// doi. org/ 10. 2514/6. 2019- 1884. https:// arc. aiaa. org/ doi/ abs/ 10. 
2514/6. 2019- 1884

Huijing, J.P., Dwight, R.P., Schmelzer, M.: Data-driven RANS closures for three-dimensional flows 
around bluff bodies. Comput. Fluids 225, 104997 (2021). https:// doi. org/ 10. 1016/j. compfl uid. 
2021. 104997

Hammond, J., Frey Marioni, Y., Montomoli, F.: Error quantification for the assessment of data-
driven turbulence models. Flow Turbul. Combust. 109(1), 1–26 (2022). https:// doi. org/ 10. 1007/ 
s10494- 022- 00321-1

Haghiri, A., Lav, C., Sandberg, R.: Data-driven turbulence modelling for improved prediction of ship 
Airwakes. In: 33rd Symposium on Naval Hydrodynamics, Osaka, Japan (2020)

Hammond, J., Montomoli, F., Pietropaoli, M., Sandberg, R.D., Michelassi, V.: Machine Learning for the 
Development of Data Driven Turbulence Closures in Coolant Systems. Turbo Expo: Power for 

https://doi.org/10.1016/j.ijheatfluidflow.2022.109047
https://doi.org/10.1016/j.ijheatfluidflow.2022.109047
http://arxiv.org/abs/2306.16996v1
https://doi.org/10.1016/j.jcp.2023.112628
https://doi.org/10.1016/j.jcp.2023.112628
https://doi.org/10.1080/10407780601115046
https://doi.org/10.1080/10407780601115046
https://doi.org/10.1002/9780470972076
https://doi.org/10.1080/01495728408961817
https://doi.org/10.1007/BF00271513
https://doi.org/10.1103/PhysRevFluids.6.050504
https://doi.org/10.1103/PhysRevFluids.6.050504
https://doi.org/10.1016/j.jcp.2014.06.052
https://doi.org/10.1115/1.4003222
https://doi.org/10.1115/1.4003222
https://doi.org/10.1063/5.0166685
https://doi.org/10.1016/j.ijheatmasstransfer.2023.124399
https://doi.org/10.1016/j.ijheatmasstransfer.2022.122998
https://doi.org/10.3390/ijtpp6020017
https://doi.org/10.2514/1.J062572
https://doi.org/10.1016/j.jcp.2019.01.021
https://doi.org/10.1016/j.jcp.2019.01.021
https://doi.org/10.1155/S1023621X04000442
https://doi.org/10.1155/S1023621X04000442
https://doi.org/10.2514/6.2019-1884
https://arc.aiaa.org/doi/abs/10.2514/6.2019-1884
https://arc.aiaa.org/doi/abs/10.2514/6.2019-1884
https://doi.org/10.1016/j.compfluid.2021.104997
https://doi.org/10.1016/j.compfluid.2021.104997
https://doi.org/10.1007/s10494-022-00321-1
https://doi.org/10.1007/s10494-022-00321-1


1091Flow, Turbulence and Combustion (2025) 115:1059–1094 

Land, Sea, and Air, vol. Volume 7A: Heat Transfer, pp. 07–15026 (2020). https:// doi. org/ 10. 1115/ 
GT2020- 15928

Hammond, J., Pietropaoli, M., Montomoli, F.: Robust data-driven turbulence closures for improved heat 
transfer prediction in complex geometries. Int. J. Heat Fluid Flow 98, 109072 (2022). https:// doi. 
org/ 10. 1016/j. ijhea tflui dflow. 2022. 109072

Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New 
York (2017). https:// doi. org/ 10. 1007/ 978-0- 387- 84858-7

Illowsky, B., Dean, S.: Introductory Statistics. OpenStax, Houston (2018)
Jakirlic, S.: Extended excerpt related to the test case: Flow over a periodical arrangement of 2D hills. 

Technical Report ACP8-GA-2009-233710-ATAAC, TU Darmstadt. Final report on "Assessment 
of the RSM, URANS and hybrid models with respect to the different roadmaps including the 
industrial application challenges" (2012)

Jiang, C., Vinuesa, R., Chen, R., Mi, J., Laima, S., Li, H.: An interpretable framework of data-driven 
turbulence modeling using deep neural networks. Phys. Fluids 33(5), 055133 (2021). https:// doi. 
org/ 10. 1063/5. 00489 09

Kurz, M., Beck, A.: Investigating model-data inconsistency in data-informed turbulence closure terms. 
WCCM-ECCOMAS2020, virtual (2021). https:// doi. org/ 10. 23967/ wccm- eccom as. 2020. 115

Kaandorp, M.L.A., Dwight, R.P.: Data-driven modelling of the Reynolds stress tensor using random forests 
with invariance. Comp. Fluids 202, 104497 (2020). https:// doi. org/ 10. 1016/j. compfl uid. 2020. 104497

Lav, C., Banko, A.J., Waschkowski, F., Zhao, Y., Elkins, C.J., Eaton, J.K., Sandberg, R.D.: A coupled 
framework for symbolic turbulence models from deep-learning. Int. J. Heat Fluid Flow 101, 
109140 (2023). https:// doi. org/ 10. 1016/j. ijhea tflui dflow. 2023. 109140

Lozano-Durán, A., Bae, H.J.: Machine learning b