
Theoretical and Applied Genetics (2024) 137:104 
https://doi.org/10.1007/s00122-024-04592-2

ORIGINAL ARTICLE

Optimizing selection based on BLUPs or BLUEs in multiple sets 
of genotypes differing in their population parameters

Albrecht E. Melchinger1,2  · Rohan Fernando3  · Andreas J. Melchinger4 · Chris‑Carolin Schön1 

Received: 27 October 2023 / Accepted: 5 March 2024 / Published online: 15 April 2024 
© The Author(s) 2024

Abstract
Key message Selection response in truncation selection across multiple sets of candidates hinges on their post-selection 
proportions, which can deviate grossly from their initial proportions. For BLUPs, using a uniform threshold for all 
candidates maximizes the selection response, irrespective of differences in population parameters.
Abstract Plant breeding programs typically involve multiple families from either the same or different populations, varying 
in means, genetic variances and prediction accuracy of BLUPs or BLUEs for true genetic values (TGVs) of candidates. We 
extend the classical breeder's equation for truncation selection from single to multiple sets of genotypes, indicating that the 
expected overall selection response ($\Delta G_{Tot}$) for TGVs depends on the selection response within individual sets and their post-selection proportions. For BLUEs, we show that maximizing $\Delta G_{Tot}$ requires thresholds optimally tailored for each set, contingent on their population parameters. For BLUPs, we prove that $\Delta G_{Tot}$ is maximized by applying a uniform threshold
across all candidates from all sets. We provide explicit formulas for the origin of the selected candidates from different sets 
and show that their proportions before and after selection can differ substantially, especially for sets with inferior properties 
and low proportion. We discuss implications of these results for (a) optimum allocation of resources to training and predic-
tion sets and (b) the need to counteract narrowing the genetic variation under genomic selection. For genomic selection of 
hybrids based on BLUPs of GCA of their parent lines, selecting distinct proportions in the two parent populations can be 
advantageous, if these differ substantially in the variance and/or prediction accuracy of GCA. Our study sheds light on the 
complex interplay of selection thresholds and population parameters for the selection response in plant breeding programs, 
offering insights into the effective resource management and prudent application of genomic selection for improved crop 
development.

Introduction

Selection is one of the major drivers of evolution and breed-
ing. In nature, various types of selection occur, which are 
studied in evolutionary biology and described in textbooks 
on population genetics (e.g., Hartl et al. 1997). In breeding, 
directional selection is by far the most important type of 
selection in the sense that breeders typically select only a 
certain number or proportion of top candidates for a single 
trait or an index of the most important traits. The selected 
candidates are then advanced for further breeding or utilized 
as experimental cultivars for commercial purposes.

Cochran (1951) derived the primary mathematical results for the changes in population parameters under truncation selection in a seminal paper and demonstrated its application to plant selection. He described the selection response for a target variable, when selection is based on correlated variates. Cochran’s formula and its extension to the peculiarities in plant breeding, such as length of the breeding cycle and parental control, are known as the breeders’ equation (cf. Bernardo 2002; Lynch and Walsh 1998). This equation is one of the most important contributions of quantitative genetics to practical breeding as it quantifies the relevant factors that determine the progress expected from directional selection. However, the breeders’ equation strictly applies only to selection in a single population and assumes homogeneous correlation between the true genetic value (TGV) and the selection criterion (SC) for all candidates, which is generally not met in practice. More general settings, dropping the latter assumption, were investigated by Bulmer (1980).

Communicated by Antonio Augusto Franco Garcia.

* Albrecht E. Melchinger, albrechtmelchinger@gmail.com

1 Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany

2 Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599 Stuttgart, Germany

3 Department of Animal Science, Iowa State University, Ames, IA 50011, USA

4 Department of Mathematics, University of Stuttgart, 70569 Stuttgart, Germany

http://orcid.org/0000-0003-0810-873X
http://orcid.org/0000-0001-5821-099X
http://orcid.org/0000-0001-6067-7900

In animal breeding, the problem of heterogeneity of vari-
ances among sets was early addressed in the context of dif-
ferent environmental groups (Brotherstone and Hill 1986). 
Hill (1984) found that under more intense selection, more 
animals are selected from the group with larger variance 
and recommended to correct for heterogeneity. For selection 
based on BLUPs, Garrick and Van Vleck (1987) examined 
the case of heterogeneous variances and showed that selec-
tion assuming homogeneity is still highly efficient if the pre-
diction accuracy is high.

Plant breeding programs typically involve multiple sets of candidates from various families or populations (e.g., Auinger et al. 2021; Lian et al. 2014) and breeders often apply the same threshold to all candidates without considering their origin. However, if the sets differ in their mean and/or genetic variance and/or heritability ($h^2$) of entry means, calculated as best linear unbiased estimates (BLUEs) in phenotypic selection, this may be suboptimal for the selection response of the entire program. This problem arises for example when one set of candidates is tested in more locations and/or years than another set, resulting in different heritabilities ($h^2$).

When selection is based on best linear unbiased predictors (BLUPs) calculated from pedigree or “omics” data, there are numerous cases in which candidates differ in their population parameters, most notably the prediction accuracy (ρ) for the TGVs. In genomic selection, ρ strongly depends on the size of the training set and its relationship to the prediction set (e.g., Auinger et al. 2021; Clark et al. 2012; Habier et al. 2007). As demonstrated by experimental studies and simulations, adding more half-sibs to full-sibs in the training set improves ρ for genomic prediction within full-sib families (Brauner et al. 2019; Lehermeier et al. 2014; Lian et al. 2014; Riedelsheimer et al. 2013). Additionally, if pedigree, genomic, metabolic, or transcriptomic data are collected for different sets, the prediction accuracy of BLUPs calculated from different “omics” features or combinations of them can vary significantly among candidates (Seifert et al. 2018; Westhues et al. 2017; Zenke-Philippi et al. 2017). The same holds true for recently proposed approaches of phenomic selection based on sensor data and NIRS measurements (Robert et al. 2022; Weiß et al. 2022). Thus, breeders should be aware of the consequences of different prediction accuracies for the composition of the selected candidates.

A related, albeit slightly distinct scenario unfolds in 
hybrid breeding. Typically, lines from two genetically distant 
parent populations are selected based on predictors of their 
general combining ability (GCA) to attain a high selection 
response in the predicted hybrids (Melchinger et al. 2023). 
In general, breeders select an equal proportion of lines from 
each parent population for producing a factorial of hybrids 
among them (Melchinger and Posselt 2013). However, this 
approach may not be optimal if the two parent populations 
differ in their GCA variances and/or prediction accuracy for 
GCA effects. To our knowledge, no research has addressed 
the determination of the optimal proportion of lines to be 
selected from each parent population under this scenario.

The main objective of this study was to quantify and ana-
lyze the expected selection response when applying trunca-
tion selection to candidates from two sets differing in their 
population parameters. First, we extend Cochran’s formula 
for determining how the selection response in the combined 
set and the composition of the selected fraction depends on 
the proportion and selection response of the individual sets. 
Second, we derive solutions to determine the threshold, or 
equivalently the selected proportion, in each set to maximize 
the selection response in the combined set and examine the 
implications for selection based on BLUPs or BLUEs. Third, 
we explore how to optimize the selection response in hybrid 
breeding if the female and male parent lines of a complete 
factorial are selected based on their predicted GCA and the 
two parent populations differ in the variance and/or predic-
tion accuracy of GCA. We augment our theoretical findings 
with numerical calculations that assess the benefits of uti-
lizing optimal selected proportions and their impact on the 
composition of the selected set.

Theory

The results in this section are given for two sets of genotypes $\Pi_1$ and $\Pi_2$ that can originate from the same or different populations, but they can be extended to any number of sets. The two disjoint sets $\Pi_1$ and $\Pi_2$ can be of unequal size with proportions $\pi_1$ and $\pi_2 = 1 - \pi_1$, respectively, in the combined set $\Pi_1 \cup \Pi_2$. We assume that the SC for the candidates from $\Pi_1$ or $\Pi_2$ is independently and identically distributed according to normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Under these assumptions, applying truncation selection with thresholds $t_1$ and $t_2$ to the candidates in $\Pi_1$ and $\Pi_2$ corresponds directly to selecting proportions $\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)$ and $\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)$, respectively. Here, $\alpha(x)$ denotes the proportion selected from a normal distribution $N(0, 1)$ using threshold $x$, and $i_{\alpha}(x)$ represents the corresponding selection intensity.
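As a minimal sketch (not the authors' Mathematica code), the quantities $\alpha(x)$ and $i_{\alpha}(x)$ defined above, and the way they combine across two sets in Eqs. 1–3 of this section, can be written in Python. The function names are illustrative assumptions, and the selection intensity is computed with the standard result for a truncated normal distribution, $i_{\alpha}(x) = \varphi(x)/\alpha(x)$, where $\varphi$ is the standard normal density.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def alpha(x: float) -> float:
    """Proportion selected from N(0, 1) when truncating at threshold x."""
    # 1 - Phi(x), written as Phi(-x) for better accuracy in the upper tail
    return STD_NORMAL.cdf(-x)


def intensity(x: float) -> float:
    """Selection intensity: mean of the selected fraction of N(0, 1)."""
    return STD_NORMAL.pdf(x) / alpha(x)


def total_response(t1, t2, mu1, mu2, sigma1, sigma2, pi1, b1, b2):
    """Return (alpha_Tot, gamma_1, Delta G_Tot) following Eqs. 1-3."""
    pi2 = 1.0 - pi1
    x1 = (t1 - mu1) / sigma1          # standardized threshold in set 1
    x2 = (t2 - mu2) / sigma2          # standardized threshold in set 2
    a1, a2 = alpha(x1), alpha(x2)
    alpha_tot = a1 * pi1 + a2 * pi2   # Eq. 1
    gamma1 = a1 * pi1 / alpha_tot     # Eq. 2
    gamma2 = 1.0 - gamma1
    dg1 = b1 * sigma1 * intensity(x1)  # selection response within set 1
    dg2 = b2 * sigma2 * intensity(x2)  # selection response within set 2
    delta_g = (dg1 * gamma1 + dg2 * gamma2
               + mu1 * (gamma1 - pi1) + mu2 * (gamma2 - pi2))  # Eq. 3
    return alpha_tot, gamma1, delta_g


# Hypothetical example: set 2 has a higher mean, both sets share one threshold.
print(total_response(1.5, 1.5, 0.0, 0.5, 1.0, 1.0, 0.5, 1.0, 1.0))
```

When both sets have identical parameters, the function reduces to the single-population breeders' equation, which is a convenient sanity check.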

In order to simplify formulas, we will use the abbreviation $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \pi_1)$. Thus, we get for the proportion of candidates selected from $\Pi_1 \cup \Pi_2$ using thresholds $t_1$ and $t_2$ (“Appendix 1,” Eq. 17)

$$\alpha_{Tot}(t_1, t_2, \theta) = \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\pi_2 \tag{1}$$

and for the proportion of candidates from $\Gamma_1$ and $\Gamma_2$ in the selected fraction $\Gamma_1 \cup \Gamma_2$, where $\Gamma_1$ and $\Gamma_2$ denote the candidates selected from $\Pi_1$ and $\Pi_2$, respectively (“Appendix 1,” Eq. 18)

$$\gamma_1(t_1, t_2, \theta) = \frac{\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)\pi_1}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_1|}{|\Gamma_1 \cup \Gamma_2|} \quad \text{and} \quad \gamma_2(t_1, t_2, \theta) = \frac{\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)\pi_2}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_2|}{|\Gamma_1 \cup \Gamma_2|} = 1 - \gamma_1(t_1, t_2, \theta) \tag{2}$$

Assuming the regression coefficient of the TGV on the SC is $b_1$ in $\Pi_1$ and $b_2$ in $\Pi_2$, and applying the breeders’ equation for each set, we get for the total selection response of TGVs under truncation selection in $\Pi_1 \cup \Pi_2$ with thresholds $t_1$ and $t_2$ (“Appendix 1,” Eq. 22)

$$\begin{aligned}\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) ={}& \Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big)\,\gamma_1(t_1, t_2, \theta) + \Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big)\,\gamma_2(t_1, t_2, \theta) \\ &+ \mu_1\big[\gamma_1(t_1, t_2, \theta) - \pi_1\big] + \mu_2\big[\gamma_2(t_1, t_2, \theta) - \pi_2\big]\end{aligned} \tag{3}$$

where $\Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big) = b_1\sigma_1 i_{\alpha}\left(\frac{t_1-\mu_1}{\sigma_1}\right)$ and $\Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big) = b_2\sigma_2 i_{\alpha}\left(\frac{t_2-\mu_2}{\sigma_2}\right)$ refer to the selection response in set $\Pi_1$ and $\Pi_2$, respectively.

A special situation exists in hybrid breeding with two genetically distant parent populations, where $\Pi_1$ and $\Pi_2$ correspond to sets of lines from the seed or pollen parent population, respectively. The TGV refers to the general combining ability (GCA) of each line in cross-combinations with the other parent population. Since GCA values are defined as deviations from the overall mean of the hybrid population $\Pi_1 \times \Pi_2$, we assume that the SC for the GCA of the lines from $\Pi_1$ and $\Pi_2$ follows normal distributions $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$, respectively, and the regression coefficients of the TGV of GCA effects on the SC are $b_1$ and $b_2$, respectively. In phenotypic selection, the SC is commonly based on the testcross performance of each line evaluated in crosses with one or several tester(s) from the opposite population of the heterotic pattern. In genomic selection, GCA can be predicted from the marker profile of the parent lines and phenotypic data of hybrids in a training set (cf. Bernardo 1996; Technow et al. 2014).

The lines with highest predicted GCA effects in each parent population are generally selected for producing a factorial to be phenotyped in the final step of cultivar development (Melchinger and Posselt 2013). Thus, the selection response $\Delta G_{Hyb}$ in the complete factorial $\Gamma_1 \times \Gamma_2$ of hybrids, produced by mating set $\Gamma_1$ of GCA-selected lines from $\Pi_1$ with set $\Gamma_2$ of GCA-selected lines from $\Pi_2$, compared to the factorial $\Pi_1 \times \Pi_2$ among unselected lines, is equal to the sum of the selection response for GCA effects $\Delta G_1(t_1, \sigma_1, b_1) = b_1\sigma_1 i_{\alpha}\left(\frac{t_1}{\sigma_1}\right)$ plus $\Delta G_2(t_2, \sigma_2, b_2) = b_2\sigma_2 i_{\alpha}\left(\frac{t_2}{\sigma_2}\right)$ in parent population $\Pi_1$ and $\Pi_2$, respectively, and we have for $\vartheta = (\sigma_1, \sigma_2, b_1, b_2)$

$$\Delta G_{Hyb}(t_1, t_2, \vartheta) = \Delta G_1(t_1, \sigma_1, b_1) + \Delta G_2(t_2, \sigma_2, b_2) \tag{4}$$

where $\alpha\left(\frac{t_1}{\sigma_1}\right) = \frac{|\Gamma_1|}{|\Pi_1|}$ and $\alpha\left(\frac{t_2}{\sigma_2}\right) = \frac{|\Gamma_2|}{|\Pi_2|}$ are the proportions of selected lines in $\Pi_1$ and $\Pi_2$, respectively. Note that

$$\alpha_{Hyb}(t_1, t_2, \vartheta) = \alpha\left(\frac{t_1}{\sigma_1}\right) \times \alpha\left(\frac{t_2}{\sigma_2}\right) = \frac{|\Gamma_1|}{|\Pi_1|} \times \frac{|\Gamma_2|}{|\Pi_2|} = \frac{|\Gamma_1 \times \Gamma_2|}{|\Pi_1 \times \Pi_2|} \tag{5}$$

corresponds to the proportion of hybrids in $\Gamma_1 \times \Gamma_2$ selected in silico from the set of all possible hybrids in $\Pi_1 \times \Pi_2$.

Maximizing the total selection response by optimal choice of thresholds

Depending on the budget and size of the breeding program, the breeder has restrictions on the total number of genotypes to be selected from the candidates in a given cycle. This applies irrespective of whether the selected candidates are promoted to further testing for cultivar development or recombined to generate new base material for the next breeding cycle in recurrent selection. Therefore, the total proportion of selected candidates ($\alpha_T$) is typically fixed. Nevertheless, the breeder still has the option to optimize the total selection response in $\Pi_1 \cup \Pi_2$ by selecting different proportions of candidates from $\Pi_1$ and $\Pi_2$, respectively, while keeping $\alpha_{Tot}(t_1, t_2, \theta)$, the total proportion of genotypes selected from $\Pi_1 \cup \Pi_2$, fixed. Thus, the goal is to find thresholds $t_1^*$ and $t_2^*$, or equivalently selected proportions $\alpha_1^* = \alpha\left(\frac{t_1^*-\mu_1}{\sigma_1}\right)$ and $\alpha_2^* = \alpha\left(\frac{t_2^*-\mu_2}{\sigma_2}\right)$, which maximize the total selection response $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2)$ under the side condition $\alpha_{Tot}(t_1, t_2, \theta) = \alpha_T$.

A solution to this problem can be obtained by applying a Lagrange multiplier approach. Our derivations show that $(t_1^*, t_2^*)$ are obtained as solutions of Eqs. 6 and 7 in $(t_1, t_2)$ (“Appendix 2,” Eqs. 30 and 31):

$$t_1 = \frac{b_2(t_2 - \mu_2) + \mu_2 - \mu_1}{b_1} + \mu_1 \quad \text{or equivalently} \quad t_2 = \frac{b_1(t_1 - \mu_1) + \mu_1 - \mu_2}{b_2} + \mu_2 \tag{6}$$

and

$$\alpha_{Tot}(t_1, t_2, \theta) = \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)(1 - \pi_1) = \alpha_T. \tag{7}$$

Solutions $(t_1^*, t_2^*)$ of these equations can be obtained by mathematical software, such as Mathematica (Wolfram 1999), and subsequently used to calculate $\alpha_1^*$, $\alpha_2^*$ and $\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2)$. In order to assess the improvement in the total selection response, which can be achieved by applying optimal thresholds $(t_1^*, t_2^*)$ instead of identical thresholds $t_1^i = t_2^i$ for both sets satisfying the side condition $\alpha_{Tot}(t_1^i, t_2^i, \theta) = \alpha_T$ in Eq. 7, we suggest using the ratio

$$\Psi_{Tot}(\alpha_T, \theta, b_1, b_2) = 100 \times \left[\frac{\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2) - \Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)}{\Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)}\right]. \tag{8}$$

In hybrid breeding, the breeder is also limited in terms of the number of promising predicted hybrids that can be evaluated in a factorial for product development in the next step of the breeding scheme. Thus, the goal is to find optimal proportions $\alpha_1^o$ and $\alpha_2^o$ of candidates from $\Pi_1$ and $\Pi_2$, or equivalently optimal thresholds $t_1^o$ and $t_2^o$, for selection in $\Pi_1$ and $\Pi_2$, respectively, which maximize the selection response $\Delta G_{Hyb}(t_1, t_2, \vartheta)$ for the factorial produced between the GCA-selected lines. However, instead of using Eq. 7, the side condition takes the form

$$\alpha_{Hyb}(t_1, t_2, \vartheta) = \alpha\left(\frac{t_1}{\sigma_1}\right) \times \alpha\left(\frac{t_2}{\sigma_2}\right) = \alpha_H, \tag{9}$$

where $\alpha_H$ is the fixed proportion of hybrids to be selected for testing in the final stage of hybrid development.

A solution to this maximization problem can be found again by applying a Lagrange multiplier approach (“Appendix 3”). Accordingly, thresholds $(t_1^o, t_2^o)$ optimizing $\Delta G_{Hyb}(t_1, t_2, \vartheta)$ in the factorial of hybrids among selected lines are found as solutions $(t_1, t_2)$ (“Appendix 3,” Eq. 43) of Eq. 9 and

$$b_2 t_2 - b_1 t_1 + \Delta G_1(t_1, \sigma_1, b_1) - \Delta G_2(t_2, \sigma_2, b_2) = 0. \tag{10}$$

Numerical solutions for $(t_1^o, t_2^o)$ can be obtained by mathematical software such as Mathematica and subsequently used to calculate the proportions $\alpha_1^o = \alpha\left(\frac{t_1^o}{\sigma_1}\right)$ and $\alpha_2^o = \alpha\left(\frac{t_2^o}{\sigma_2}\right) = \alpha_H / \alpha_1^o$ to be selected in $\Pi_1$ and $\Pi_2$, respectively, and finally, $\Delta G_{Hyb}(t_1^o, t_2^o, \vartheta)$.
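The numerical solution of Eqs. 9 and 10 can also be sketched outside Mathematica. The following Python example is an illustrative assumption, not the authors' program: it substitutes $\alpha_2 = \alpha_H/\alpha_1$ from Eq. 9 into Eq. 10 and finds the root by bisection on $\alpha_1$. The function names are hypothetical, and the bracket assumes that $\sigma_1$ and $\sigma_2$ do not differ by orders of magnitude. For comparison, it also evaluates equal proportions $\sqrt{\alpha_H}$ in both populations.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def threshold(p):
    """Standardized threshold x = t / sigma that selects proportion p."""
    return -N01.inv_cdf(p)


def intensity(x):
    """Selection intensity i_alpha(x) = phi(x) / alpha(x)."""
    return N01.pdf(x) / N01.cdf(-x)


def hybrid_response(a1, a2, sigma1, sigma2, b1=1.0, b2=1.0):
    """Eq. 4, expressed through the selected proportions a1 and a2."""
    return (b1 * sigma1 * intensity(threshold(a1))
            + b2 * sigma2 * intensity(threshold(a2)))


def optimal_proportions(alpha_h, sigma1, sigma2, b1=1.0, b2=1.0, iters=100):
    """Solve Eqs. 9 and 10 by bisection on the proportion a1 selected in Pi_1."""

    def eq10(a1):
        x1, x2 = threshold(a1), threshold(alpha_h / a1)  # Eq. 9 fixes a2
        t1, t2 = sigma1 * x1, sigma2 * x2
        return (b2 * t2 - b1 * t1
                + b1 * sigma1 * intensity(x1) - b2 * sigma2 * intensity(x2))

    # a1 must lie in (alpha_h, 1) so that a2 = alpha_h / a1 is a proportion.
    lo, hi = alpha_h * (1 + 1e-9), 1 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eq10(mid) < 0:   # the left-hand side of Eq. 10 increases with a1
            lo = mid
        else:
            hi = mid
    a1 = 0.5 * (lo + hi)
    return a1, alpha_h / a1


alpha_h = 0.01
a_e = sqrt(alpha_h)   # equal proportions in both parent populations
a1, a2 = optimal_proportions(alpha_h, sigma1=1.0, sigma2=0.5)
gain = 100 * (hybrid_response(a1, a2, 1.0, 0.5)
              / hybrid_response(a_e, a_e, 1.0, 0.5) - 1)
print(f"a1={a1:.4f}, a2={a2:.4f}, gain over equal proportions={gain:.1f}%")
```

With equal standard deviations and regression coefficients in both populations, the routine returns equal proportions, as expected from the symmetry of Eq. 10.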

In order to assess the improvement in the total selection response, which can be achieved by using the optimal proportions $(\alpha_1^o, \alpha_2^o)$ compared to selecting an equal proportion $\alpha_e = \sqrt{\alpha_H}$ of lines from each population, i.e., using thresholds $t_1^e = \sigma_1\Phi^{-1}(1 - \alpha_e)$ and $t_2^e = \sigma_2\Phi^{-1}(1 - \alpha_e)$, we suggest using the ratio $\Psi_{Hyb}$ defined in Eq. 11:

$$\Psi_{Hyb}(\alpha_H, \vartheta) = 100 \times \left[\frac{\Delta G_{Hyb}(t_1^o, t_2^o, \vartheta) - \Delta G_{Hyb}(t_1^e, t_2^e, \vartheta)}{\Delta G_{Hyb}(t_1^e, t_2^e, \vartheta)}\right]. \tag{11}$$

Application to selection based on BLUPs

Let $u$ denote the random variable of true breeding values (TBVs) and $\hat{u}$ their BLUPs, obtained by the use of pedigree or “omics” data. As shown by Henderson (1975), the standard deviation $\sigma_u$ of TGVs and the standard deviation $\sigma$ of their BLUPs are related by $\sigma = \rho\sigma_u$, where $\rho$ is the prediction accuracy, reflecting the shrinkage of BLUPs compared to the TBVs. Hence, we have $\sigma_1 = \rho_1\sigma_{u1}$ and $\sigma_2 = \rho_2\sigma_{u2}$. Further, the regression of $u$ on $\hat{u}$ is equal to 1.0 for each set, so that $b_1 = 1.0$ and $b_2 = 1.0$, and this result holds true under fairly general conditions (“Appendix 4”). Thus, from Eq. 6 we obtain $t_1^* = t_2^*$, even if $\mu_1 \neq \mu_2$, $\sigma_{u1}^2 \neq \sigma_{u2}^2$, and $\rho_1 \neq \rho_2$. Consequently, using identical thresholds for the predicted values of TGVs (calculated as BLUPs plus the mean $\mu$ of the corresponding set) maximizes the selection response in the combined set. In conclusion, for BLUPs there is no need to search for the optimal threshold in each set and one must merely find the common threshold $t^* = t_1^* = t_2^*$ for both sets satisfying the side condition in Eq. 7, which can be obtained by solving the equation

$$\Phi\left(\frac{t^* - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{t^* - \mu_2}{\sigma_2}\right)(1 - \pi_1) = 1 - \alpha_T \tag{12}$$

Moreover, the total selection response in the combined set $\Pi_1 \cup \Pi_2$ for the common threshold $t^*$ is

$$\begin{aligned}\Delta G_{Tot\text{-}BLUP}(t^*, t^*, \theta, 1, 1) = \frac{1}{\alpha_T}\Big[&\sigma_1\varphi\left(\frac{t^* - \mu_1}{\sigma_1}\right)\pi_1 + \sigma_2\varphi\left(\frac{t^* - \mu_2}{\sigma_2}\right)\pi_2 \\ &+ \mu_1\pi_1\left(\alpha\left(\frac{t^* - \mu_1}{\sigma_1}\right) - \alpha_T\right) + \mu_2\pi_2\left(\alpha\left(\frac{t^* - \mu_2}{\sigma_2}\right) - \alpha_T\right)\Big]\end{aligned} \tag{13}$$

where $\Phi$ and $\varphi$ denote the cumulative distribution function and the density of the standard normal distribution, respectively.

Application to selection based on BLUEs

In phenotypic selection (PS) based on BLUEs, the regression of TGVs on the SC is equal to their heritability (Falconer and Mackay 1996, p. 189), so that $b_1 = h_1^2$ and $b_2 = h_2^2$. Further, the standard deviation $\sigma_u$ of TBVs and the standard deviation $\sigma$ of their BLUEs used in PS are related by $\sigma = \sigma_u / h$. Hence, we have $\sigma_1 = \sigma_{u1}/h_1$ and $\sigma_2 = \sigma_{u2}/h_2$. Thus, Eq. 3 becomes Eq. 14:

$$\begin{aligned}\Delta G_{Tot\text{-}PS}(t_1, t_2, \theta, h_1^2, h_2^2) ={}& h_1^2\sigma_1 i_{\alpha}\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\gamma_1(t_1, t_2, \theta) + h_2^2\sigma_2 i_{\alpha}\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\gamma_2(t_1, t_2, \theta) \\ &+ \mu_1\big[\gamma_1(t_1, t_2, \theta) - \pi_1\big] + \mu_2\big[\gamma_2(t_1, t_2, \theta) - \pi_2\big]\end{aligned} \tag{14}$$

From Eqs. 6 and 7, the optimal thresholds $t_1^*$ and $t_2^*$ are obtained as solutions of

$$\Phi\left(\frac{t_1^* - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{h_1^2(t_1^* - \mu_1) + \mu_1 - \mu_2}{h_2^2\sigma_2}\right)\pi_2 = 1 - \alpha_T \quad \text{and} \quad t_2^* = \frac{h_1^2(t_1^* - \mu_1) + \mu_1 - \mu_2}{h_2^2} + \mu_2 \tag{15}$$

Numerical analyses

All equations in the theory part were programmed in software Mathematica (Wolfram 1999) for numerical analyses. As a first check for Eq. 6 and the derivations in “Appendix 2,” we numerically compared the selection response $\Delta G_{Tot}$ achieved with optimized thresholds $(t_1^*, t_2^*)$ versus identical thresholds $(t_1^i = t_2^i)$ for BLUPs, setting $b_1 = b_2 = 1.0$ in our program. Regardless of the means $(\mu_1, \mu_2)$ and standard deviations $(\sigma_1, \sigma_2)$ of the SC in $\Pi_1$ and $\Pi_2$, as well as the choice of $\pi_1$ and $\alpha_T$, the values of $\Delta G_{Tot}$ obtained for $(t_1^*, t_2^*)$ and $(t_1^i = t_2^i)$ were identical except for tiny differences attributable to rounding errors so that $\Psi_{Tot}$ was practically zero (data not shown), confirming our theoretical results.

For BLUEs, we calculated on one hand the values for $\Psi_{Tot}$ and $\gamma_1^* = \gamma_1(t_1^*, t_2^*, \theta)$ obtained by using the solutions for $(t_1^*, t_2^*)$ obtained with the Lagrange multiplier approach (Eq. 6). On the other hand, we used the function NMaximize in Mathematica to determine the maximum of $\Delta G_{Tot}$ under the side condition in Eq. 7. Again, the numerical results from both calculations were in perfect agreement except for numerical inaccuracies.

For finding the maximum selection response $\Delta G_{Hyb}$ in the hybrid population $\Pi_1 \times \Pi_2$, we used the function NMaximize in Mathematica in combination with the side condition in Eq. 9 to find the optimum choice of selected proportions $(\alpha_1^o, \alpha_2^o)$. We used these values to calculate, according to Eq. 11, the percentage improvement ($\Psi_{Hyb}$) in $\Delta G_{Hyb}$ when using optimized $(\alpha_1^o, \alpha_2^o)$ instead of equal $(\alpha_1^e = \alpha_2^e)$ proportions of lines selected from populations $\Pi_1$ and $\Pi_2$.

For investigating the consequences of BLUE-based selection on the magnitude of $\Psi_{Tot}$, $\gamma_1^*$ and $\gamma_1^i = \gamma_1(t_1^i, t_2^i, \theta)$ as a function of other relevant population parameters, we made the assumption without loss of generality that $\mu_1 = 0$ and $\sigma_{u1} = 1.0$. This can be achieved by centering the original SC values of all candidates as deviations from $\mu_1$ and dividing them by $\sigma_1 = \sigma_{u1}/h_1$. Moreover, for representing $\Psi_{Tot}$ and $\gamma_1^*$ or $\gamma_1^i$ in contour plots as functions of $\mu_2$ and $h_2$, we assumed $h_1 = \sqrt{0.5}$ (i.e., $h_1^2 = 0.5$) and identical genetic standard deviations in both sets ($\sigma_{u1} = \sigma_{u2}$), which closely approximates the conditions encountered in many situations in plant breeding programs.
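The checks described in this section can be reproduced in outline with a few lines of Python. This is a hedged sketch rather than the authors' Mathematica programs: a simple bisection replaces the equation solver, a brute-force scan along the side condition stands in for NMaximize, and all function names and parameter values are illustrative assumptions. The search bracket of ±1000 presumes SC values on a roughly standardized scale.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def alpha(x):
    """Proportion selected from N(0, 1) above threshold x."""
    return N01.cdf(-x)


def delta_g_tot(t1, t2, mu1, mu2, s1, s2, pi1, b1, b2):
    """Eq. 3, using gamma_k * i_alpha(x_k) = pi_k * phi(x_k) / alpha_Tot."""
    pi2 = 1 - pi1
    x1, x2 = (t1 - mu1) / s1, (t2 - mu2) / s2
    a_tot = alpha(x1) * pi1 + alpha(x2) * pi2                 # Eq. 1
    g1 = alpha(x1) * pi1 / a_tot                              # Eq. 2
    within = (b1 * s1 * N01.pdf(x1) * pi1 + b2 * s2 * N01.pdf(x2) * pi2) / a_tot
    return within + mu1 * (g1 - pi1) + mu2 * ((1 - g1) - pi2)


def solve_threshold(alpha_t, t2_of, mu1, mu2, s1, s2, pi1, iters=200):
    """Bisection on t1 for the side condition (Eq. 7) with t2 = t2_of(t1)."""
    lo, hi = -1e3, 1e3  # assumed to bracket the solution on the SC scale
    for _ in range(iters):
        t1 = 0.5 * (lo + hi)
        a_tot = (alpha((t1 - mu1) / s1) * pi1
                 + alpha((t2_of(t1) - mu2) / s2) * (1 - pi1))
        if a_tot > alpha_t:   # too many candidates selected: raise the threshold
            lo = t1
        else:
            hi = t1
    return 0.5 * (lo + hi)


def compare(alpha_t, mu1, mu2, s1, s2, pi1, b1, b2):
    """Return (Psi_Tot of Eq. 8, optimal response, brute-force maximum)."""
    theta = (mu1, mu2, s1, s2, pi1)

    def lagrange(t1):   # Eq. 6
        return (b1 * (t1 - mu1) + mu1 - mu2) / b2 + mu2

    t1 = solve_threshold(alpha_t, lagrange, *theta)
    ti = solve_threshold(alpha_t, lambda t: t, *theta)
    dg_opt = delta_g_tot(t1, lagrange(t1), *theta, b1, b2)
    dg_id = delta_g_tot(ti, ti, *theta, b1, b2)

    # Independent check: scan the proportion a1 selected in set 1 along Eq. 7.
    best, steps = float("-inf"), 20000
    for k in range(1, steps):
        a1 = k / steps
        a2 = (alpha_t - a1 * pi1) / (1 - pi1)
        if 0 < a2 < 1:
            cand = delta_g_tot(mu1 - s1 * N01.inv_cdf(a1),
                               mu2 - s2 * N01.inv_cdf(a2), *theta, b1, b2)
            best = max(best, cand)
    return 100 * (dg_opt - dg_id) / dg_id, dg_opt, best


# BLUEs: b_k = h_k^2 and sigma_k = sigma_uk / h_k (illustrative values)
h1, h2 = 0.5 ** 0.5, 0.9
psi_blue, dg_opt, dg_scan = compare(0.01, 0.0, 0.0, 1 / h1, 1 / h2, 0.5, h1 ** 2, h2 ** 2)
# BLUPs: b_k = 1 and sigma_k = rho_k * sigma_uk (illustrative values)
psi_blup, _, _ = compare(0.01, 0.0, 0.5, 0.5, 0.8, 0.5, 1.0, 1.0)
print(f"BLUEs: Psi_Tot = {psi_blue:.1f}%, Lagrange = {dg_opt:.4f}, scan = {dg_scan:.4f}")
print(f"BLUPs: Psi_Tot = {psi_blup:.2e}%")
```

The scan only evaluates a finite grid, so it can fall marginally below the Lagrange solution but should never exceed it.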

Software availability statement

The Mathematica programs developed for the numerical analyses of this study are available at https://github.com/TUMplantbreeding/AEM/Opt_selection_with_multiple_sets and can be downloaded from there.

Results

Figure 1 examines for BLUPs the shift in the proportion of candidates from $\Pi_1$ before ($\pi_1$) and after selection ($\gamma_1^*$). We present the ratio $\gamma_1^* : \pi_1$ as a function of $\mu_2$ and $\rho_2$ under the assumptions mentioned above ($\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $\rho_1 = 0.50$). Regardless of the magnitude of $\alpha_T$ and $\pi_1$, the contour lines were straight lines, indicating that $\gamma_1^*$ depends on a linear function of $\mu_2$ and $\rho_2$ with weights of these parameters determining their slope. For small values of $\pi_1$ or $\alpha_T$, the ratio reduced substantially with an increasing sum $\mu_2 + \rho_2$ so that even for moderate values for one of these parameters, the ratio was smaller than 0.1, indicating that less than 10% of the initial proportion $\pi_1$ was recovered in $\gamma_1^*$.

For $\pi_1 = 0.90$ in combination with $\alpha_T \geq 0.10$, the ratio was less
affected by increasing �2 and reduced only moderately with 
increasing �2 , yet the slope of the contour lines changed 
with �2.
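To illustrate how such ratios arise, the following Python sketch solves Eq. 12 for the common threshold and evaluates $\gamma_1^* : \pi_1$ under the assumptions stated for Fig. 1 ($\mu_1 = 0$, $\sigma_{u1} = \sigma_{u2} = 1.0$, $\rho_1 = 0.50$). It is an illustrative reimplementation with hypothetical function names and arbitrarily chosen grid values, not the code used to produce the figure.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def common_threshold(alpha_t, mu1, mu2, s1, s2, pi1, iters=200):
    """Solve Eq. 12 for the common threshold t* by bisection."""

    def selected(t):  # total selected proportion at threshold t (Eq. 7)
        return N01.cdf((mu1 - t) / s1) * pi1 + N01.cdf((mu2 - t) / s2) * (1 - pi1)

    spread = 40 * max(s1, s2)
    lo, hi = min(mu1, mu2) - spread, max(mu1, mu2) + spread
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if selected(mid) > alpha_t:   # too many selected: raise the threshold
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma1_ratio(alpha_t, mu2, rho2, pi1, mu1=0.0, rho1=0.5, sigma_u=1.0):
    """Ratio gamma_1^* : pi_1 for BLUP-based selection with a common threshold."""
    s1, s2 = rho1 * sigma_u, rho2 * sigma_u   # sigma = rho * sigma_u for BLUPs
    t = common_threshold(alpha_t, mu1, mu2, s1, s2, pi1)
    # Eq. 2 with alpha_Tot = alpha_T: gamma_1 / pi_1 = alpha((t - mu1) / s1) / alpha_T
    return N01.cdf((mu1 - t) / s1) / alpha_t


for mu2, rho2 in [(0.0, 0.5), (0.5, 0.5), (0.0, 0.9), (0.5, 0.9)]:
    print(mu2, rho2, round(gamma1_ratio(0.01, mu2, rho2, pi1=0.5), 4))
```

When both sets share the same mean and prediction accuracy, the ratio equals 1.0, and it drops quickly once the second set has a higher mean or accuracy.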

As expected, under optimal thresholds $(t_1^*, t_2^*)$ for selection based on BLUEs, the contour plots for $\gamma_1^* : \pi_1$ were identical to those obtained for BLUPs, when replacing $\rho_1$ by $h_1 = \sqrt{h_1^2}$ and $\rho_2$ by $h_2 = \sqrt{h_2^2}$, respectively (results not shown). For comparison, we also analyzed the ratio $\gamma_1^i : \pi_1$ as a function of $\mu_2$ and $h_2$ to monitor the relative change in the proportion of candidates from $\Pi_1$ before ($\pi_1$) and after selection ($\gamma_1^i$) based on BLUEs with identical thresholds ($t_1^i = t_2^i$) for both sets (Supplementary Figure 1). Compared with $\gamma_1^* : \pi_1$, the ratio $\gamma_1^i : \pi_1$ changed less with increasing $\mu_2$ and $h_2$, particularly for large values of $\pi_1$ or $\alpha_T$. The ratio depended mainly on the magnitude of $h_2$ and less on the size of $\mu_2$. For $\pi_1 \leq 0.50$ and $\alpha_T \geq 0.10$, the ratio was smaller or larger than 1.0 if $h_2$ fell below or exceeded $h_1$, respectively, and increasing $\mu_2$ had only a moderately reducing effect.

When performing mild selection ($\alpha_T = 0.25$) with BLUEs, the size of $\Psi_{Tot}$, reflecting the improvement in overall selection response achieved by using optimal $(t_1^*, t_2^*)$ instead of identical thresholds $(t_1^i = t_2^i)$, was consistently smaller than 10%, irrespective of $\pi_1$ and the investigated range of $\mu_2$ and $h_2$ (Fig. 2). For $\alpha_T = 0.10$, $\Psi_{Tot}$ was close to zero for $\pi_1 = 0.1$ but exceeded 10% for $\pi_1 = 0.50$ and high values of $h_2$. Under stringent selection with $\alpha_T = 0.01$ and $h_2 \geq 0.90$, $\Psi_{Tot}$ surpassed 20% for $\pi_1 = 0.50$, regardless of $\mu_2$, or if $\pi_1 = 0.50$ and $h_2 \geq 0.5$. Setting $\mu_2 = 1.0$ had only a minor effect on increasing $\Psi_{Tot}$ compared to increasing $h_2$ from $\sqrt{0.5}$ to 0.9.

Figure 3 shows $\Psi_{Hyb}$, the increase in selection response for hybrids when selecting optimal $(\alpha_1^o, \alpha_2^o)$ versus equal ($\alpha_1 = \alpha_2 = \alpha_e = \sqrt{\alpha_H}$) proportions of lines from each parent population, as a function of $\sigma_2 : \sigma_1$, the ratio of the standard deviations of BLUPs for GCA effects of lines in $\Pi_2$ and $\Pi_1$, respectively. $\Psi_{Hyb}$ showed an approximately quadratic decrease with increasing the ratio $\sigma_2 : \sigma_1$ from 0.5 to 1.0 and minor differences for different values of $\alpha_H$. For $\sigma_2 : \sigma_1 = 0.5$, $\Psi_{Hyb}$ was approximately 6% for all values of $\alpha_H$. The ratio $\alpha_1^o : \alpha_e$ displayed a quadratic decrease with increasing $\sigma_2 : \sigma_1$ with large differences depending on $\alpha_H$. For $\sigma_2 : \sigma_1 = 0.5$ and $\alpha_H \leq 0.01$, $\alpha_1^o : \alpha_e$ was smaller than 0.25, reflecting that selection of hybrids relied almost entirely on stringent GCA selection of lines in the parent population with higher variance of BLUPs and only mild selection in the other parent population.

Fig. 1 Contour plots for the ratio $\gamma_1^* : \pi_1$, indicating the shift in the proportion of genotypes from $\Pi_1$ before ($\pi_1$) and after ($\gamma_1^*$) truncation selection based on BLUPs, when using optimal (= identical) thresholds ($t_1^* = t_2^*$) in set $\Pi_1$ and $\Pi_2$. The graphs show $\gamma_1^* : \pi_1$ as a function of the mean $\mu_2$ and the prediction accuracy $\rho_2$ of the selection criterion (SC) in $\Pi_2$ for various values of $\pi_1$ and $\alpha_T$, the proportion of candidates selected from $\Pi_1 \cup \Pi_2$. Assumptions are $\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $\rho_1 = 0.50$, i.e., $\theta = (0, \mu_2, 0.5, \rho_2, \pi_1)$. The white labels attached to the contour lines show the corresponding numerical values

Discussion

Examples of sets differing in population parameters

In all breeding categories described by Schnell (1982), plant breeders generally evaluate and select genotypes from multiple families in parallel as evident from publications on public and private breeding programs in maize and wheat (e.g., Auinger et al. 2021; Bonnett et al. 2022; Lian et al. 2014). The parents of these mostly bi-parental families generally differ in their performance level and relationship, and therefore, the progenies differ with respect to relevant population parameters. Nevertheless, these materials are routinely evaluated together in the same experiment(s) and genotypes promoted to the next stage of the program are often selected without giving much attention to their origin.

Fig. 2 Contour plots for $\Psi_{Tot}(\alpha_T, \theta, h_1^2, h_2^2)$, indicating the percentage increase of the selection response $\Delta G_{Tot}$ for selection based on BLUEs in $\Pi_1 \cup \Pi_2$, when using optimal $(t_1^*, t_2^*)$ versus identical $(t_1^i = t_2^i)$ thresholds for truncation selection in set $\Pi_1$ and $\Pi_2$, respectively. The graphs show $\Psi_{Tot}$ as a function of the mean $\mu_2$ and $h_2$, the square root of the heritability of the BLUEs in $\Pi_2$ for various values of $\pi_1$ and $\alpha_T$, the proportion of candidates selected from $\Pi_1 \cup \Pi_2$. Assumptions are $\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $h_1^2 = 0.50$, i.e., $\theta = \left(0, \mu_2, \sqrt{2}, \frac{1}{\sqrt{h_2^2}}, \pi_1\right)$. The white labels attached to the contour lines show the corresponding numerical values

A comparable situation exists in introgression breed-
ing programs when multiple populations are developed by 
crossing elite germplasm with various donors (e.g., Bar-
bosa et al. 2021). These materials generally differ in their 
performance level and genetic variance due to disparate 
adaptation of the donors to the target environment(s) and 
varying proportions of donor germplasm in the pedigree. 
In pre-breeding programs too, the differences among popu-
lations can be extremely large as reported for landraces 
of maize (Böhm et al. 2017; Hölker et al. 2019; Mayer 
et al. 2017). If all populations are evaluated in a common 
experiment, breeders are inclined to apply the same thresh-
old for identifying superior candidates used for further 
breeding.

Even when dealing with a single population, so that the mean and genetic variance are identical, sets of genotypes often differ with regard to the prediction accuracy of the SC for the TGV of candidates. This can be attributable to unbalanced data from multi-environment trials, where some sets are evaluated in fewer environments or replications than others. For instance, top performers remain in the testing pipeline for several years, while new entries are added to the system (Piepho et al. 2008). Moreover, some genotypes might be tested less intensively owing to problems in seed multiplication, as occurs in the production of doubled-haploid lines (Chaikam et al. 2019) or in speed breeding programs (Watson et al. 2018). Further, when complex traits are monitored using sensor-based techniques (NIRS, optical sensors, etc.) or “omics” data (genomic, phenomic, etc.), the prediction accuracy tends to be notably higher in the calibration set compared to the prediction set (Melchinger and Frisch 2023) and in sets combining different “omics” features (Schrag et al. 2018; Westhues et al. 2019). Thus, there are numerous scenarios where sets of germplasm in a breeding program differ in their population parameters and breeders should be prepared to deal adequately with these situations and be aware of the implications for selection.

Contrasting BLUEs and BLUPs as selection criteria

Until two decades ago, selection decisions in plant breed-
ing relied exclusively on BLUEs of the candidates, a prac-
tice that still endures in many smaller breeding programs 
today. Two major reasons contribute to this conservative 
attitude. Firstly, for traits with high heritability on an entry-
mean basis, the ranking of candidates based on BLUEs and 
BLUPs is mostly similar. Secondly, calculation of BLUEs 
is straightforward and does not require information on the 
relationships among candidates or estimates of genetic vari-
ance components, which are challenging to obtain due to the 
small size of sets and rapid change over selection cycles.

Building upon the pioneering research of Henderson 
(1975) and inspired by the tremendous progress in animal 
breeding subsequent to the adoption of BLUPs, Bernardo 
(1994) spearheaded the implementation of BLUPs into 
plant breeding. With balanced data and when candidates 
are unrelated or possess identical co-ancestries so that their 
TGVs are predicted with equal accuracy, the ranking of 
candidates based on BLUEs and BLUPs is identical (Ken-
nedy and Sorenson 1988). Otherwise, BLUPs offer a notable 
advantage by capitalizing on information from relatives and/
or accommodating an efficient analysis of unbalanced data 
(Bernardo 2002; Piepho et al. 2008).
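
The contrast between balanced and unbalanced data can be illustrated with a minimal sketch. It assumes unrelated candidates, a known population mean and known variance components, so that the BLUP is simply the BLUE (expressed as a deviation from the mean) multiplied by the entry-mean heritability. The candidate names, BLUE values, replication numbers and variance components are hypothetical.

```python
def blup_from_blue(blue_dev, n_reps, var_u=1.0, var_e=2.0):
    """Shrink a BLUE (deviation from a known mean) towards zero.

    Assumes unrelated candidates and known variance components, so that
    BLUP = h2 * BLUE with h2 = var_u / (var_u + var_e / n_reps).
    """
    h2 = var_u / (var_u + var_e / n_reps)
    return h2 * blue_dev


def ranking(values):
    """Candidate names ordered from best to worst."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical candidates: (BLUE as deviation from the mean, replications)
unbalanced = {"A": (2.0, 1), "B": (1.6, 4), "C": (0.8, 4)}
balanced = {name: (blue, 4) for name, (blue, _) in unbalanced.items()}

rankings = {}
for label, data in (("unbalanced", unbalanced), ("balanced", balanced)):
    blues = {name: blue for name, (blue, _) in data.items()}
    blups = {name: blup_from_blue(blue, reps) for name, (blue, reps) in data.items()}
    rankings[label] = (ranking(blues), ranking(blups))
    print(label, "BLUE ranking:", rankings[label][0], "BLUP ranking:", rankings[label][1])
```

With equal replication the shrinkage factor is the same for all candidates and the two rankings coincide; with unequal replication the sparsely tested candidate A is shrunk more strongly and drops behind B.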

Another major advantage of BLUPs over BLUEs is their ability to allow direct comparisons across different breeding sets, regardless of their origin. As outlined in Eq. 6, applying the same selection threshold to the BLUPs of all candidates is optimal, whereas for BLUEs distinct thresholds must generally be found to maximize the selection response of the entire program. Following Cochran (1951), our theoretical results were derived assuming that the SC and TGVs are independently and identically distributed within each set because otherwise the already complex algebra would become even more unwieldy. This assumes an idealized situation, which is seldom met in practice as data are generally unbalanced and candidates commonly differ in their relationships. However, considering that the coefficient matrix of the regression of TGVs on BLUPs remains an identity matrix even under less stringent assumptions (“Appendix 4”), we conjecture that our results for BLUPs hold approximately true across a broad spectrum of scenarios, but this warrants further research.

Fig. 3  A Percentage increase $\Psi_{Hyb}(\alpha_H, \theta)$ of the selection response in the hybrid population $\Pi_1 \times \Pi_2$ and B ratio of the optimal proportion of selected candidates ($\alpha_1^o$) from $\Pi_1$ versus an equal ($\alpha_e = \sqrt{\alpha_H}$) proportion of lines selected from each parent population based on GCA predicted by BLUPs for $\theta = (\sigma_1, \sigma_2, 1, 1)$. The graphs show $\Psi_{Hyb}$ and $\alpha_1^o : \alpha_e$ as function of $\sigma_2 : \sigma_1$, the ratio of standard deviations of BLUPs for GCA of lines in $\Pi_2$ and $\Pi_1$, respectively, for different values of $\alpha_H$, the proportion of hybrids selected from $\Pi_1 \times \Pi_2$

Fig. 4  Individual and joint probability density functions (pdf) of the true genetic values (TGV ~ $N(0, 1)$) and selection criterion (SC ~ $N(0, \sigma^2)$) for sets $\Pi_1$ and $\Pi_2$ with equal proportions ($\pi_1 = \pi_2 = 0.5$). A SC are BLUEs with $\sqrt{h_1^2} = 0.6$ and $\sqrt{h_2^2} = 0.9$ in $\Pi_1$ and $\Pi_2$, respectively. B SC are BLUPs with $\rho_1 = 0.6$ and $\rho_2 = 0.9$ in $\Pi_1$ and $\Pi_2$, respectively. In both cases, truncation selection with identical thresholds is practiced in $\Pi_1 \cup \Pi_2$ to achieve $\alpha_T = 0.1$. SD refers to the selection differential

The difference between BLUEs and BLUPs is illustrated by two sets $\Pi_1$ and $\Pi_2$ with equal proportion ($\pi_1 = 0.5$) of unrelated candidates sampled from the same population and selection of $\alpha_T = 0.10$ candidates across $\Pi_1 \cup \Pi_2$ (Fig. 4). Thus, the two sets share identical means ($\mu_1 = \mu_2 = 0$) and genetic standard deviations ($\sigma_{u1} = \sigma_{u2} = 1$). Regarding the prediction accuracy of the SC, we assume $\sqrt{h_1^2} = \rho_1 = 0.6$ for $\Pi_1$ and $\sqrt{h_2^2} = \rho_2 = 0.9$ for $\Pi_2$, i.e., these values differ between the two sets but are identical for BLUEs and BLUPs within each set.

When using BLUEs, the standard deviation of the SC is larger in $\Pi_1$ compared to $\Pi_2$ ($\sigma_1 = \sigma_{u1}/h_1 = 1.67$ vs. $\sigma_2 = \sigma_{u2}/h_2 = 1.11$) due to the lower heritability. Utilizing identical thresholds ($t_1^i = t_2^i = 1.77$) for $\alpha_T = 0.10$, a larger proportion of candidates is selected in $\Pi_1$ than in $\Pi_2$ ($\alpha_1 = 0.14$ vs. $\alpha_2 = 0.06$), leading to lower selection intensity in $\Pi_1$ ($i_{\alpha_1} = 1.57$ vs. $i_{\alpha_2} = 2.02$). While the selection differentials are similar ($SD_1 = 2.62$ vs. $SD_2 = 2.24$), the selection response almost doubles in $\Pi_2$ compared to $\Pi_1$ ($\Delta G_1 = 0.94$ vs. $\Delta G_2 = 1.82$) owing to the higher heritability. Since the proportion of candidates selected from $\Pi_1$ is much larger than it would be with optimal thresholds ($\gamma_1^i = 0.72$ vs. $\gamma_1^* = 0.28$), this explains why for BLUEs the selection response $\Delta G_{Tot}(1.77, 1.77, \theta, 0.36, 0.81) = 1.19$ is significantly smaller than the maximum selection response $\Delta G_{Tot}(2.65, 1.18, \theta, 0.36, 0.81) = 1.36$ achieved with optimal thresholds ($t_1^* = 2.65$, $t_2^* = 1.18$), resulting in $\Psi_{Tot} = 14.5\%$. For very stringent selection with $\alpha_T = 0.01$, we get $\gamma_1^i = 0.95$ vs. $\gamma_1^* = 0.05$, leading to $\Psi_{Tot} = 42.3\%$.

When using BLUPs as SC, candidates of $\Pi_1$ exhibit a smaller standard deviation than those of $\Pi_2$ ($\sigma_1 = \rho_1 \sigma_{u1} = 0.6$ vs. $\sigma_2 = \rho_2 \sigma_{u2} = 0.9$) due to increased shrinkage. Consequently, applying identical thresholds ($t_1^* = t_2^* = 0.96$) to both sets for achieving $\alpha_T = 0.10$ leads to a smaller proportion of candidates ($\alpha_1 = 0.06$ vs. $\alpha_2 = 0.14$) and a higher selection intensity ($i_{\alpha_1} = 2.02$ vs. $i_{\alpha_2} = 1.57$) for $\Pi_1$ compared to $\Pi_2$. Given that the regression coefficient of TGVs on BLUPs is equal to 1.0 (Henderson 1975), we obtain $\Delta G_1 = 1.21$ and $\Delta G_2 = 1.42$. Referring to Eqs. 2 and 13, we get $\gamma_1^* = 0.28$ and $\Delta G_{Tot}(0.96, 0.96, \theta, 1, 1) = 1.36$. While this example was chosen for simplicity, it underscores the fundamental disparities between BLUEs and BLUPs for selection in scenarios involving multiple sets.
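
The numbers of this two-set example can be retraced with a short sketch using only the Python standard library. It assumes normally distributed SC within each set, evaluates the total response as the post-selection weighted sum of the within-set responses plus the shift in set means, and finds thresholds by bisection on a scalar c. The function names, the bracketing interval of ±10 and the rule labels are choices made for this illustration only; the "optimal" rule uses the condition that $b_k (t_k - \mu_k) + \mu_k$ is equal in both sets (Appendix 2).

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def alpha(z):
    """Selected proportion above a standardized threshold z."""
    return 1.0 - STD.cdf(z)


def thresholds(c, rule, mu, b):
    """Map the scalar c to one threshold per set."""
    if rule == "uniform":
        return [c for _ in mu]
    # "optimal": b_k * (t_k - mu_k) + mu_k = c in every set
    return [(c - m) / bk + m for m, bk in zip(mu, b)]


def alpha_total(t, mu, sigma, pi):
    """Total selected proportion across all sets."""
    return sum(p * alpha((tk - m) / s) for tk, m, s, p in zip(t, mu, sigma, pi))


def solve(alpha_t, rule, mu, sigma, b, pi):
    """Bisect on c until the total selected proportion equals alpha_t."""
    lo, hi = -10.0, 10.0  # assumed to bracket the solution
    for _ in range(200):
        c = 0.5 * (lo + hi)
        if alpha_total(thresholds(c, rule, mu, b), mu, sigma, pi) > alpha_t:
            lo = c  # too many candidates selected: raise thresholds
        else:
            hi = c
    t = thresholds(0.5 * (lo + hi), rule, mu, b)
    z = [(tk - m) / s for tk, m, s in zip(t, mu, sigma)]
    gamma = [p * alpha(zk) / alpha_t for zk, p in zip(z, pi)]
    # within-set responses weighted by gamma, plus shift in set means
    dg = sum(g * bk * s * STD.pdf(zk) / alpha(zk) + m * (g - p)
             for g, bk, s, zk, m, p in zip(gamma, b, sigma, z, mu, pi))
    return t, gamma, dg


h = [0.6, 0.9]  # sqrt(h^2) = rho in sets 1 and 2, with sigma_u = 1
common = dict(mu=[0.0, 0.0], pi=[0.5, 0.5])
blue = dict(sigma=[1.0 / hk for hk in h], b=[hk ** 2 for hk in h], **common)
blup = dict(sigma=list(h), b=[1.0, 1.0], **common)

results = {
    "BLUE, uniform": solve(0.10, "uniform", **blue),
    "BLUE, optimal": solve(0.10, "optimal", **blue),
    "BLUP, uniform": solve(0.10, "uniform", **blup),
}
for name, (t, gamma, dg) in results.items():
    print(name, [round(x, 2) for x in t], round(gamma[0], 2), round(dg, 2))
```

In this sketch the optimal BLUE thresholds and the uniform BLUP threshold lead to the same standardized thresholds in each set, which is why both reach the same total response.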

Properties of BLUPs for selection

BLUPs possess several optimality properties for prediction 
of random effects in mixed linear models (Fernando and 
Gianola 1986; Henderson 1990). They have minimum pre-
diction error variance and maximize the correlation to the 
TGVs in the class of linear unbiased predictors. Further-
more, when random effects adhere to a normal distribution 
and fixed effects in the mixed model are known, BLUPs 
have smallest mean-squared error among all possible predic-
tors. Concerning truncation selection, we provided a proof in 
“Appendix 2” that when dealing with two sets characterized 
by distinct population parameters (e.g., means, variances and 
prediction accuracies of TGVs), utilizing a uniform thresh-
old for the BLUPs across all candidates maximizes the selec-
tion response.

We derived this property of BLUPs through a Lagrange 
multiplier approach, which requires quite restrictive assump-
tions on the random effects u in the different sets. It is closely 
related to a more general selection principle (Fernando and 
Gianola 1986; Goffinet 1983). Accordingly, if n candidates 
are available and k < n of them are to be chosen, then select-
ing the k candidates with highest conditional mean for an 
unobservable random variable u maximizes the expected 
value of the mean of u for the selected candidates. This
result holds true independent of the joint distribution of the
unobservable random variable u and the data. Under normality,
the BLUP of u can be thought of as its conditional
mean. Thus, even when the candidates are from different 
sets, selecting the k candidates with the highest values for 
BLUPs (û) would maximize the response to selection and no 
further corrections are needed.
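
This selection principle can be checked by simulation. The sketch below assumes two sets with $\sigma_u = 1$, prediction accuracies of 0.6 and 0.9, a known mean of zero and a single record y = u + e per candidate, so that the BLUP equals $\rho^2 y$. It compares selecting the best 10% by BLUP across both sets with selecting the best 10% within each set. The sample size, seed and function name are arbitrary choices for this illustration.

```python
import random


def simulate(n_per_set=50_000, rho=(0.6, 0.9), frac=0.10, seed=1):
    rng = random.Random(seed)
    sets = []
    for r in rho:
        sd_e = (1.0 / r ** 2 - 1.0) ** 0.5  # residual SD giving h^2 = r^2
        members = []
        for _ in range(n_per_set):
            u = rng.gauss(0.0, 1.0)          # true genetic value
            y = u + rng.gauss(0.0, sd_e)     # observed record
            members.append((r ** 2 * y, u))  # (BLUP, TGV)
        sets.append(members)

    # Rule 1: best candidates by BLUP across all sets (uniform threshold)
    k_total = int(frac * n_per_set * len(rho))
    pooled = sorted(
        ((blup, u, idx) for idx, members in enumerate(sets) for blup, u in members),
        reverse=True,
    )[:k_total]
    mean_pooled = sum(u for _, u, _ in pooled) / k_total
    share_set1 = sum(1 for _, _, idx in pooled if idx == 0) / k_total

    # Rule 2: same fraction selected within each set
    k_each = int(frac * n_per_set)
    within = [u for members in sets
              for _, u in sorted(members, reverse=True)[:k_each]]
    mean_within = sum(within) / len(within)
    return mean_pooled, mean_within, share_set1


mean_pooled, mean_within, share_set1 = simulate()
print(round(mean_pooled, 3), round(mean_within, 3), round(share_set1, 3))
```

Up to sampling error, the mean TGV under the pooled rule should be close to the value of 1.36 derived for Fig. 4 and exceed the mean obtained with equal within-set proportions, while the share of selected candidates from the first set should be close to 0.28.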

In a strict sense, selecting a fixed number k or constant proportion $\alpha = k/n$ of candidates from a finite population of size n differs from truncation selection. In truncation selection, the threshold is set so that the expected proportion of candidates is equal to $\alpha$ in a population of infinite size. When applying this fixed threshold to a sample of size n, the number of selected candidates may deviate from k. However, as the sample size increases, selecting a fixed number or proportion of candidates becomes equivalent to truncation selection. Therefore, the results derived for truncation selection in this study closely approximate those for selecting a constant proportion of candidates.
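
How quickly the two schemes converge can be gauged with a small calculation. Assuming independent candidates that each exceed the fixed threshold with probability $\alpha$, the number selected follows a binomial distribution; the function name below is hypothetical.

```python
from math import sqrt


def selected_count_spread(n, alpha):
    """Mean, SD and relative SD of the number of candidates exceeding a
    fixed threshold, assuming a Binomial(n, alpha) count."""
    mean = n * alpha
    sd = sqrt(n * alpha * (1.0 - alpha))
    return mean, sd, sd / mean


for n in (100, 1_000, 10_000):
    mean, sd, rel = selected_count_spread(n, 0.10)
    print(n, round(mean, 1), round(sd, 2), round(rel, 3))
```

The relative deviation of the selected number from k shrinks in proportion to $1/\sqrt{n}$, so the distinction matters mainly for small sets.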

Our approach for proving the optimality property of BLUPs under truncation selection allows calculating the optimal proportions $\gamma_1^*$ and $\gamma_2^*$ of candidates selected from set $\Pi_1$ and $\Pi_2$, respectively, provided reliable estimates of the population parameters are available. This information is important for optimizing the allocation of resources in genomic selection based on BLUPs. By knowing $\gamma_1^*$ and $\gamma_2^*$ in advance, we can calculate the selection response across both the training and prediction set. Thus, we can find the ideal balance between (1) the expenditures allocated to the training set, which determines mainly the prediction accuracies of both the training and prediction set, and (2) the size of both sets, which determines $\alpha_T$. A thorough examination of this complex problem is beyond the scope of this study and warrants further research.

Using the same threshold for BLUPs does not necessarily mean that all candidates share an equal likelihood of being selected, even if they possess the same TGV, as highlighted in the literature (Woolliams et al. 2015). This can be exemplified by Fig. 4, where for $\alpha_T = 0.10$ the proportion of candidates from set $\Pi_1$ would reduce from $\pi_1 = 0.50$ before selection to $\gamma_1^* = \gamma_1^i = 0.28$ after selection owing to the lower prediction accuracy for $\Pi_1$ and increased shrinkage of BLUPs.
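
The unequal chances of candidates with identical TGV can be quantified under a simplifying assumption: TGV u and BLUP û are bivariate normal with mean zero and cov(u, û) = var(û) = $\rho^2 \sigma_u^2$ (cf. “Appendix 4”). Then û given u is normal with mean $\rho^2 u$ and standard deviation $\rho \sqrt{1 - \rho^2}\, \sigma_u$. The function name and the chosen TGV of 2.0 are hypothetical; the threshold 0.96 and the accuracies are those of Fig. 4.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def prob_selected(u, t, rho, sigma_u=1.0):
    """P(BLUP > t | TGV = u), assuming bivariate normality, zero means
    and cov(u, BLUP) = var(BLUP) = rho^2 * sigma_u^2."""
    mean = rho ** 2 * u                            # E[BLUP | u]
    sd = rho * sqrt(1.0 - rho ** 2) * sigma_u      # SD[BLUP | u]
    return 1.0 - STD.cdf((t - mean) / sd)


p_set1 = prob_selected(u=2.0, t=0.96, rho=0.6)
p_set2 = prob_selected(u=2.0, t=0.96, rho=0.9)
print(round(p_set1, 3), round(p_set2, 3))
```

Under these assumptions, a candidate two genetic standard deviations above the mean is selected far less often when it belongs to the set with the lower prediction accuracy.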

Composition of the selected fraction

Generalizing Cochran’s formula for selection response to the case of multiple sets allowed us to examine the proportions ($\gamma_1, \gamma_2$) of selected candidates originating from $\Pi_1$ and $\Pi_2$. This is of interest for two reasons. First, the selection response for the combined set depends on a weighted summation of the selection response in each set (Eq. 3), with weights corresponding to the post-selection fractions $\gamma_1$ and $\gamma_2$. Second, the makeup of the selected fraction is critical for further breeding progress, given that these candidates are used either directly for product development and/or for generating the base materials of the next breeding cycle. In extreme cases, $\Delta G_{Tot}$ for BLUEs can even be negative. For instance, if only mild selection ($\alpha_1 = 0.45$) is applied to the inferior, smaller population $\Pi_1$ ($\pi_1 = 0.2$, $\mu_1 = 0$, $\sigma_1 = 1$, $h_1^2 = 0.36$) but stringent selection ($\alpha_2 = 0.0125$) is applied to $\Pi_2$ ($\mu_2 = 2.0$, $\sigma_2 = 2$ and $h_2^2 = 0.81$) so that $\gamma_1 = 0.90$ is much larger than $\pi_1$, the outcome would be $\Delta G_{Tot} = -0.70$.

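
The negative response in this example can be retraced numerically. The sketch assumes that the total response is the post-selection weighted sum of the within-set responses $b_k \sigma_k i_{\alpha_k}$ plus the shift in set means $\mu_k(\gamma_k - \pi_k)$ (Appendix 1), with $\sigma_k$ taken as the standard deviation of the SC and $b_k = h_k^2$ for BLUEs. Function names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()


def intensity(alpha):
    """Selection intensity for a selected proportion alpha."""
    z = STD.inv_cdf(1.0 - alpha)
    return STD.pdf(z) / alpha


def total_response(alpha, pi, mu, sigma, b):
    """Post-selection proportions gamma and total selection response."""
    selected = [a * p for a, p in zip(alpha, pi)]
    alpha_tot = sum(selected)
    gamma = [s / alpha_tot for s in selected]
    dg = sum(g * bk * s * intensity(a) + m * (g - p)
             for g, bk, s, a, m, p in zip(gamma, b, sigma, alpha, mu, pi))
    return gamma, dg


gamma, dg = total_response(alpha=[0.45, 0.0125], pi=[0.2, 0.8],
                           mu=[0.0, 2.0], sigma=[1.0, 2.0], b=[0.36, 0.81])
print([round(g, 2) for g in gamma], round(dg, 2))
```

The loss arises because the selected fraction is dominated by the set with the lower mean, which outweighs the gains realized within each set.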
Here, we focus our discussion on the composition of the selected fraction obtained through the use of BLUPs with a uniform threshold for all candidates. As indicated by the graphs in Fig. 1, the change in the composition of the candidates before and after selection, expressed by the ratio $\gamma_1^* : \pi_1$, can be striking. For instance, when $\pi_1 = 0.10$ and/or $\alpha_T = 0.01$, the proportion retained from the inferior set $\Pi_1$ dwindles to less than 10% of its original proportion, if $\mu_2$ surpasses $\mu_1$ by about one genetic standard deviation under otherwise identical conditions. Consequently, if materials from introgression programs are evaluated together with elite germplasm and the same threshold is applied to the BLUPs of both groups, hardly any novel germplasm will be selected due to its low performance level. Thus, it would be prudent to apply different thresholds for both groups to have a realistic chance that some of the promising new genotypes are retained for further breeding.
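
The ratio $\gamma_1^* : \pi_1$ has a simple form that follows from the expressions for the total selected proportion and the post-selection proportions in Appendix 1. Writing $t^*$ for the uniform threshold that yields the total selected proportion $\alpha_T$, a short restatement (a sketch under the normality assumptions of Appendix 1) is:

```latex
\[
\frac{\gamma_1^*}{\pi_1}
  = \frac{\alpha\!\left(\frac{t^* - \mu_1}{\sigma_1}\right)}{\alpha_T},
\qquad
\alpha_T
  = \pi_1\,\alpha\!\left(\frac{t^* - \mu_1}{\sigma_1}\right)
  + \pi_2\,\alpha\!\left(\frac{t^* - \mu_2}{\sigma_2}\right).
\]
```

The change in composition therefore equals the proportion selected within $\Pi_1$ relative to the overall selected proportion. In the example of Fig. 4, about 0.056 of the candidates in $\Pi_1$ exceed the threshold, giving a ratio of about 0.56, in agreement with $\gamma_1^* = 0.28$ and $\pi_1 = 0.50$.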

Likewise, the ratio $\gamma_1^* : \pi_1$ falls below 0.20, if two sets share
equal size and population parameters, yet $\rho_2 \geq 0.68$ while
$\rho_1 = 0.50$. Differences of this magnitude have been observed
in genomic prediction of maize hybrids, in which case the 
prediction accuracy significantly decreased from H2 hybrids, 
where both parents are used as parents of a hybrid in the 
training set, to H1 and H0 hybrids, where only one or none 
of the parent lines, respectively, contribute to a hybrid in the 
training set (Seye et al. 2020; Technow et al. 2014; West-
hues et al. 2017). While it seems rewarding to have a much 
larger number of H0 hybrids than H1 and H2 hybrids due to 
their lower costs (involving only production and genotyp-
ing of parent lines), the contribution of H0 hybrids to the 
overall selection response is generally overrated because 
their selected proportion is much smaller than for H2 and 
H1 hybrids owing to the lower prediction accuracy. Con-
sequently, H0 hybrids contribute significantly less to the 
selection response than expected based on their proportion 
in the entire set of predictable hybrids. This aspect is crucial 
when optimizing the distribution of resources allocated to 
the training and prediction sets (Riedelsheimer and Melch-
inger 2013).

There are many further examples, where sets differ in 
their prediction accuracy because they differ in the num-
ber of close relatives in the training set. For this reason, 
genotypes in the training set have generally a significantly 
higher prediction accuracy than those in the prediction set, 
leading to a notable underrepresentation of the latter in the 
selected set. Likewise, in recycling breeding breeders typi-
cally generate more and larger families from crosses of elite 
parents. If the training set is sampled proportional to the 
size of these families, it follows that genotypes descending 
from the top parents have higher prediction accuracy due 
to more and closer relatives in the training set than geno-
types descending from less prominent parents. Thus, on top 
of the expected high TGVs of these progenies, the smaller 
shrinkage of their BLUPs further increases the likelihood 
that they are selected. However, this carries a high risk of 
selecting closely related genotypes descending from a small 
number of top ancestors, thereby diminishing the effective 
population size and long-term progress in genomic selection, 
particularly when applying rigorous selection pressure ena-
bled by the low costs for genotyping with modern methods 
(Rasheed et al. 2017). While our focus has been primarily 
on diverse prediction accuracies, our conclusions can be 
extended to scenarios where sets differ in genetic variances.

Optimal selection of parent lines in hybrid breeding

In hybrid breeding, breeders typically work with a compara-
ble number of lines from each parent population and select, 
based on GCA predicted by BLUPs, a proportional number 
of candidates from both groups for the final testing phase in 



product development (Melchinger and Frisch 2023). Fig-
ure 3 shows that this approach is optimal when the parent 
populations exhibit similar variances for the SC, but this 
is not always the case in practice. In European maize for 
example, GCA variance for grain yield was approximately 
twice as large for dent lines compared to flint lines (Schrag 
et al. 2006). Similarly in hybrid rye, Wilde et al. (2003) 
found that GCA variance for grain yield among female lines 
from the Petkus pool was almost four times greater than 
observed among male lines from the Carsten pool. Addi-
tionally, the accuracy of predicted GCA effects can differ 
between the parent populations due to differences in the size 
and intensity of phenotyping of the training set and the use 
of different types of testers. Furthermore, in species like 
rye, where the implementation of CMS for testcross seed 
production differs significantly between the seed and pollen 
parent pools, the pedigree relationship between candidates 
in the prediction and training set can diverge (Wilde and 
Miedaner 2021).

Under these scenarios, a notable enhancement $\Psi_{Hyb}$ in selection response for predicted hybrids, compared to selecting equal proportions in each population, can be achieved by opting for more stringent selection within the parent population exhibiting the larger GCA variance. The magnitude of $\Psi_{Hyb}$ depends strongly on the ratio of GCA variances in the two parent populations but showed similar curves independent of the selected proportions $\alpha_H$ (Fig. 3). Under mild selection ($\alpha_H = 0.25$), the optimal α-values for the two parent populations hardly differ from each other, but for stringent selection ($\alpha_H = 0.0001$), a much more stringent selection must be practiced in the parent population with the larger GCA variance than in the one with the smaller GCA variance, as reflected by the low ratio $\alpha_1^o : \alpha_e$, where $\alpha_e = \sqrt{\alpha_H}$. As an alternative to selecting parent lines based on their predicted GCA for producing a complete factorial of hybrids, one could directly select the most promising hybrids based on the sum of the GCA of their parents. This would result in selecting a partial factorial having the form of a triangle, with the top parents being involved in more crosses than parents with lower rank, and would automatically take care of differences in the variance of BLUPs for GCA in each parent population. A comparison of these two selection schemes would be highly interesting for hybrid breeding but is beyond the scope of this study.
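
For BLUPs, the optimal proportions in the two parent populations can be found numerically with a few lines of code. The sketch below assumes $b_1 = b_2 = 1$ and uses the stationarity condition of Appendix 3, $t_1 - t_2 = \sigma_1 i_\alpha(t_1/\sigma_1) - \sigma_2 i_\alpha(t_2/\sigma_2)$, together with the constraint $\alpha_1 \alpha_2 = \alpha_H$. It bisects on $-\ln \alpha_1$; the function names and the example values of $\sigma_1$, $\sigma_2$ and $\alpha_H$ are illustrative assumptions, not values taken from Fig. 3.

```python
from math import exp, log
from statistics import NormalDist

STD = NormalDist()


def z_of(alpha):
    """Standardized threshold with upper-tail probability alpha."""
    return -STD.inv_cdf(alpha)


def intensity(alpha):
    """Selection intensity for a selected proportion alpha."""
    return STD.pdf(z_of(alpha)) / alpha


def response(alpha1, alpha2, sigma1, sigma2):
    """Response in the hybrids: sum of the two GCA responses (b1 = b2 = 1)."""
    return sigma1 * intensity(alpha1) + sigma2 * intensity(alpha2)


def optimal_split(alpha_h, sigma1, sigma2, iters=200):
    """Solve sigma1*(i1 - z1) = sigma2*(i2 - z2) with alpha1*alpha2 = alpha_h."""
    total = -log(alpha_h)

    def gap(l1):
        a1, a2 = exp(-l1), exp(-(total - l1))
        return (sigma1 * (intensity(a1) - z_of(a1))
                - sigma2 * (intensity(a2) - z_of(a2)))

    lo, hi = 1e-9 * total, (1.0 - 1e-9) * total
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:   # gap decreases as selection in population 1 tightens
            lo = mid
        else:
            hi = mid
    l1 = 0.5 * (lo + hi)
    return exp(-l1), exp(-(total - l1))


alpha_h = 0.01
equal = alpha_h ** 0.5
a1, a2 = optimal_split(alpha_h, sigma1=1.0, sigma2=2.0)
gain = response(a1, a2, 1.0, 2.0) / response(equal, equal, 1.0, 2.0) - 1.0
print(round(a1, 3), round(a2, 4), round(100 * gain, 1))
```

With equal standard deviations the routine returns $\alpha_1 = \alpha_2 = \sqrt{\alpha_H}$; with unequal standard deviations it selects more stringently in the population with the larger standard deviation of BLUPs.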

Conclusions

When practicing truncation selection with candidates from 
multiple sets, new aspects must be taken into consideration 
as compared to selection in a single homogeneous popula-
tion. This is because selection progress in the entire breeding 
program depends not only on the selection response in each set 
but also on the composition of the selected fraction. A major 

question is how to choose the thresholds for candidates from 
the various sets for maximizing the selection response of the 
entire breeding program. In addition to the numerous advan-
tages of BLUPs compared to BLUEs, they have the highly 
desirable property that a uniform threshold can be applied 
to all candidates for maximizing the selection response and 
no further adjustment for differences in the reliability of the 
predictors is necessary. This applies even if the sets differ in 
the population parameters and/or if BLUPs of different can-
didates are calculated from different types or combinations 
of "omics" data and simplifies selection decisions. However, 
calculation of BLUPs requires reliable estimates of the genetic 
variance, which is a challenge with the small sample sizes 
of families used in plant breeding, but this problem has been 
mitigated with the use of Bayesian methods (Sorenson and 
Gianola 2004).

Since variation in the prediction accuracy can have a strong
impact on the composition of the selected fraction and can strongly
reduce the effective population size under stringent selection,
we recommend accompanying genomic selection based on
BLUPs with monitoring of the genetic diversity of the selected
candidates. Ideally, genomic selection could be combined with
optimum contribution selection (Daetwyler et al. 2007; Gaynor
et al. 2021; Woolliams et al. 2015), where the relationship of
candidates is determined from genomic data.

In genomic selection of hybrids based on predicted values
of GCA of their parents, we suggest selecting different proportions
in the two parent populations, if these differ substantially
in their population parameters such as the GCA variances and/
or prediction accuracy of GCA effects.

Appendix 1: Response to truncation 
selection in two sets and composition 
of the selected set

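In our notation, we use $\varphi(x)$ and $\Phi(x)$ to denote the probability density function and cumulative distribution function of the standard normal distribution $N(0, 1)$, respectively; $\alpha(x) = 1 - \Phi(x)$ and $i_\alpha(x)$ denote the selected proportion and selection intensity, respectively, when applying threshold $x$ to a standard normal distribution $N(0, 1)$.

Our assumptions for the mathematical derivations are:

1. $Z$ is a Bernoulli distributed variable that indicates the origin of a candidate $C$ from set $\Pi_1$ or $\Pi_2$, where $Z = 1$ with probability $\pi_1 = \frac{|\Pi_1|}{|\Pi_1 \cup \Pi_2|}$ for $C \in \Pi_1$ and $Z = 2$ with probability $\pi_2 = 1 - \pi_1$ for $C \in \Pi_2$.
2. $X$ is the random variable for the SC with a conditional distribution $X|_{Z=1} \sim N(\mu_1, \sigma_1^2)$ for $C \in \Pi_1$ and $X|_{Z=2} \sim N(\mu_2, \sigma_2^2)$ for candidates $C \in \Pi_2$.
3. Candidates $C$ from set $\Pi_1$ and $\Pi_2$ are selected based on their SC surpassing the respective thresholds $t_1$ and $t_2$, yielding the sets $\Gamma_1$ and $\Gamma_2$ of selected candidates. Choosing thresholds $t_1$ and $t_2$ is equivalent to selecting proportions $\alpha_1$ and $\alpha_2$ of top candidates from $\Pi_1$ and $\Pi_2$, respectively, where

$$
\begin{aligned}
\alpha_1 &= P\left[X > t_1 \mid C \in \Pi_1\right] = P\left[X|_{Z=1} > t_1\right] = \left[1 - \Phi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\right] = \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right) \\
\alpha_2 &= P\left[X > t_2 \mid C \in \Pi_2\right] = P\left[X|_{Z=2} > t_2\right] = \left[1 - \Phi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\right] = \alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right).
\end{aligned}
\tag{16}
$$

Our subsequent derivations are based on thresholds as dealing with proportions would further complicate the already complex algebra. Applying the theorem of total probability and defining $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \pi_1, \pi_2)$, we obtain for the total proportion of candidates selected from $\Pi_1 \cup \Pi_2$:

$$
\begin{aligned}
\alpha_{Tot}(t_1, t_2, \theta) &= P\left[X > t_1 \mid C \in \Pi_1\right] \cdot P\left[C \in \Pi_1\right] + P\left[X > t_2 \mid C \in \Pi_2\right] \cdot P\left[C \in \Pi_2\right] \\
&= P\left[X|_{Z=1} > t_1\right] P[Z = 1] + P\left[X|_{Z=2} > t_2\right] P[Z = 2] \\
&= \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\pi_2
\end{aligned}
\tag{17}
$$

Using Bayes’ formula, the proportion of candidates from $\Pi_1$ in the entire set $\Gamma_1 \cup \Gamma_2$ of selected candidates is

$$
\gamma_1(t_1, t_2, \theta) = \frac{P\left[X|_{Z=1} > t_1\right] P[Z = 1]}{P\left[\left[X|_{Z=1} > t_1\right] \vee \left[X|_{Z=2} > t_2\right]\right]} = \frac{\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_1|}{|\Gamma_1 \cup \Gamma_2|}
\quad \text{and} \quad
\gamma_2(t_1, t_2, \theta) = 1 - \gamma_1(t_1, t_2, \theta).
\tag{18}
$$

The expectation of the SC for the candidates in $\Gamma_1$ and $\Gamma_2$ selected from $\Pi_1$ and $\Pi_2$, respectively, is

$$
E\left[X|_{Z=1} > t_1\right] = \sigma_1 i_\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right) + \mu_1
\quad \text{and} \quad
E\left[X|_{Z=2} > t_2\right] = \sigma_2 i_\alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right) + \mu_2,
\tag{19}
$$

where $\sigma i_\alpha\left(\frac{t - \mu}{\sigma}\right)$ is the selection differential under truncation selection with threshold $t$ in a normal distribution $N(\mu, \sigma^2)$, and $i_\alpha(x)$ is the selection intensity (Falconer and Mackay 1996, p. 189).

Thus, we obtain the expectation of the selected candidates in $\Gamma_1 \cup \Gamma_2$ as

$$
\begin{aligned}
&E\left[X \mid \left[Z = 1 \wedge X|_{Z=1} > t_1\right] \vee \left[Z = 2 \wedge X|_{Z=2} > t_2\right]\right] \\
&\quad = E\left[X|_{Z=1} > t_1\right] P\left[X|_{Z=1} > t_1 \mid \left[X|_{Z=1} > t_1\right] \vee \left[X|_{Z=2} > t_2\right]\right] \\
&\qquad + E\left[X|_{Z=2} > t_2\right] P\left[X|_{Z=2} > t_2 \mid \left[X|_{Z=1} > t_1\right] \vee \left[X|_{Z=2} > t_2\right]\right] \\
&\quad = \left[\sigma_1 i_\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right) + \mu_1\right]\gamma_1(t_1, t_2, \theta) + \left[\sigma_2 i_\alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right) + \mu_2\right]\gamma_2(t_1, t_2, \theta)
\end{aligned}
\tag{20}
$$

Since the mean of unselected candidates in $\Pi_1 \cup \Pi_2$ is obtained as

$$
E\left[X \mid [Z = 1] \vee [Z = 2]\right] = E[X \mid Z = 1] P[Z = 1] + E[X \mid Z = 2] P[Z = 2] = \mu_1 \pi_1 + \mu_2 \pi_2,
$$

we get for the change in the mean of the SC as a result of truncation selection (= selection differential)

$$
\begin{aligned}
SD_{Tot}(t_1, t_2, \theta) &= \sigma_1 i_\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\gamma_1(t_1, t_2, \theta) + \sigma_2 i_\alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\gamma_2(t_1, t_2, \theta) \\
&\quad + \mu_1\left[\gamma_1(t_1, t_2, \theta) - \pi_1\right] + \mu_2\left[\gamma_2(t_1, t_2, \theta) - \pi_2\right]
\end{aligned}
\tag{21}
$$

Assuming the regression coefficient of the TGV on the SC is $b_1$ in $\Pi_1$ and $b_2$ in $\Pi_2$ and applying the breeders’ equation for each set, we get for the total selection response in $\Pi_1 \cup \Pi_2$ under truncation selection with thresholds $t_1$ and $t_2$:

$$
\begin{aligned}
\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) &= \Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\gamma_1(t_1, t_2, \theta) + \Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\gamma_2(t_1, t_2, \theta) \\
&\quad + \mu_1\left[\gamma_1(t_1, t_2, \theta) - \pi_1\right] + \mu_2\left[\gamma_2(t_1, t_2, \theta) - \pi_2\right]
\end{aligned}
\tag{22}
$$

where $\Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right) = b_1 \sigma_1 i_\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)$ and $\Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right) = b_2 \sigma_2 i_\alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)$ refer to the selection response realized in set $\Pi_1$ and $\Pi_2$, respectively.

Appendix 2: Maximizing selection response by optimal choice of selection thresholds

To determine the maximum of the selection response $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2)$ as a function of the thresholds $t_1$ and $t_2$, we use a Lagrange multiplier approach. We start with some basic properties of the normal distribution $N(0, 1)$ with pdf $\varphi(x)$ and cdf $\Phi(x)$. Let $i_\alpha(x) = \frac{\varphi(x)}{\alpha(x)}$ and $\alpha(x) = \int_x^\infty \varphi(z)\,dz = 1 - \Phi(x)$ be the selection intensity and selected proportion, respectively, then we have $\frac{\partial \varphi(x)}{\partial x} = -x\varphi(x)$ and $\frac{\partial \alpha(x)}{\partial x} = -\varphi(x)$. Hence, setting $x = \frac{t - \mu}{\sigma}$, we obtain

$$
\frac{\partial \varphi\left(\frac{t - \mu}{\sigma}\right)}{\partial t} = -\left(\frac{t - \mu}{\sigma}\right)\varphi\left(\frac{t - \mu}{\sigma}\right)\frac{1}{\sigma}.
\tag{23}
$$

$$
\frac{\partial \alpha\left(\frac{t - \mu}{\sigma}\right)}{\partial t} = -\varphi\left(\frac{t - \mu}{\sigma}\right)\frac{1}{\sigma}.
\tag{24}
$$

Defining for given values of $\theta, b_1, b_2$ and $\alpha_T \in (0, 1)$:

$$
\alpha_1(t_1) = \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1, \quad
\alpha_2(t_2) = \alpha\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\pi_2, \quad
g(t_1, t_2) = \alpha_1(t_1) + \alpha_2(t_2) - \alpha_T
$$

and

$$
\begin{aligned}
f(t_1, t_2) &= \Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\alpha_1(t_1) + \Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\alpha_2(t_2) \\
&\quad + \mu_1\left(\alpha_1(t_1) - \pi_1 \alpha_T\right) + \mu_2\left(\alpha_2(t_2) - \pi_2 \alpha_T\right),
\end{aligned}
$$

we get the Lagrangian function

$$
L(t_1, t_2, \lambda) = f(t_1, t_2) + \lambda g(t_1, t_2)
\tag{25}
$$

Thus, for given values of $\theta, b_1, b_2$ and $\alpha_T$, we obtain a necessary condition for the maximum of $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) = \frac{1}{\alpha_T} f(t_1, t_2)$ under the side condition $g(t_1, t_2) = 0$ by analyzing the gradient $\nabla L(t_1, t_2, \lambda)$.

We have

$$
\Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\alpha_1(t_1) = b_1 \sigma_1 \frac{\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)}{\alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)} \alpha\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 = b_1 \sigma_1 \pi_1 \varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)
$$

and

$$
\Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\alpha_2(t_2) = b_2 \sigma_2 \pi_2 \varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right).
$$

Thus,

$$
\frac{\partial\left(\Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\alpha_1(t_1)\right)}{\partial t_1} = -b_1 \sigma_1 \pi_1 \left(\frac{t_1 - \mu_1}{\sigma_1}\right)\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1} = -b_1 \pi_1 (t_1 - \mu_1)\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1}
$$

Likewise,

$$
\begin{aligned}
\frac{\partial\left(\Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\alpha_2(t_2)\right)}{\partial t_1} &= 0, \\
\frac{\partial\left(\mu_1\left(\alpha_1(t_1) - \pi_1 \alpha_T\right)\right)}{\partial t_1} &= -\mu_1 \pi_1 \varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1}, \\
\frac{\partial\left(\mu_2\left(\alpha_2(t_2) - \pi_2 \alpha_T\right)\right)}{\partial t_1} &= 0, \\
\frac{\partial \lambda g(t_1, t_2)}{\partial t_1} &= -\lambda \pi_1 \varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1}.
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial\left(\Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\alpha_1(t_1)\right)}{\partial t_2} &= 0, \\
\frac{\partial\left(\Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\alpha_2(t_2)\right)}{\partial t_2} &= -b_2 \pi_2 (t_2 - \mu_2)\varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\frac{1}{\sigma_2}, \\
\frac{\partial\left(\mu_1\left(\alpha_1(t_1) - \pi_1 \alpha_T\right)\right)}{\partial t_2} &= 0, \\
\frac{\partial\left(\mu_2\left(\alpha_2(t_2) - \pi_2 \alpha_T\right)\right)}{\partial t_2} &= -\mu_2 \pi_2 \varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\frac{1}{\sigma_2}, \\
\frac{\partial \lambda g(t_1, t_2)}{\partial t_2} &= -\lambda \pi_2 \varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\frac{1}{\sigma_2}.
\end{aligned}
$$

and

$$
\begin{aligned}
\frac{\partial\left(\Delta G_1\left((t_1 - \mu_1), \sigma_1, b_1\right)\alpha_1(t_1)\right)}{\partial \lambda} &= 0, \quad
\frac{\partial\left(\Delta G_2\left((t_2 - \mu_2), \sigma_2, b_2\right)\alpha_2(t_2)\right)}{\partial \lambda} = 0, \\
\frac{\partial\left(\mu_1\left(\alpha_1(t_1) - \pi_1 \alpha_T\right)\right)}{\partial \lambda} &= 0, \quad
\frac{\partial\left(\mu_2\left(\alpha_2(t_2) - \pi_2 \alpha_T\right)\right)}{\partial \lambda} = 0, \\
\frac{\partial \lambda g(t_1, t_2)}{\partial \lambda} &= g(t_1, t_2)
\end{aligned}
$$

Thus, we get

$$
\begin{aligned}
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1} &= -b_1 \pi_1 (t_1 - \mu_1)\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1} - \mu_1 \pi_1 \varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1} - \lambda \pi_1 \varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1} \\
&= -\frac{\pi_1}{\sigma_1}\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\left[b_1 (t_1 - \mu_1) + \mu_1 + \lambda\right]
\end{aligned}
\tag{26}
$$

$$
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_2} = -\frac{\pi_2}{\sigma_2}\varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\left[b_2 (t_2 - \mu_2) + \mu_2 + \lambda\right]
\tag{27}
$$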

$$
\frac{\partial L(t_1, t_2, \lambda)}{\partial \lambda} = g(t_1, t_2)
\tag{28}
$$

Setting $\nabla L(t_1, t_2, \lambda) = (0, 0, 0)$ and using that $\frac{\pi_1}{\sigma_1}\varphi\left(\frac{t_1 - \mu_1}{\sigma_1}\right) > 0$ and $\frac{\pi_2}{\sigma_2}\varphi\left(\frac{t_2 - \mu_2}{\sigma_2}\right) > 0$, we obtain from Eqs. 26 and 27 the necessary conditions: $-\lambda = b_1(t_1 - \mu_1) + \mu_1$ and $-\lambda = b_2(t_2 - \mu_2) + \mu_2$ or equivalently $b_1(t_1 - \mu_1) + \mu_1 = b_2(t_2 - \mu_2) + \mu_2$. Thus, the solutions $t_1^*, t_2^*$ must fulfill the following conditions:

$$
t_1 = \frac{b_2 (t_2 - \mu_2) + \mu_2 - \mu_1}{b_1} + \mu_1
\quad \text{or equivalently} \quad
t_2 = \frac{b_1 (t_1 - \mu_1) + \mu_1 - \mu_2}{b_2} + \mu_2
\tag{29}
$$

and

$$
\Phi\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\pi_2 = 1 - \alpha_T
\tag{30}
$$

For the special case of BLUPs, where $b_1 = b_2 = 1.0$, we have

$$
t_1^* = t_2^*.
\tag{31}
$$

For BLUEs and $\mu_1 = \mu_2$, we get

$$
t_1^* - \mu_1 = \frac{b_2\left(t_2^* - \mu_1\right)}{b_1}.
\tag{32}
$$

Defining $t_1^{**} = t_1^* - \mu_1$ and $t_2^{**} = t_2^* - \mu_1$, Eqs. 29 and 30 are equivalent to $t_1^{**} = \frac{b_2 t_2^{**}}{b_1}$ and $\Phi\left(\frac{t_1^{**}}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{t_2^{**}}{\sigma_2}\right)\pi_2 = 1 - \alpha_T$, which would be obtained if we assume $\mu_1 = 0$. The improvement in $\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2)$ relative to $\Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)$, where $t_1^i = t_2^i$ such that $\alpha_{Tot}(t_1^i, t_2^i, \theta) = \alpha_T$ or equivalently $\Phi\left(\frac{t_1^i - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{t_2^i - \mu_2}{\sigma_2}\right)\pi_2 = 1 - \alpha_T$, can be expressed as the ratio $\Psi_{Tot}(\alpha_T, \theta, b_1, b_2)$ defined in Eq. 10.

Appendix 3: Maximizing the total selection response for factorials in hybrid breeding

Since the SC is expected to provide unbiased estimates for the GCA effects in each parent population, we have $\mu_1 = \mu_2 = 0$. Additionally, we assume that $b_1$ and $b_2$ are the regression coefficients of the regression of GCA effects on the SC in $\Pi_1$ and $\Pi_2$, respectively, and define $\theta = (\sigma_1, \sigma_2, b_1, b_2)$. We select proportions $\alpha_1 = \frac{|\Gamma_1|}{|\Pi_1|}$ and $\alpha_2 = \frac{|\Gamma_2|}{|\Pi_2|}$ from $\Pi_1$ and $\Pi_2$, respectively, by applying corresponding thresholds $t_1 = \sigma_1 \Phi^{-1}(1 - \alpha_1)$ and $t_2 = \sigma_2 \Phi^{-1}(1 - \alpha_2)$. If the selected lines are mated in the form of a complete factorial design to produce the hybrids tested in the next
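step of the breeding program, the proportion of selected hybrids from the total set of possible hybrids is

$$
\alpha_{Hyb}(t_1, t_2, \theta) = \alpha\left(\frac{t_1}{\sigma_1}\right) \times \alpha\left(\frac{t_2}{\sigma_2}\right).
\tag{33}
$$

The expected selection response in the hybrids among lines from $\Pi_1$ and $\Pi_2$, which were selected for their GCA in cross-combinations with genotypes from the other population, can be obtained using Eq. 6 as follows

$$
\Delta G_{Hyb}(t_1, t_2, \theta) = \Delta G_1(t_1, \sigma_1, b_1) + \Delta G_2(t_2, \sigma_2, b_2),
\tag{34}
$$

with

$$
\Delta G_1(t_1, \sigma_1, b_1) = b_1 \sigma_1 \frac{\varphi\left(\frac{t_1}{\sigma_1}\right)}{\alpha\left(\frac{t_1}{\sigma_1}\right)} = \frac{1}{\alpha_{Hyb}(t_1, t_2, \theta)}\, b_1 \sigma_1 \varphi\left(\frac{t_1}{\sigma_1}\right)\alpha\left(\frac{t_2}{\sigma_2}\right)
\tag{35}
$$

and

$$
\Delta G_2(t_2, \sigma_2, b_2) = \frac{1}{\alpha_{Hyb}(t_1, t_2, \theta)}\, b_2 \sigma_2 \varphi\left(\frac{t_2}{\sigma_2}\right)\alpha\left(\frac{t_1}{\sigma_1}\right)
\tag{36}
$$

Defining for given values of $\theta$ and $\alpha_H \in (0, 1)$:

$$
\alpha_1(t_1) = \alpha\left(\frac{t_1}{\sigma_1}\right), \quad
\alpha_2(t_2) = \alpha\left(\frac{t_2}{\sigma_2}\right), \quad
g(t_1, t_2) = \alpha_1(t_1)\,\alpha_2(t_2) - \alpha_H
$$

and

$$
f(t_1, t_2) = b_1 \sigma_1 \varphi\left(\frac{t_1}{\sigma_1}\right)\alpha\left(\frac{t_2}{\sigma_2}\right) + b_2 \sigma_2 \varphi\left(\frac{t_2}{\sigma_2}\right)\alpha\left(\frac{t_1}{\sigma_1}\right),
$$

we get the Lagrangian function

$$
L(t_1, t_2, \lambda) = f(t_1, t_2) + \lambda g(t_1, t_2)
\tag{37}
$$

Thus, for given values of $\theta$ and $\alpha_H$, we obtain a necessary condition for the maximum of $\Delta G_{Hyb}(t_1, t_2, \theta) = \frac{1}{\alpha_H} f(t_1, t_2)$ under the side condition $g(t_1, t_2) = 0$ by analyzing the gradient $\nabla L(t_1, t_2, \lambda)$. We have

$$
\begin{aligned}
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1} &= -b_1 \sigma_1 \frac{t_1}{\sigma_1}\varphi\left(\frac{t_1}{\sigma_1}\right)\frac{1}{\sigma_1}\alpha\left(\frac{t_2}{\sigma_2}\right) + b_2 \sigma_2 \varphi\left(\frac{t_2}{\sigma_2}\right) \times \left(-\varphi\left(\frac{t_1}{\sigma_1}\right)\frac{1}{\sigma_1}\right) + \lambda\left(-\varphi\left(\frac{t_1}{\sigma_1}\right)\frac{1}{\sigma_1}\alpha\left(\frac{t_2}{\sigma_2}\right)\right) \\
&= \frac{-\varphi\left(\frac{t_1}{\sigma_1}\right)\alpha\left(\frac{t_2}{\sigma_2}\right)}{\sigma_1}\left[b_1 t_1 + b_2 \sigma_2 i_\alpha\left(\frac{t_2}{\sigma_2}\right) + \lambda\right]
\end{aligned}
\tag{38}
$$

$$
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_2} = \frac{-\varphi\left(\frac{t_2}{\sigma_2}\right)\alpha\left(\frac{t_1}{\sigma_1}\right)}{\sigma_2}\left[b_1 \sigma_1 i_\alpha\left(\frac{t_1}{\sigma_1}\right) + b_2 t_2 + \lambda\right]
\tag{39}
$$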


 Theoretical and Applied Genetics (2024) 137:104104 Page 16 of 18

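The partial derivative with respect to $\lambda$ is

$$\frac{\partial L(t_1, t_2, \lambda)}{\partial \lambda} = \bar{\Phi}\!\left(\frac{t_1}{\sigma_1}\right)\bar{\Phi}\!\left(\frac{t_2}{\sigma_2}\right) - \alpha_H \tag{40}$$

Thus, from $\left(\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1}, \frac{\partial L(t_1, t_2, \lambda)}{\partial t_2}, \frac{\partial L(t_1, t_2, \lambda)}{\partial \lambda}\right) = (0, 0, 0)$, we get the necessary conditions $-\lambda = b_1 t_1 + b_2\sigma_2\, i_{\alpha}\!\left(\frac{t_2}{\sigma_2}\right)$ and $-\lambda = b_1\sigma_1\, i_{\alpha}\!\left(\frac{t_1}{\sigma_1}\right) + b_2 t_2$, or equivalently

$$b_1 t_1 - b_2 t_2 = b_1\sigma_1\, i_{\alpha}\!\left(\frac{t_1}{\sigma_1}\right) - b_2\sigma_2\, i_{\alpha}\!\left(\frac{t_2}{\sigma_2}\right) = \Delta G_1(t_1, \sigma_1, b_1) - \Delta G_2(t_2, \sigma_2, b_2) \tag{41}$$

and

$$\bar{\Phi}\!\left(\frac{t_1}{\sigma_1}\right)\bar{\Phi}\!\left(\frac{t_2}{\sigma_2}\right) = \alpha_H \tag{42}$$

Numerical solutions $(t_1^o, t_2^o)$ for Eqs. 41 and 42 can be obtained using mathematical software such as Mathematica, from which we get $\alpha_1^o = \bar{\Phi}\!\left(\frac{t_1^o}{\sigma_1}\right)$ and $\alpha_2^o = \bar{\Phi}\!\left(\frac{t_2^o}{\sigma_2}\right)$ and $\Delta G_{Hyb}(t_1^o, t_2^o, \alpha)$. This maximum can be compared with the selection response $\Delta G_{Hyb}(t_1^e, t_2^e, \alpha)$ for $t_1^e = \sigma_1 \Phi^{-1}\!\left(1 - \sqrt{\alpha_H}\right)$ and $t_2^e = \sigma_2 \Phi^{-1}\!\left(1 - \sqrt{\alpha_H}\right)$, i.e., when an equal proportion $\alpha^e = \sqrt{\alpha_H}$ of lines is selected for GCA in each parent population. For selection based on BLUPs, where $b_1 = b_2 = 1$, Eq. 41 simplifies to

$$t_1 - t_2 = \sigma_1\, i_{\alpha}\!\left(\frac{t_1}{\sigma_1}\right) - \sigma_2\, i_{\alpha}\!\left(\frac{t_2}{\sigma_2}\right) \tag{43}$$

which corresponds to the difference in the selection differentials in $\Pi_1$ and $\Pi_2$.

Appendix 4: Regression equation of true genetic values (TGVs) on their BLUPs

We consider the ordinary mixed linear model studied by Henderson (1975)

$$y = X\beta + Zu + e,$$

where $y$ is an $n \times 1$ observation vector, $X$ a known $n \times p$ matrix with full column rank $p$, $\beta$ an unknown fixed vector, and $Z$ a known $n \times q$ matrix. $u$ and $e$ are unobservable random vectors with null means and

$$\operatorname{var}\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix},$$

where $G$ and $R$ are both nonsingular and known matrices.

Let $\hat{u}$ be the BLUP of $u$ calculated as described by Henderson (1975) and denote $\Sigma_{22} = \operatorname{var}(\hat{u})$ and $\Sigma_{12} = \operatorname{cov}(u, \hat{u})$. Then, according to Henderson (1975), $\Sigma_{22} = \Sigma_{12}$, so that we have $\Sigma_{12}\Sigma_{22}^{-1} = I$ if $\Sigma_{22}$ can be inverted. In the case where $\begin{bmatrix} u \\ e \end{bmatrix}$ follows a multivariate normal distribution, so that $\hat{u}$ also follows a normal distribution, the regression function of $u$ on $\hat{u}$ is linear (Anderson 1958, p. 29) and can be expressed as $\Sigma_{12}\Sigma_{22}^{-1}\hat{u} = I\hat{u}$.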
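The identity $\Sigma_{12} = \Sigma_{22}$ can be illustrated by simulation. The following Python sketch assumes the simplest special case of the mixed model: one record per genotype, a single fixed effect (the overall mean), $G = \sigma_u^2 I$, $R = \sigma_e^2 I$ and known variance components, so that the BLUP reduces to $\hat{u}_i = h^2 (y_i - \bar{y})$ with $h^2 = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$. The function name, sample size, seed and parameter values are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def simulate_blup_regression(n=20000, var_u=1.0, var_e=1.0, seed=1):
    """Monte Carlo check that cov(u, u_hat) equals var(u_hat) for BLUPs."""
    rng = random.Random(seed)
    mu = 10.0  # hypothetical fixed effect (overall mean)

    # True genetic values and one phenotypic record per genotype.
    u = [rng.gauss(0.0, var_u ** 0.5) for _ in range(n)]
    y = [mu + ui + rng.gauss(0.0, var_e ** 0.5) for ui in u]

    # For this balanced model the GLS estimate of mu is the sample mean,
    # and the BLUP shrinks each deviation by h2 (variances assumed known).
    y_bar = sum(y) / n
    h2 = var_u / (var_u + var_e)
    u_hat = [h2 * (yi - y_bar) for yi in y]

    # Sample covariance of (u, u_hat) and sample variance of u_hat.
    mean_u = sum(u) / n
    mean_hat = sum(u_hat) / n
    cov = sum((a - mean_u) * (b - mean_hat) for a, b in zip(u, u_hat)) / (n - 1)
    var_hat = sum((b - mean_hat) ** 2 for b in u_hat) / (n - 1)

    # Slope of the regression of true values on their BLUPs.
    return cov, var_hat, cov / var_hat


if __name__ == "__main__":
    cov, var_hat, slope = simulate_blup_regression()
    print("cov(u, u_hat):", cov)
    print("var(u_hat):", var_hat)
    print("regression slope of u on u_hat:", slope)
```

With these settings the estimated covariance between $u$ and $\hat{u}$ and the variance of $\hat{u}$ should agree up to sampling error, giving a regression slope of $u$ on $\hat{u}$ close to one.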

Supplementary Information The online version contains supplementary
material available at https://doi.org/10.1007/s00122-024-04592-2.

Acknowledgements The authors are indebted to Prof. Daniel Gianola 
for critical reading and helpful suggestions on an earlier version of the 
manuscript. ChatGPT 3.5 by OpenAI has been used to improve the 
grammar and style of this paper.

Authors contribution statement AEM conceived the study and developed
the theory. AEM and AJM jointly developed the Mathematica programs
and the figures. AEM wrote the manuscript with support from RF and
CCS. All authors discussed and interpreted the results, and read and
approved the final manuscript.

Funding Open Access funding enabled and organized by Projekt
DEAL. This work was funded by intramural funds of the Technical
University of Munich.

Declarations 

Conflict of interest The authors declare that they have no conflict of 
interest. AEM is editor-in-chief and CCS is member of the editorial 
board of Theor. Appl. Genetics.

Ethical standard The authors declare that their work complies with the 
current laws of Germany.

Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source,
provide a link to the Creative Commons licence, and indicate if changes
were made. The images or other third party material in this article are
included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in
the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Anderson TW (1958) An introduction to multivariate statistical analy-
sis. Wiley, New York

Auinger H-J, Lehermeier C, Gianola D, Mayer M, Melchinger AE, da 
Silva S, Knaak C, Ouzunova M, Schön C-C (2021) Calibration and 
validation of predicted genomic breeding values in an advanced 
cycle maize population. Theor Appl Genet 134:3069–3081

Barbosa PAM, Fritsche-Neto R, Andrade MC, Petroli CD, Burgueño J,
Galli G, Willcox MC, Sonder K, Vidal-Martínez VA, Sifuentes-Ibarra E
(2021) Introgression of maize diversity for drought tolerance:
subtropical maize landraces as source of new positive variants.
Front Plant Sci 12:691211

Bernardo R (1994) Prediction of maize single-cross performance using 
RFLPs and information from related hybrids. Crop Sci 34:20–25

Bernardo R (1996) Best linear unbiased prediction of maize single-
cross performance. Crop Sci 36:50–56

Bernardo R (2002) Breeding for quantitative traits in plants. Stemma 
Press, Woodbury

Böhm J, Schipprack W, Utz HF, Melchinger AE (2017) Tapping the 
genetic diversity of landraces in allogamous crops with doubled 
haploid lines: a case study from European flint maize. Theor Appl 
Genet 130:861–873

Bonnett D, Li Y, Crossa J, Dreisigacker S, Basnet B, Pérez-Rodríguez 
P, Alvarado G, Jannink J-L, Poland J, Sorrells M (2022) Response 
to early generation genomic selection for yield in wheat. Front 
Plant Sci 12:718611

Brauner PC, Müller D, Molenaar WS, Melchinger AE (2019) Genomic 
prediction with multiple biparental families. Theor Appl Genet 
133:133–147

Brotherstone S, Hill W (1986) Heterogeneity of variance amongst 
herds for milk production. Anim Sci 42:297–303

Bulmer MG (1980) The mathematical theory of quantitative genetics. 
Clarendon Press, New York

Chaikam V, Molenaar W, Melchinger AE, Boddupalli PM (2019) Dou-
bled haploid technology for line development in maize: technical 
advances and prospects. Theor Appl Genet 132:3227–3243

Clark SA, Hickey JM, Daetwyler HD, van der Werf JH (2012) The 
importance of information on relatives for the prediction of 
genomic breeding values and the implications for the makeup of 
reference data sets in livestock breeding schemes. Genet Sel Evol 
44:1–9

Cochran W (1951) Improvement by means of selection. In: Proceedings 
of the second Berkeley symposium on mathematical statistics and 
probability, pp 449–470

Daetwyler HD, Villanueva B, Bijma P, Woolliams JA (2007) Inbreed-
ing in genome-wide selection. J Anim Breed Genet 124:369–376

Falconer D, Mackay T (1996) Introduction to quantitative genetics. 
Longman Group, Essex

Fernando R, Gianola D (1986) Optimal properties of the conditional 
mean as a selection criterion. Theor Appl Genet 72:822–825

Garrick D, Van Vleck LD (1987) Aspects of selection for performance 
in several environments with heterogeneous variances. J Anim 
Sci 65:409–421

Gaynor RC, Gorjanc G, Hickey JM (2021) AlphaSimR: an R package 
for breeding program simulations. G3 11:jkaa017

Goffinet B (1983) Selection on selected records. Génét Sélect Évol 
15:91–98

Habier D, Fernando RL, Dekkers J (2007) The impact of genetic rela-
tionship information on genome-assisted breeding values. Genet-
ics 177:2389–2397

Hartl DL, Clark AG (1997) Principles of population genetics.
Sinauer Associates, Sunderland

Henderson CR (1975) Best linear unbiased estimation and prediction 
under a selection model. Biometrics 31:423–447

Henderson C (1990) Statistical methods in animal improvement: his-
torical overview. In: Advances in statistical methods for genetic 
improvement of livestock. Springer, pp 2–14

Hill W (1984) On selection among groups with heterogeneous vari-
ance. Anim Sci 39:473–477

Hölker AC, Mayer M, Presterl T, Bolduan T, Bauer E, Ordas B, 
Brauner PC, Ouzunova M, Melchinger AE, Schön C-C (2019) 
European maize landraces made accessible for plant breeding and 
genome-based studies. Theor Appl Genet 132:3333–3345

Kennedy B, Sorenson D (1988) Properties of mixed model methods
for prediction of genetic merit under different genetic models in
selected and nonselected populations. In: Second international
conference on quantitative genetics, Raleigh. Sinauer Associates,
pp 47–56

Lehermeier C, Krämer N, Bauer E, Bauland C, Camisan C, Campo L, 
Flament P, Melchinger AE, Menz M, Meyer N (2014) Usefulness 
of multiparental populations of maize (Zea mays L.) for genome-
based prediction. Genetics 198:3–16

Lian L, Jacobson A, Zhong S, Bernardo R (2014) Genomewide predic-
tion accuracy within 969 maize biparental populations. Crop Sci 
54:1514–1522

Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. 
Sinauer, Sunderland

Mayer M, Unterseer S, Bauer E, de Leon N, Ordas B, Schön CC (2017) 
Is there an optimum level of diversity in utilization of genetic 
resources? Theor Appl Genet 130:2283–2295

Melchinger AE, Fernando R, Stricker C, Schön CC, Auinger HJ (2023) 
Genomic prediction in hybrid breeding: I. Optimizing the training 
set design. Theor Appl Genet 136:176

Melchinger AE, Frisch M (2023) Genomic prediction in hybrid breed-
ing: II. Reciprocal recurrent genomic selection with full-sib and 
half-sib families. Theor Appl Genet 136:203

Melchinger AE, Posselt UK (2013) Biotechnologie und Züchtung. In: 
Lütke-Entrup NS, Schwarz FJ, Heilmann H (eds) Handbuch Mais. 
DLG Verlag, Frankfurt, M, pp 53–64

Piepho H, Möhring J, Melchinger A, Büchse A (2008) BLUP for phe-
notypic selection in plant breeding and variety testing. Euphytica 
161:209–228

Rasheed A, Hao Y, Xia X, Khan A, Xu Y, Varshney RK, He Z (2017) 
Crop breeding chips and genotyping platforms: progress, chal-
lenges, and perspectives. Mol Plant 10:1047–1064

Riedelsheimer C, Melchinger AE (2013) Optimizing the allocation of 
resources for genomic selection in one breeding cycle. Theor Appl 
Genet 126:2835–2848

Riedelsheimer C, Endelman JB, Stange M, Sorrells ME, Jannink J-L, 
Melchinger AE (2013) Genomic predictability of interconnected 
biparental maize populations. Genetics 194:493–503

Robert P, Auzanneau J, Goudemand E, Oury F-X, Rolland B, Heumez 
E, Bouchet S, Le Gouis J, Rincent R (2022) Phenomic selection 
in wheat breeding: identification and optimisation of factors influ-
encing prediction accuracy and comparison to genomic selection. 
Theor Appl Genet 135:895–914

Schnell F (1982) A synoptic study of the methods and categories of 
plant breeding

Schrag T, Melchinger A, Sørensen A, Frisch M (2006) Prediction of 
single-cross hybrid performance for grain yield and grain dry mat-
ter content in maize using AFLP markers associated with QTL. 
Theor Appl Genet 113:1037–1047

Schrag TA, Westhues M, Schipprack W, Seifert F, Thiemann A, 
Scholten S, Melchinger AE (2018) Beyond genomic prediction: 
combining different types of omics data can improve prediction of 
hybrid performance in maize. Genetics 208:1373–1385

Seifert F, Thiemann A, Schrag TA, Rybka D, Melchinger AE, Frisch 
M, Scholten S (2018) Small RNA-based prediction of hybrid per-
formance in maize. BMC Genom 19:1–14

Seye A, Bauland C, Charcosset A, Moreau L (2020) Revisiting hybrid 
breeding designs using genomic predictions: simulations high-
light the superiority of incomplete factorials between segregating 
families over topcross designs. Theor Appl Genet 133:1995–2010

Sorenson D, Gianola D (2004) Likelihood, Bayesian, and MCMC methods
in quantitative genetics. Springer, New York

Technow F, Schrag TA, Schipprack W, Bauer E, Simianer H, Melch-
inger AE (2014) Genome properties and prospects of genomic 
prediction of hybrid performance in a breeding program of maize. 
Genetics 197:1343–1355



Watson A, Ghosh S, Williams MJ, Cuddy WS, Simmonds J, Rey M-D, 
Asyraf Md Hatta M, Hinchliffe A, Steed A, Reynolds D (2018) 
Speed breeding is a powerful tool to accelerate crop research and 
breeding. Nat Plants 4:23–29

Weiß TM, Zhu X, Leiser WL, Li D, Liu W, Schipprack W, Melchinger 
AE, Hahn V, Würschum T (2022) Unraveling the potential of 
phenomic selection within and among diverse breeding material 
of maize (Zea mays L.). G3 12:jkab445

Westhues M, Schrag TA, Heuer C, Thaller G, Utz HF, Schipprack W, 
Thiemann A, Seifert F, Ehret A, Schlereth A (2017) Omics-based 
hybrid prediction in maize. Theor Appl Genet 130:1927–1939

Westhues M, Heuer C, Thaller G, Fernando R, Melchinger AE (2019) 
Efficient genetic value prediction using incomplete omics data. 
Theor Appl Genet 132:1211–1222

Wilde P, Menzel J, Schmiedchen B (2003) Estimation of general and 
specific combining ability variances and their implications on 
hybrid rye breeding. Plant Breed Seed Sci 47:89–98

Wilde P, Miedaner T (2021) Hybrid rye breeding. In: The rye genome, 
pp 13–41

Wolfram S (1999) The MATHEMATICA® book, version 4. Cam-
bridge University Press, Cambridge

Woolliams J, Berg P, Dagnachew B, Meuwissen T (2015) Genetic con-
tributions and their optimization. J Anim Breed Genet 132:89–99

Zenke-Philippi C, Frisch M, Thiemann A, Seifert F, Schrag T, Melch-
inger AE, Scholten S, Herzog E (2017) Transcriptome-based pre-
diction of hybrid performance with unbalanced data from a maize 
breeding programme. Plant Breed 136:331–337

Publisher's Note Springer Nature remains neutral with regard to 
jurisdictional claims in published maps and institutional affiliations.

