Theoretical and Applied Genetics (2024) 137:104
https://doi.org/10.1007/s00122-024-04592-2

ORIGINAL ARTICLE

Optimizing selection based on BLUPs or BLUEs in multiple sets of genotypes differing in their population parameters

Albrecht E. Melchinger1,2 · Rohan Fernando3 · Andreas J. Melchinger4 · Chris-Carolin Schön1

Received: 27 October 2023 / Accepted: 5 March 2024 / Published online: 15 April 2024
© The Author(s) 2024

Communicated by Antonio Augusto Franco Garcia.

* Albrecht E. Melchinger, albrechtmelchinger@gmail.com
1 Plant Breeding, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
2 Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70599 Stuttgart, Germany
3 Department of Animal Science, Iowa State University, Ames, IA 50011, USA
4 Department of Mathematics, University of Stuttgart, 70569 Stuttgart, Germany
ORCID: http://orcid.org/0000-0003-0810-873X · http://orcid.org/0000-0001-5821-099X · http://orcid.org/0000-0001-6067-7900

Abstract

Key message Selection response in truncation selection across multiple sets of candidates hinges on their post-selection proportions, which can deviate grossly from their initial proportions. For BLUPs, using a uniform threshold for all candidates maximizes the selection response, irrespective of differences in population parameters.

Abstract Plant breeding programs typically involve multiple families from either the same or different populations, varying in means, genetic variances and prediction accuracy of BLUPs or BLUEs for true genetic values (TGVs) of candidates. We extend the classical breeder's equation for truncation selection from single to multiple sets of genotypes, indicating that the expected overall selection response ($\Delta G_{Tot}$) for TGVs depends on the selection response within individual sets and their post-selection proportions. For BLUEs, we show that maximizing $\Delta G_{Tot}$ requires thresholds optimally tailored for each set, contingent on their population parameters. For BLUPs, we prove that $\Delta G_{Tot}$ is maximized by applying a uniform threshold across all candidates from all sets. We provide explicit formulas for the origin of the selected candidates from different sets and show that their proportions before and after selection can differ substantially, especially for sets with inferior properties and low proportion. We discuss implications of these results for (a) optimum allocation of resources to training and prediction sets and (b) the need to counteract narrowing the genetic variation under genomic selection. For genomic selection of hybrids based on BLUPs of GCA of their parent lines, selecting distinct proportions in the two parent populations can be advantageous, if these differ substantially in the variance and/or prediction accuracy of GCA. Our study sheds light on the complex interplay of selection thresholds and population parameters for the selection response in plant breeding programs, offering insights into the effective resource management and prudent application of genomic selection for improved crop development.

Introduction

Selection is one of the major drivers of evolution and breeding. In nature, various types of selection occur, which are studied in evolutionary biology and described in textbooks on population genetics (e.g., Hartl et al. 1997). In breeding, directional selection is by far the most important type of selection in the sense that breeders typically select only a certain number or proportion of top candidates for a single trait or an index of the most important traits. The selected candidates are then advanced for further breeding or utilized as experimental cultivars for commercial purposes.

Cochran (1951) derived the primary mathematical results for the changes in population parameters under truncation selection in a seminal paper and demonstrated its application to plant selection. He described the selection response for a target variable, when selection is based on correlated variates. Cochran's formula and its extension to the peculiarities in plant breeding, such as length of the breeding cycle and parental control, are known as the breeders' equation (cf. Bernardo 2002; Lynch and Walsh 1998). This equation is one of the most important contributions of quantitative genetics to practical breeding as it quantifies the relevant factors that determine the progress expected from directional selection. However, the breeders' equation strictly applies only to selection in a single population and assumes homogeneous correlation between the true genetic value (TGV) and the selection criterion (SC) for all candidates, which is generally not met in practice. More general settings, dropping the latter assumption, were investigated by Bulmer (1980).

In animal breeding, the problem of heterogeneity of variances among sets was early addressed in the context of different environmental groups (Brotherstone and Hill 1986). Hill (1984) found that under more intense selection, more animals are selected from the group with larger variance and recommended to correct for heterogeneity. For selection based on BLUPs, Garrick and Van Vleck (1987) examined the case of heterogeneous variances and showed that selection assuming homogeneity is still highly efficient if the prediction accuracy is high.

Plant breeding programs typically involve multiple sets of candidates from various families or populations (e.g., Auinger et al. 2021; Lian et al. 2014) and breeders often apply the same threshold to all candidates without considering their origin. However, if the sets differ in their mean and/or genetic variance and/or heritability ($h^2$) of entry means, calculated as best linear unbiased estimates (BLUEs) in phenotypic selection, this may be suboptimal for the selection response of the entire program. This problem arises for example when one set of candidates is tested in more locations and/or years than another set, resulting in different heritabilities ($h^2$).

When selection is based on best linear unbiased predictors (BLUPs) calculated from pedigree or "omics" data, there are numerous cases in which candidates differ in their population parameters, most notably the prediction accuracy ($\rho$) for the TGVs. In genomic selection, $\rho$ strongly depends on the size of the training set and its relationship to the prediction set (e.g., Auinger et al. 2021; Clark et al. 2012; Habier et al. 2007). As demonstrated by experimental studies and simulations, adding more half-sibs to full-sibs in the training set improves $\rho$ for genomic prediction within full-sib families (Brauner et al. 2019; Lehermeier et al. 2014; Lian et al. 2014; Riedelsheimer et al. 2013). Additionally, if pedigree, genomic, metabolic, or transcriptomic data are collected for different sets, the prediction accuracy of BLUPs calculated from different "omics" features or combinations of them can vary significantly among candidates (Seifert et al. 2018; Westhues et al. 2017; Zenke-Philippi et al. 2017). The same holds true for recently proposed approaches of phenomic selection based on sensor data and NIRS measurements (Robert et al. 2022; Weiß et al. 2022). Thus, breeders should be aware of the consequences of different prediction accuracies for the composition of the selected candidates.

A related, albeit slightly distinct scenario unfolds in hybrid breeding. Typically, lines from two genetically distant parent populations are selected based on predictors of their general combining ability (GCA) to attain a high selection response in the predicted hybrids (Melchinger et al. 2023). In general, breeders select an equal proportion of lines from each parent population for producing a factorial of hybrids among them (Melchinger and Posselt 2013). However, this approach may not be optimal if the two parent populations differ in their GCA variances and/or prediction accuracy for GCA effects. To our knowledge, no research has addressed the determination of the optimal proportion of lines to be selected from each parent population under this scenario.

The main objective of this study was to quantify and analyze the expected selection response when applying truncation selection to candidates from two sets differing in their population parameters. First, we extend Cochran's formula for determining how the selection response in the combined set and the composition of the selected fraction depends on the proportion and selection response of the individual sets. Second, we derive solutions to determine the threshold, or equivalently the selected proportion, in each set to maximize the selection response in the combined set and examine the implications for selection based on BLUPs or BLUEs. Third, we explore how to optimize the selection response in hybrid breeding if the female and male parent lines of a complete factorial are selected based on their predicted GCA and the two parent populations differ in the variance and/or prediction accuracy of GCA. We augment our theoretical findings with numerical calculations that assess the benefits of utilizing optimal selected proportions and their impact on the composition of the selected set.

Theory

The results in this section are given for two sets of genotypes $\Pi_1$ and $\Pi_2$ that can originate from the same or different populations, but they can be extended to any number of sets. The two disjoint sets $\Pi_1$ and $\Pi_2$ can be of unequal size with proportions $\pi_1$ and $\pi_2 = 1 - \pi_1$, respectively, in the combined set $\Pi_1 \cup \Pi_2$. We assume that the SC for the candidates from $\Pi_1$ or $\Pi_2$ is independently and identically distributed according to normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Under these assumptions, applying truncation selection with thresholds $t_1$ and $t_2$ to the candidates in $\Pi_1$ and $\Pi_2$ corresponds directly to selecting proportions $\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)$ and $\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)$, respectively. Here, $\alpha(x)$ denotes the proportion selected from a normal distribution $N(0, 1)$ using threshold $x$, and $i_{\alpha(x)}$ represents the corresponding selection intensity.

In order to simplify formulas, we will use the abbreviation $\theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, \pi_1)$. Let $\Gamma_1$ and $\Gamma_2$ denote the sets of candidates selected from $\Pi_1$ and $\Pi_2$, respectively. Thus, we get for the proportion of candidates selected from $\Pi_1 \cup \Pi_2$ using thresholds $t_1$ and $t_2$ ("Appendix 1," Eq. 17)

$$\alpha_{Tot}(t_1, t_2, \theta) = \alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)\pi_1 + \alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)\pi_2 \tag{1}$$

and for the proportion of candidates from $\Gamma_1$ and $\Gamma_2$ in the selected fraction $\Gamma_1 \cup \Gamma_2$ ("Appendix 1," Eq. 18)

$$\gamma_1(t_1, t_2, \theta) = \frac{\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)\pi_1}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_1|}{|\Gamma_1 \cup \Gamma_2|} \quad \text{and} \quad \gamma_2(t_1, t_2, \theta) = \frac{\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)\pi_2}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_2|}{|\Gamma_1 \cup \Gamma_2|} = 1 - \gamma_1(t_1, t_2, \theta). \tag{2}$$

Assuming the regression coefficient of the TGV on the SC is $b_1$ in $\Pi_1$ and $b_2$ in $\Pi_2$, and applying the breeders' equation for each set, we get for the total selection response of TGVs under truncation selection in $\Pi_1 \cup \Pi_2$ with thresholds $t_1$ and $t_2$ ("Appendix 1," Eq. 22)

$$\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) = \Delta G_1\big((t_1-\mu_1), \sigma_1, b_1\big)\,\gamma_1(t_1, t_2, \theta) + \Delta G_2\big((t_2-\mu_2), \sigma_2, b_2\big)\,\gamma_2(t_1, t_2, \theta) + \mu_1\left[\gamma_1(t_1, t_2, \theta) - \pi_1\right] + \mu_2\left[\gamma_2(t_1, t_2, \theta) - \pi_2\right] \tag{3}$$

where $\Delta G_1\big((t_1-\mu_1), \sigma_1, b_1\big) = b_1\sigma_1 i_{\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)}$ and $\Delta G_2\big((t_2-\mu_2), \sigma_2, b_2\big) = b_2\sigma_2 i_{\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)}$ refer to the selection response in set $\Pi_1$ and $\Pi_2$, respectively.

A special situation exists in hybrid breeding with two genetically distant parent populations, where $\Pi_1$ and $\Pi_2$ correspond to sets of lines from the seed or pollen parent population, respectively. The TGV refers to the general combining ability (GCA) of each line in cross-combinations with the other parent population. Since GCA values are defined as deviations from the overall mean of the hybrid population $\Pi_1 \times \Pi_2$, we assume that the SC for the GCA of the lines from $\Pi_1$ and $\Pi_2$ follows normal distributions $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$, respectively, and the regression coefficients of the TGV of GCA effects on the SC are $b_1$ and $b_2$, respectively. In phenotypic selection, the SC is commonly based on the testcross performance of each line evaluated in crosses with one or several tester(s) from the opposite population of the heterotic pattern. In genomic selection, GCA can be predicted from the marker profile of the parent lines and phenotypic data of hybrids in a training set (cf. Bernardo 1996; Technow et al. 2014).

The lines with highest predicted GCA effects in each parent population are generally selected for producing a factorial to be phenotyped in the final step of cultivar development (Melchinger and Posselt 2013).
Thus, the selection response $\Delta G_{Hyb}$ in the complete factorial $\Gamma_1 \times \Gamma_2$ of hybrids, produced by mating set $\Gamma_1$ of GCA-selected lines from $\Pi_1$ with set $\Gamma_2$ of GCA-selected lines from $\Pi_2$, compared to the factorial $\Pi_1 \times \Pi_2$ among unselected lines, is equal to the sum of the selection response for GCA effects $\Delta G_1(t_1, \sigma_1, b_1) = b_1\sigma_1 i_{\alpha(t_1/\sigma_1)}$ plus $\Delta G_2(t_2, \sigma_2, b_2) = b_2\sigma_2 i_{\alpha(t_2/\sigma_2)}$ in parent population $\Pi_1$ and $\Pi_2$, respectively, and we have for $\vartheta = (\sigma_1, \sigma_2, b_1, b_2)$

$$\Delta G_{Hyb}(t_1, t_2, \vartheta) = \Delta G_1(t_1, \sigma_1, b_1) + \Delta G_2(t_2, \sigma_2, b_2) \tag{4}$$

where $\alpha\left(\frac{t_1}{\sigma_1}\right) = \frac{|\Gamma_1|}{|\Pi_1|}$ and $\alpha\left(\frac{t_2}{\sigma_2}\right) = \frac{|\Gamma_2|}{|\Pi_2|}$ are the proportions of selected lines in $\Pi_1$ and $\Pi_2$, respectively. Note that

$$\alpha_{Hyb}(t_1, t_2, \vartheta) = \alpha\left(\frac{t_1}{\sigma_1}\right) \times \alpha\left(\frac{t_2}{\sigma_2}\right) = \frac{|\Gamma_1|}{|\Pi_1|} \times \frac{|\Gamma_2|}{|\Pi_2|} = \frac{|\Gamma_1 \times \Gamma_2|}{|\Pi_1 \times \Pi_2|} \tag{5}$$

corresponds to the proportion of hybrids in $\Gamma_1 \times \Gamma_2$ selected in silico from the set of all possible hybrids in $\Pi_1 \times \Pi_2$.

Maximizing the total selection response by optimal choice of thresholds

Depending on the budget and size of the breeding program, the breeder has restrictions on the total number of genotypes to be selected from the candidates in a given cycle. This applies irrespective of whether the selected candidates are promoted to further testing for cultivar development or recombined to generate new base material for the next breeding cycle in recurrent selection. Therefore, the total proportion of selected candidates ($\alpha_T$) is typically fixed. Nevertheless, the breeder still has the option to optimize the total selection response in $\Pi_1 \cup \Pi_2$ by selecting different proportions of candidates from $\Pi_1$ and $\Pi_2$, respectively, while keeping $\alpha_{Tot}(t_1, t_2, \theta)$, the total proportion of genotypes selected from $\Pi_1 \cup \Pi_2$, fixed. Thus, the goal is to find thresholds $t_1^*$ and $t_2^*$, or equivalently selected proportions $\alpha_1^* = \alpha\left(\frac{t_1^*-\mu_1}{\sigma_1}\right)$ and $\alpha_2^* = \alpha\left(\frac{t_2^*-\mu_2}{\sigma_2}\right)$, which maximize the total selection response $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2)$ under the side condition $\alpha_{Tot}(t_1, t_2, \theta) = \alpha_T$.

A solution to this problem can be obtained by applying a Lagrange multiplier approach. Our derivations show that $(t_1^*, t_2^*)$ are obtained as solutions of the following equations in $(t_1, t_2)$ ("Appendix 2," Eqs. 30 and 31):

$$t_1 = \frac{b_2(t_2-\mu_2) + \mu_2 - \mu_1}{b_1} + \mu_1 \quad \text{or equivalently} \quad t_2 = \frac{b_1(t_1-\mu_1) + \mu_1 - \mu_2}{b_2} + \mu_2 \tag{6}$$

and

$$\alpha_{Tot}(t_1, t_2, \theta) = \alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)\pi_1 + \alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)(1-\pi_1) = \alpha_T. \tag{7}$$

Solutions $(t_1^*, t_2^*)$ of these equations can be obtained by mathematical software, such as Mathematica (Wolfram 1999), and subsequently used to calculate $\alpha_1^*$, $\alpha_2^*$ and $\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2)$. In order to assess the improvement in the total selection response, which can be achieved by applying optimal thresholds $(t_1^*, t_2^*)$ instead of identical thresholds $t_1^i = t_2^i$ for both sets satisfying the side condition $\alpha_{Tot}(t_1^i, t_2^i, \theta) = \alpha_T$ in Eq. 7, we suggest using the ratio

$$\Psi_{Tot}(\alpha_T, \theta, b_1, b_2) = 100 \times \left[\frac{\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2) - \Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)}{\Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)}\right]. \tag{8}$$

In hybrid breeding, the breeder is also limited in terms of the number of promising predicted hybrids that can be evaluated in a factorial for product development in the next step of the breeding scheme. Thus, the goal is to find optimal proportions $\alpha_1^o$ and $\alpha_2^o$ of candidates from $\Pi_1$ and $\Pi_2$, or equivalently optimal thresholds $t_1^o$ and $t_2^o$, for selection in $\Pi_1$ and $\Pi_2$, respectively, which maximize the selection response $\Delta G_{Hyb}(t_1, t_2, \vartheta)$ for the factorial produced between the GCA-selected lines. However, instead of using Eq. 7, the side condition takes the form

$$\alpha_{Hyb}(t_1, t_2, \vartheta) = \alpha\left(\frac{t_1}{\sigma_1}\right) \times \alpha\left(\frac{t_2}{\sigma_2}\right) = \alpha_H, \tag{9}$$

where $\alpha_H$ is the fixed proportion of hybrids to be selected for testing in the final stage of hybrid development. A solution to this maximization problem can be found again by applying a Lagrange multiplier approach ("Appendix 3"). Accordingly, thresholds $(t_1^o, t_2^o)$ optimizing $\Delta G_{Hyb}(t_1, t_2, \vartheta)$ in the factorial of hybrids among selected lines are found as solutions $(t_1, t_2)$ ("Appendix 3," Eqs. 43) of Eq. 9 and

$$b_2 t_2 - b_1 t_1 + \Delta G_1(t_1, \sigma_1, b_1) - \Delta G_2(t_2, \sigma_2, b_2) = 0. \tag{10}$$

Numerical solutions for $(t_1^o, t_2^o)$ can be obtained by mathematical software such as Mathematica and subsequently used to calculate the proportions $\alpha_1^o = \alpha\left(\frac{t_1^o}{\sigma_1}\right)$ and $\alpha_2^o = \alpha\left(\frac{t_2^o}{\sigma_2}\right) = \alpha_H / \alpha_1^o$ to be selected in $\Pi_1$ and $\Pi_2$, respectively, and finally, $\Delta G_{Hyb}(t_1^o, t_2^o, \vartheta)$.

In order to assess the improvement in the total selection response, which can be achieved by using the optimal proportions $(\alpha_1^o, \alpha_2^o)$ compared to selecting an equal proportion $\alpha_e = \sqrt{\alpha_H}$ of lines from each population, i.e., using thresholds $t_1^e = \sigma_1\Phi^{-1}(1-\alpha_e)$ and $t_2^e = \sigma_2\Phi^{-1}(1-\alpha_e)$, where $\Phi$ denotes the cumulative distribution function of $N(0, 1)$, we suggest using the ratio

$$\Psi_{Hyb}(\alpha_H, \vartheta) = 100 \times \left[\frac{\Delta G_{Hyb}(t_1^o, t_2^o, \vartheta) - \Delta G_{Hyb}(t_1^e, t_2^e, \vartheta)}{\Delta G_{Hyb}(t_1^e, t_2^e, \vartheta)}\right]. \tag{11}$$

Application to selection based on BLUPs

Let $u$ denote the random variable of true breeding values (TBVs) and $\hat{u}$ their BLUPs, obtained by the use of pedigree or "omics" data. As shown by Henderson (1975), the standard deviation $\sigma_u$ of TGVs and the standard deviation $\sigma$ of their BLUPs are related by $\sigma = \rho\sigma_u$, where $\rho$ is the prediction accuracy, reflecting the shrinkage of BLUPs compared to the TBVs. Hence, we have $\sigma_1 = \rho_1\sigma_{u1}$ and $\sigma_2 = \rho_2\sigma_{u2}$. Further, the regression of $u$ on $\hat{u}$ is equal to 1.0 for each set, so that $b_1 = 1.0$ and $b_2 = 1.0$, and this result holds true under fairly general conditions ("Appendix 4"). Thus, from Eq. 6 we obtain $t_1^* = t_2^*$, even if $\mu_1 \neq \mu_2$, $\sigma_{u1}^2 \neq \sigma_{u2}^2$, and $\rho_1 \neq \rho_2$. Consequently, using identical thresholds for the predicted values of TGVs (calculated as BLUPs plus the mean $\mu$ of the corresponding set) maximizes the selection response in the combined set.

In conclusion, for BLUPs there is no need to search for the optimal threshold in each set and one must merely find the common threshold $t^* = t_1^* = t_2^*$ for both sets satisfying the side condition in Eq. 7, which can be obtained by solving the equation

$$\Phi\left(\frac{t^*-\mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{t^*-\mu_2}{\sigma_2}\right)(1-\pi_1) = 1 - \alpha_T. \tag{12}$$

Moreover, the total selection response in the combined set $\Pi_1 \cup \Pi_2$ for the common threshold $t^*$ is

$$\Delta G_{Tot\text{-}BLUP}(t^*, t^*, \theta, 1, 1) = \frac{1}{\alpha_T}\left[\sigma_1\varphi\left(\frac{t^*-\mu_1}{\sigma_1}\right)\pi_1 + \sigma_2\varphi\left(\frac{t^*-\mu_2}{\sigma_2}\right)\pi_2 + \mu_1\pi_1\left(\alpha\left(\frac{t^*-\mu_1}{\sigma_1}\right) - \alpha_T\right) + \mu_2\pi_2\left(\alpha\left(\frac{t^*-\mu_2}{\sigma_2}\right) - \alpha_T\right)\right], \tag{13}$$

where $\varphi$ denotes the probability density function of $N(0, 1)$.

Application to selection based on BLUEs

In phenotypic selection (PS) based on BLUEs, the regression of TGVs on the SC is equal to their heritability (Falconer and Mackay 1996, p. 189), so that $b_1 = h_1^2$ and $b_2 = h_2^2$. Further, the standard deviation $\sigma_u$ of TBVs and the standard deviation $\sigma$ of their BLUEs used in PS are related by $\sigma = \sigma_u / h$. Hence, we have $\sigma_1 = \sigma_{u1}/h_1$ and $\sigma_2 = \sigma_{u2}/h_2$. Thus, Eq. 3 becomes

$$\Delta G_{Tot\text{-}PS}(t_1, t_2, \theta, h_1^2, h_2^2) = h_1^2\sigma_1 i_{\alpha\left(\frac{t_1-\mu_1}{\sigma_1}\right)}\gamma_1(t_1, t_2, \theta) + h_2^2\sigma_2 i_{\alpha\left(\frac{t_2-\mu_2}{\sigma_2}\right)}\gamma_2(t_1, t_2, \theta) + \mu_1\left[\gamma_1(t_1, t_2, \theta) - \pi_1\right] + \mu_2\left[\gamma_2(t_1, t_2, \theta) - \pi_2\right]. \tag{14}$$

From Eqs. 6 and 7, the optimal choice of thresholds $t_1^*$ and $t_2^*$ are obtained as solutions of

$$\Phi\left(\frac{t_1^*-\mu_1}{\sigma_1}\right)\pi_1 + \Phi\left(\frac{h_1^2(t_1^*-\mu_1) + \mu_1 - \mu_2}{h_2^2\sigma_2}\right)\pi_2 = 1 - \alpha_T \quad \text{and} \quad t_2^* = \frac{h_1^2(t_1^*-\mu_1) + \mu_1 - \mu_2}{h_2^2} + \mu_2. \tag{15}$$

Numerical analyses

All equations in the theory part were programmed in the software Mathematica (Wolfram 1999) for numerical analyses. As a first check for Eq. 6 and the derivations in "Appendix 2," we numerically compared the selection response $\Delta G_{Tot}$ for BLUPs achieved with optimized thresholds $(t_1^*, t_2^*)$ versus identical thresholds ($t_1^i = t_2^i$), setting $b_1 = b_2 = 1.0$ in our program. Regardless of the means $(\mu_1, \mu_2)$ and standard deviations $(\sigma_1, \sigma_2)$ of the SC in $\Pi_1$ and $\Pi_2$, as well as the choice of $\pi_1$ and $\alpha_T$, the values of $\Delta G_{Tot}$ obtained for $(t_1^*, t_2^*)$ and ($t_1^i = t_2^i$) were identical except for tiny differences attributable to rounding errors so that $\Psi_{Tot}$ was practically zero (data not shown), confirming our theoretical results.

For BLUEs, we calculated on one hand the values for $\Psi_{Tot}$ and $\gamma_1^* = \gamma_1(t_1^*, t_2^*, \theta)$ obtained by using the solutions for $(t_1^*, t_2^*)$ obtained with the Lagrange multiplier approach (Eq. 6). On the other hand, we used the function NMaximize in Mathematica to determine the maximum of $\Delta G_{Tot}$ under the side condition in Eq. 7. Again, the numerical results from both calculations were in perfect agreement except for numerical inaccuracies.

For finding the maximum selection response $\Delta G_{Hyb}$ in the hybrid population $\Pi_1 \times \Pi_2$, we used the function NMaximize in Mathematica in combination with the side condition in Eq. 9 to find the optimum choice of selected proportions $(\alpha_1^o, \alpha_2^o)$. We used these values to calculate according to Eq. 11 the percentage improvement ($\Psi_{Hyb}$) in $\Delta G_{Hyb}$ when using optimized $(\alpha_1^o, \alpha_2^o)$ instead of equal ($\alpha_1^e = \alpha_2^e$) proportions of lines selected from population $\Pi_1$ and $\Pi_2$.

For investigating the consequences of BLUE-based selection on the magnitude of $\Psi_{Tot}$, $\gamma_1^*$ and $\gamma_1^i = \gamma_1(t_1^i, t_2^i, \theta)$ as a function of other relevant population parameters, we made the assumption without loss of generality that $\mu_1 = 0$ and $\sigma_{u1} = 1.0$. This can be achieved by centering the original SC values of all candidates as deviations from $\mu_1$ and dividing them by $\sigma_1 = \sigma_{u1}/h_1$.
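
The check described in this section can be sketched outside Mathematica. The following Python snippet is a minimal illustration, not the authors' program: it uses only the standard library, solves Eqs. 6 and 7 by plain bisection (an assumption of this sketch; the article uses Mathematica), evaluates Eq. 3, and computes $\Psi_{Tot}$ from Eq. 8. All function names are hypothetical, and the parameter values (equal means, $\pi_1 = 0.5$, $\alpha_T = 0.10$, accuracies 0.6 and 0.9) are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def alpha(x):
    """Proportion selected from N(0, 1) with truncation threshold x."""
    return 1.0 - STD.cdf(x)


def bisect(f, lo=-50.0, hi=50.0, iters=200):
    """Root of a decreasing function f; the bracket assumes parameters of order 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def total_response(t, mu, sigma, pi1, b):
    """Eq. 3 for thresholds t = (t1, t2); returns (Delta G_Tot, gamma_1)."""
    pis = (pi1, 1.0 - pi1)
    xs = [(t[k] - mu[k]) / sigma[k] for k in range(2)]
    sel = [alpha(xs[k]) * pis[k] for k in range(2)]        # terms of Eq. 1
    gammas = [s / sum(sel) for s in sel]                    # Eq. 2
    resp = 0.0
    for k in range(2):
        intensity = STD.pdf(xs[k]) / alpha(xs[k])           # selection intensity
        resp += b[k] * sigma[k] * intensity * gammas[k]
        resp += mu[k] * (gammas[k] - pis[k])
    return resp, gammas[0]


def optimal_thresholds(alpha_t, mu, sigma, pi1, b):
    """Solve Eq. 6 (t2 as a function of t1) together with side condition Eq. 7."""
    def t2_of(t1):
        return (b[0] * (t1 - mu[0]) + mu[0] - mu[1]) / b[1] + mu[1]

    def excess(t1):
        a1 = alpha((t1 - mu[0]) / sigma[0])
        a2 = alpha((t2_of(t1) - mu[1]) / sigma[1])
        return a1 * pi1 + a2 * (1.0 - pi1) - alpha_t

    t1 = bisect(excess)
    return t1, t2_of(t1)


def identical_threshold(alpha_t, mu, sigma, pi1):
    """Common threshold t^i satisfying the side condition in Eq. 7."""
    def excess(t):
        a1 = alpha((t - mu[0]) / sigma[0])
        a2 = alpha((t - mu[1]) / sigma[1])
        return a1 * pi1 + a2 * (1.0 - pi1) - alpha_t

    return bisect(excess)


def psi_tot(alpha_t, mu, sigma, pi1, b):
    """Eq. 8: percentage gain of optimal over identical thresholds."""
    t_opt = optimal_thresholds(alpha_t, mu, sigma, pi1, b)
    t_i = identical_threshold(alpha_t, mu, sigma, pi1)
    opt, _ = total_response(t_opt, mu, sigma, pi1, b)
    ident, _ = total_response((t_i, t_i), mu, sigma, pi1, b)
    return 100.0 * (opt - ident) / ident


mu, pi1, alpha_t = (0.0, 0.0), 0.5, 0.10
# BLUPs: sigma = rho * sigma_u with sigma_u = 1, and b1 = b2 = 1
psi_blup = psi_tot(alpha_t, mu, (0.6, 0.9), pi1, (1.0, 1.0))
# BLUEs: sigma = sigma_u / h with sigma_u = 1, and b = h^2
sigma_blue, b_blue = (1 / 0.6, 1 / 0.9), (0.36, 0.81)
t1_opt, t2_opt = optimal_thresholds(alpha_t, mu, sigma_blue, pi1, b_blue)
psi_blue = psi_tot(alpha_t, mu, sigma_blue, pi1, b_blue)
print(f"BLUP: Psi_Tot = {psi_blup:.4f} %")
print(f"BLUE: t1* = {t1_opt:.2f}, t2* = {t2_opt:.2f}, Psi_Tot = {psi_blue:.1f} %")
```

With $b_1 = b_2 = 1$, Eq. 6 reduces to $t_1 = t_2$ when the means are equal, so the optimal and identical thresholds coincide and $\Psi_{Tot}$ vanishes; for BLUEs the two solutions differ. Bisection is sufficient here because the selected proportion along the line defined by Eq. 6 decreases monotonically in $t_1$.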
Moreover, for representing $\Psi_{Tot}$ and $\gamma_1^*$ or $\gamma_1^i$ in contour plots as functions of $\mu_2$ and $h_2$, we assumed $h_1 = \sqrt{0.5}$ and identical genetic standard deviations in both sets ($\sigma_{u1} = \sigma_{u2}$), which closely approximates the conditions encountered in many situations in plant breeding programs.

Software availability statement

The Mathematica programs developed for the numerical analyses of this study are available at https://github.com/TUMplantbreeding/AEM/Opt_selection_with_multiple_sets and can be downloaded from there.

Results

Figure 1 examines for BLUPs the shift in the proportion of candidates from $\Pi_1$ before ($\pi_1$) and after selection ($\gamma_1^*$). We present the ratio $\gamma_1^* : \pi_1$ as a function of $\mu_2$ and $\rho_2$ under the assumptions mentioned above ($\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $\rho_1 = 0.50$). Regardless of the magnitude of $\alpha_T$ and $\pi_1$, the contour lines were straight lines, indicating that $\gamma_1^*$ depends on a linear function of $\mu_2$ and $\rho_2$ with weights of these parameters determining their slope. For small values of $\pi_1$ or $\alpha_T$, the ratio reduced substantially with an increasing sum $\mu_2 + \rho_2$ so that even for moderate values for one of these parameters, the ratio was smaller than 0.1, indicating that less than 10% of the initial proportion $\pi_1$ was recovered in $\gamma_1^*$. For $\pi_1 = 0.90$ in combination with $\alpha_T \geq 0.10$, the ratio was less affected by increasing $\rho_2$ and reduced only moderately with increasing $\mu_2$, yet the slope of the contour lines changed with $\mu_2$.

As expected, under optimal thresholds $(t_1^*, t_2^*)$ for selection based on BLUEs, the contour plots for $\gamma_1^* : \pi_1$ were identical to those obtained for BLUPs, when replacing $\rho_1$ by $h_1 = \sqrt{h_1^2}$ and $\rho_2$ by $h_2 = \sqrt{h_2^2}$, respectively (results not shown).
For comparison, we also analyzed the ratio $\gamma_1^i : \pi_1$ as a function of $\mu_2$ and $h_2$ to monitor the relative change in the proportion of candidates from $\Pi_1$ before ($\pi_1$) and after selection ($\gamma_1^i$) based on BLUEs with identical thresholds ($t_1^i = t_2^i$) for both sets (Supplementary Figure 1). Compared with $\gamma_1^* : \pi_1$, the ratio $\gamma_1^i : \pi_1$ changed less with increasing $\mu_2$ and $h_2$, particularly for large values of $\pi_1$ or $\alpha_T$. The ratio depended mainly on the magnitude of $h_2$ and less on the size of $\mu_2$. For $\pi_1 \leq 0.50$ and $\alpha_T \geq 0.10$, the ratio was smaller or larger than 1.0 if $h_2$ fell below or exceeded $h_1$, respectively, and increasing $\mu_2$ had only a moderately reducing effect.

When performing mild selection ($\alpha_T = 0.25$) with BLUEs, the size of $\Psi_{Tot}$, reflecting the improvement in overall selection response achieved by using optimal $(t_1^*, t_2^*)$ instead of identical thresholds ($t_1^i = t_2^i$), was consistently smaller than 10%, irrespective of $\pi_1$ and the investigated range of $\mu_2$ and $h_2$ (Fig. 2). For $\alpha_T = 0.10$, $\Psi_{Tot}$ was close to zero for $\pi_1 = 0.1$ but exceeded 10% for $\pi_1 = 0.50$ and high values of $h_2$. Under stringent selection with $\alpha_T = 0.01$ and $h_2 \geq 0.90$, $\Psi_{Tot}$ surpassed 20% for $\pi_1 = 0.50$, regardless of $\mu_2$, or if $\pi_1 = 0.50$ and $h_2 \geq 0.5$. Setting $\mu_2 = 1.0$ had only a minor effect on increasing $\Psi_{Tot}$ compared to increasing $h_2$ from $\sqrt{0.5}$ to 0.9.

Figure 3 shows $\Psi_{Hyb}$, the increase in selection response for hybrids when selecting optimal $(\alpha_1^o, \alpha_2^o)$ versus equal ($\alpha_1 = \alpha_2 = \alpha_e = \sqrt{\alpha_H}$) proportions of lines from each parent population, as a function of $\sigma_2 : \sigma_1$, the ratio of the standard deviations of BLUPs for GCA effects of lines in $\Pi_2$ and $\Pi_1$, respectively.
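
The optimization behind $\Psi_{Hyb}$ (Eqs. 9 to 11) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Mathematica code: instead of NMaximize it substitutes the side condition $\alpha_2 = \alpha_H / \alpha_1$ (Eq. 9) and finds the root of Eq. 10 in $\alpha_1$ by bisection, which relies on the residual of Eq. 10 increasing monotonically in $\alpha_1$. Function names and the example values ($\sigma_2 : \sigma_1 = 0.5$, $b_1 = b_2 = 1$, $\alpha_H = 0.01$) are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def gca_response(a, sigma, b):
    """Return (b * sigma * i_alpha, threshold t) for selected proportion a."""
    x = STD.inv_cdf(1.0 - a)              # standardized truncation point
    intensity = STD.pdf(x) / a            # selection intensity i_alpha
    return b * sigma * intensity, sigma * x


def hybrid_response(a1, alpha_h, sigma, b):
    """Eq. 4 with the side condition of Eq. 9 built in (a2 = alpha_h / a1)."""
    dg1, _ = gca_response(a1, sigma[0], b[0])
    dg2, _ = gca_response(alpha_h / a1, sigma[1], b[1])
    return dg1 + dg2


def eq10_residual(a1, alpha_h, sigma, b):
    """Left-hand side of Eq. 10 along the constraint a1 * a2 = alpha_h."""
    dg1, t1 = gca_response(a1, sigma[0], b[0])
    dg2, t2 = gca_response(alpha_h / a1, sigma[1], b[1])
    return b[1] * t2 - b[0] * t1 + dg1 - dg2


def optimal_proportions(alpha_h, sigma, b, iters=200):
    """Bisection on a1 in (alpha_h, 1); the residual increases with a1."""
    lo, hi = alpha_h * (1.0 + 1e-9), 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eq10_residual(mid, alpha_h, sigma, b) < 0.0:
            lo = mid
        else:
            hi = mid
    a1 = 0.5 * (lo + hi)
    return a1, alpha_h / a1


def psi_hyb(alpha_h, sigma, b):
    """Eq. 11: percentage gain of optimal over equal selected proportions."""
    a1, _ = optimal_proportions(alpha_h, sigma, b)
    a_e = sqrt(alpha_h)
    best = hybrid_response(a1, alpha_h, sigma, b)
    equal = hybrid_response(a_e, alpha_h, sigma, b)
    return 100.0 * (best - equal) / equal


sigma, b, alpha_h = (1.0, 0.5), (1.0, 1.0), 0.01
a1_opt, a2_opt = optimal_proportions(alpha_h, sigma, b)
gain = psi_hyb(alpha_h, sigma, b)
print(f"alpha_1^o = {a1_opt:.4f}, alpha_2^o = {a2_opt:.4f}")
print(f"alpha_1^o : alpha_e = {a1_opt / sqrt(alpha_h):.3f}, Psi_Hyb = {gain:.1f} %")
```

Parameterizing by the selected proportion rather than by the threshold keeps the side condition exact, so only a one-dimensional root search remains.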
$\Psi_{Hyb}$ showed an approximately quadratic decrease with increasing the ratio $\sigma_2 : \sigma_1$ from 0.5 to 1.0 and minor differences for different values of $\alpha_H$. For $\sigma_2 : \sigma_1 = 0.5$, $\Psi_{Hyb}$ was approximately 6% for all values of $\alpha_H$. The ratio $\alpha_1^o : \alpha_e$ displayed a quadratic decrease with increasing $\sigma_2 : \sigma_1$ with large differences depending on $\alpha_H$. For $\sigma_2 : \sigma_1 = 0.5$ and $\alpha_H \leq 0.01$, $\alpha_1^o : \alpha_e$ was smaller than 0.25, reflecting that selection of hybrids relied almost entirely on stringent GCA selection of lines in the parent population with higher variance of BLUPs and only mild selection in the other parent population.

Fig. 1 Contour plots for the ratio $\gamma_1^* : \pi_1$, indicating the shift in the proportion of genotypes from $\Pi_1$ before ($\pi_1$) and after ($\gamma_1^*$) truncation selection based on BLUPs, when using optimal (= identical) thresholds ($t_1^* = t_2^*$) in set $\Pi_1$ and $\Pi_2$. The graphs show $\gamma_1^* : \pi_1$ as a function of the mean $\mu_2$ and the prediction accuracy $\rho_2$ of the selection criterion (SC) in $\Pi_2$ for various values of $\pi_1$ and $\alpha_T$, the proportion of candidates selected from $\Pi_1 \cup \Pi_2$. Assumptions are $\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $\rho_1 = 0.50$, i.e., $\theta = (0, \mu_2, 0.5, \rho_2, \pi_1)$. The white labels attached to the contour lines show the corresponding numerical values

Discussion

Examples of sets differing in population parameters

In all breeding categories described by Schnell (1982), plant breeders generally evaluate and select genotypes from multiple families in parallel as evident from publications on public and private breeding programs in maize and wheat (e.g., Auinger et al. 2021; Bonnett et al. 2022; Lian et al. 2014). The parents of these mostly bi-parental families generally differ in their performance level and relationship, and therefore, the progenies differ with respect to relevant population parameters.
Nevertheless, these materials are routinely evaluated together in the same experiment(s) and genotypes promoted to the next stage of the program are often selected without giving much attention to their origin.

Fig. 2 Contour plots for $\Psi_{Tot}(\alpha_T, \theta, h_1^2, h_2^2)$, indicating the percentage increase of the selection response $\Delta G_{Tot}$ for selection based on BLUEs in $\Pi_1 \cup \Pi_2$, when using optimal ($t_1^*, t_2^*$) versus identical ($t_1^i = t_2^i$) thresholds for truncation selection in set $\Pi_1$ and $\Pi_2$, respectively. The graphs show $\Psi_{Tot}$ as a function of the mean $\mu_2$ and $h_2$, the square root of the heritability of the BLUEs in $\Pi_2$, for various values of $\pi_1$ and $\alpha_T$, the proportion of candidates selected from $\Pi_1 \cup \Pi_2$. Assumptions are $\mu_1 = 0$, $\sigma_{u1}^2 = \sigma_{u2}^2 = 1.0$, $h_1^2 = 0.50$, i.e., $\theta = (0, \mu_2, \sqrt{2}, 1/\sqrt{h_2^2}, \pi_1)$. The white labels attached to the contour lines show the corresponding numerical values

A comparable situation exists in introgression breeding programs when multiple populations are developed by crossing elite germplasm with various donors (e.g., Barbosa et al. 2021). These materials generally differ in their performance level and genetic variance due to disparate adaptation of the donors to the target environment(s) and varying proportions of donor germplasm in the pedigree. In pre-breeding programs too, the differences among populations can be extremely large as reported for landraces of maize (Böhm et al. 2017; Hölker et al. 2019; Mayer et al. 2017). If all populations are evaluated in a common experiment, breeders are inclined to apply the same threshold for identifying superior candidates used for further breeding.

Even when dealing with a single population, so that the mean and genetic variance are identical, sets of genotypes often differ with regard to the prediction accuracy of the SC for the TGV of candidates.
This can be attributable to unbalanced data from multi-environment trials, where some sets are evaluated in fewer environments or replications than others. For instance, top performers remain in the testing pipeline for several years, while new entries are added to the system (Piepho et al. 2008). Moreover, some genotypes might be tested less intensively owing to problems in seed multiplication, as occurs in the production of doubled-haploid lines (Chaikam et al. 2019) or in speed breeding programs (Watson et al. 2018). Further, when complex traits are monitored using sensor-based techniques (NIRS, optical sensors, etc.) or "omics" data (genomic, phenomic, etc.), the prediction accuracy tends to be notably higher in the calibration set compared to the prediction set (Melchinger and Frisch 2023) and in sets combining different "omics" features (Schrag et al. 2018; Westhues et al. 2019). Thus, there are numerous scenarios where sets of germplasm in a breeding program differ in their population parameters and breeders should be prepared to deal adequately with these situations and be aware of the implications for selection.

Contrasting BLUEs and BLUPs as selection criteria

Until two decades ago, selection decisions in plant breeding relied exclusively on BLUEs of the candidates, a practice that still endures in many smaller breeding programs today. Two major reasons contribute to this conservative attitude. Firstly, for traits with high heritability on an entry-mean basis, the ranking of candidates based on BLUEs and BLUPs is mostly similar. Secondly, calculation of BLUEs is straightforward and does not require information on the relationships among candidates or estimates of genetic variance components, which are challenging to obtain due to the small size of sets and rapid change over selection cycles.

Building upon the pioneering research of Henderson (1975) and inspired by the tremendous progress in animal breeding subsequent to the adoption of BLUPs, Bernardo (1994) spearheaded the implementation of BLUPs into plant breeding. With balanced data and when candidates are unrelated or possess identical co-ancestries so that their TGVs are predicted with equal accuracy, the ranking of candidates based on BLUEs and BLUPs is identical (Kennedy and Sorenson 1988). Otherwise, BLUPs offer a notable advantage by capitalizing on information from relatives and/or accommodating an efficient analysis of unbalanced data (Bernardo 2002; Piepho et al. 2008).

Another major advantage of BLUPs over BLUEs is their ability to allow direct comparisons across different breeding sets, regardless of their origin. As outlined in Eq. 6, applying the same selection threshold to the BLUPs of all candidates is optimal, whereas for BLUEs distinct thresholds must generally be found to maximize the selection response of the entire program.

Fig. 3 A Percentage increase $\Psi_{Hyb}(\alpha_H, \vartheta)$ of the selection response in the hybrid population $\Pi_1 \times \Pi_2$ and B ratio of the optimal proportion of selected candidates ($\alpha_1^o$) from $\Pi_1$ versus an equal ($\alpha_e = \sqrt{\alpha_H}$) proportion of lines selected from each parent population based on GCA predicted by BLUPs for $\vartheta = (\sigma_1, \sigma_2, 1, 1)$. The graphs show $\Psi_{Hyb}$ and $\alpha_1^o : \alpha_e$ as function of $\sigma_2 : \sigma_1$, the ratio of standard deviations of BLUPs for GCA of lines in $\Pi_2$ and $\Pi_1$, respectively, for different values of $\alpha_H$, the proportion of hybrids selected from $\Pi_1 \times \Pi_2$

Following Cochran (1951), our theoretical results were derived assuming that the SC and TGVs are independently and identically distributed within each set because otherwise the already complex algebra would become even more unwieldy.
This assumes an idealized situation, which is seldom met in practice as data are generally unbalanced and candidates commonly differ in their relationships. However, considering that the regression function of TGVs on BLUPs remains an identity matrix even under less stringent assumptions (“Appendix 4”), we conjecture that our results for BLUPs hold approximately true across a broad spectrum of scenarios, but this warrants further research.

Fig. 4 Individual and joint probability density functions (pdf) of the true genetic values (TGV ~ $N(0, 1)$) and selection criterion (SC ~ $N(0, \sigma^2)$) for sets Π1 and Π2 with equal proportions ($\pi_1 = \pi_2 = 0.5$). A SC are BLUEs with $\sqrt{h_1^2} = 0.6$ and $\sqrt{h_2^2} = 0.9$ in Π1 and Π2, respectively. B SC are BLUPs with $\rho_1 = 0.6$ and $\rho_2 = 0.9$ in Π1 and Π2, respectively. In both cases, truncation selection with identical thresholds is practiced in Π1 ∪ Π2 to achieve $\alpha_T = 0.1$. SD refers to the selection differential

The difference between BLUEs and BLUPs is illustrated by two sets Π1 and Π2 with equal proportion ($\pi_1 = 0.5$) of unrelated candidates sampled from the same population and selection of $\alpha_T = 0.10$ candidates across Π1 ∪ Π2 (Fig. 4). Thus, the two sets share identical means ($\mu_1 = \mu_2 = 0$) and genetic standard deviations ($\sigma_{u1} = \sigma_{u2} = 1$). Regarding the prediction accuracy of the SC, we assume $\sqrt{h_1^2} = \rho_1 = 0.6$ for Π1 and $\sqrt{h_2^2} = \rho_2 = 0.9$ for Π2, i.e., these values differ between the two sets but are identical for BLUEs and BLUPs within each set.

When using BLUEs, the standard deviation of the SC is larger in Π1 compared to Π2 ($\sigma_1 = \sigma_{u1}/h_1 = 1.67$ vs. $\sigma_2 = \sigma_{u2}/h_2 = 1.11$) due to the lower heritability. Utilizing identical thresholds ($t_1^i = t_2^i = 1.77$) for $\alpha_T = 0.10$, a larger proportion of candidates is selected in Π1 than in Π2 ($\alpha_1 = 0.14$ vs. $\alpha_2 = 0.06$), leading to lower selection intensity in Π1 ($i_{\alpha_1} = 1.57$ vs. $i_{\alpha_2} = 2.02$).
While the selection differentials are similar ($SD_1 = 2.62$ vs. $SD_2 = 2.24$), the selection response almost doubles in Π2 compared to Π1 ($\Delta G_1 = 0.94$ vs. $\Delta G_2 = 1.82$) owing to the higher heritability. Since the proportion of candidates selected from Π1 is much larger than it would be with optimal thresholds ($\gamma_1^i = 0.72$ vs. $\gamma_1^* = 0.28$), this explains why for BLUEs the selection response $\Delta G_{Tot}(1.77, 1.77, \theta, 0.36, 0.81) = 1.19$ is significantly smaller than the maximum selection response $\Delta G_{Tot}(2.65, 1.18, \theta, 0.36, 0.81) = 1.36$ achieved with optimal thresholds ($t_1^* = 2.65$, $t_2^* = 1.18$), resulting in $\Psi_{Tot} = 14.5\%$. For very stringent selection with $\alpha_T = 0.01$, we get $\gamma_1^i = 0.95$ vs. $\gamma_1^* = 0.05$, leading to $\Psi_{Tot} = 42.3\%$.

When using BLUPs as SC, candidates of Π1 exhibit a smaller standard deviation than those of Π2 ($\sigma_1 = \rho_1\sigma_{u1} = 0.6$ vs. $\sigma_2 = \rho_2\sigma_{u2} = 0.9$) due to increased shrinkage. Consequently, applying identical thresholds ($t_1^* = t_2^* = 0.96$) to both sets for achieving $\alpha_T = 0.10$ leads to a smaller proportion of candidates ($\alpha_1 = 0.06$ vs. $\alpha_2 = 0.14$) and a higher selection intensity ($i_{\alpha_1} = 2.02$ vs. $i_{\alpha_2} = 1.57$) for Π1 compared to Π2. Given that the regression for TGVs on BLUPs is equal to 1.0 (Henderson 1975), we obtain $\Delta G_1 = 1.21$ and $\Delta G_2 = 1.42$. Referring to Eqs. 2 and 13, we get $\gamma_1^* = 0.28$ and $\Delta G_{Tot}(0.96, 0.96, \theta, 1, 1) = 1.36$. While this example was chosen for simplicity, it underscores the fundamental disparities between BLUEs and BLUPs for selection in scenarios involving multiple sets.

Properties of BLUPs for selection

BLUPs possess several optimality properties for prediction of random effects in mixed linear models (Fernando and Gianola 1986; Henderson 1990). They have minimum prediction error variance and maximize the correlation to the TGVs in the class of linear unbiased predictors.
Furthermore, when random effects adhere to a normal distribution and fixed effects in the mixed model are known, BLUPs have the smallest mean-squared error among all possible predictors. Concerning truncation selection, we provided a proof in “Appendix 2” that when dealing with two sets characterized by distinct population parameters (e.g., means, variances and prediction accuracies of TGVs), utilizing a uniform threshold for the BLUPs across all candidates maximizes the selection response.

We derived this property of BLUPs through a Lagrange multiplier approach, which requires quite restrictive assumptions on the random effects $u$ in the different sets. It is closely related to a more general selection principle (Fernando and Gianola 1986; Goffinet 1983). Accordingly, if $n$ candidates are available and $k < n$ of them are to be chosen, then selecting the $k$ candidates with the highest conditional mean for an unobservable random variable $u$ maximizes the expected value of the mean of $u$ for the selected candidates. This result holds true independent of the joint distribution of the unobservable random variable $u$ and the data. Under normality, the BLUP of $u$ can be thought of as its conditional mean. Thus, even when the candidates are from different sets, selecting the $k$ candidates with the highest values for BLUPs ($\hat{u}$) would maximize the response to selection and no further corrections are needed.

In a strict sense, selecting a fixed number $k$ or constant proportion $\alpha = k/n$ of candidates from a finite population of size $n$ differs from truncation selection. In truncation selection, the threshold is set so that the expected proportion of candidates is equal to $\alpha$ in a population of infinite size. When applying this fixed threshold to a sample of size $n$, the number of selected candidates may deviate from $k$. However, as the sample size increases, selecting a fixed number or proportion of candidates becomes equivalent to truncation selection.
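The top-k principle and its large-sample agreement with truncation selection can be checked by simulation. The following is a minimal sketch, not the authors' method: it assumes the two-set example of Fig. 4 (accuracies 0.6 and 0.9, TGVs distributed as N(0, 1), equal set sizes, 10% selected), a known zero mean, unrelated candidates, and known heritabilities, so that the BLUP is simply the BLUE shrunk by the heritability. The function name `simulate_top_k` and all parameter names are hypothetical.

```python
import random
from statistics import fmean


def simulate_top_k(n_per_set=50_000, accuracies=(0.6, 0.9), alpha_t=0.10, seed=1):
    """Mean TGV of the top-k candidates ranked by BLUEs and by BLUPs.

    Assumptions (ours, for illustration): TGV u ~ N(0, 1) in every set,
    known zero mean, unrelated candidates, heritability h2 = accuracy**2.
    """
    rng = random.Random(seed)
    by_blue, by_blup = [], []
    for acc in accuracies:
        h2 = acc * acc
        sd_e = ((1.0 - h2) / h2) ** 0.5      # error SD that yields heritability h2
        for _ in range(n_per_set):
            u = rng.gauss(0.0, 1.0)          # true genetic value
            y = u + rng.gauss(0.0, sd_e)     # BLUE: unbiased, not shrunk
            by_blue.append((y, u))
            by_blup.append((h2 * y, u))      # BLUP: BLUE shrunk by h2
    k = round(alpha_t * len(by_blue))        # fixed number of selected candidates

    def mean_tgv_of_top(pairs):
        best = sorted(pairs, reverse=True)[:k]   # k highest criterion values
        return fmean(u for _, u in best)

    return mean_tgv_of_top(by_blue), mean_tgv_of_top(by_blup)


if __name__ == "__main__":
    gain_blue, gain_blup = simulate_top_k()
    print(f"top-k on BLUEs: {gain_blue:.2f}, top-k on BLUPs: {gain_blup:.2f}")
```

Under these assumptions, the simulated means should lie close to the values of 1.19 (BLUEs, one common threshold) and 1.36 (BLUPs) reported for the Fig. 4 example, up to Monte Carlo error.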
Therefore, the results derived for truncation selection in this study closely approximate those for selecting a constant proportion of candidates.

Our approach for proving the optimality property of BLUPs under truncation selection allows calculating the optimal proportion $\gamma_1^*$ and $\gamma_2^*$ of candidates selected from set Π1 and Π2, respectively, given reliable estimates of the population parameters are available. This information is important for optimizing the allocation of resources in genomic selection based on BLUPs. By knowing $\gamma_1^*$ and $\gamma_2^*$ in advance, we can calculate the selection response across both the training and prediction set. Thus, we can find the ideal balance between (1) the expenditures allocated to the training set, which determines mainly the prediction accuracies of both the training and prediction set, and (2) the size of both sets, which determines $\alpha_T$. A thorough examination of this complex problem is beyond the scope of this study and warrants further research.

Using the same threshold for BLUPs does not necessarily mean that all candidates share an equal likelihood of being selected, even if they possess the same TGV as highlighted in the literature (Woolliams et al. 2015). This can be exemplified by Fig. 4, where for $\alpha_T = 0.10$ the proportion of candidates from set Π1 would reduce from $\pi_1 = 0.50$ before selection to $\gamma_1^* = \gamma_1^i = 0.28$ after selection owing to the lower prediction accuracy for Π1 and increased shrinkage of BLUPs.

Composition of the selected fraction

Generalizing Cochran’s formula for selection response to the case of multiple sets allowed us to examine the proportions ($\gamma_1, \gamma_2$) of selected candidates originating from Π1 and Π2. This is of interest for two reasons. First, the selection response for the combined set depends on a weighted summation of the selection response in each set (Eq.
3), with weights corresponding to the post-selection fractions $\gamma_1$ and $\gamma_2$. Second, the makeup of the selected fraction is critical for further breeding progress, given that these candidates are used either directly for product development and/or for generating the base materials of the next breeding cycle.

In extreme cases, $\Delta G_{Tot}$ for BLUEs can even be negative. For instance, if only mild selection ($\alpha_1 = 0.45$) is applied to the inferior, smaller population Π1 ($\pi_1 = 0.2$, $\mu_1 = 0$, $\sigma_1 = 1$, $h_1^2 = 0.36$) but stringent selection ($\alpha_2 = 0.0125$) is applied to Π2 ($\mu_2 = 2.0$, $\sigma_2 = 2$ and $h_2^2 = 0.81$) so that $\gamma_1 = 0.90$ is much larger than $\pi_1$, the outcome would be $\Delta G_{Tot} = -0.70$.

Here, we focus our discussion on the composition of the selected fraction obtained through the use of BLUPs with a uniform threshold for all candidates. As indicated by the graphs in Fig. 1, the change in the composition of the candidates before and after selection, expressed by the ratio $\gamma_1^* : \pi_1$, can be striking. For instance, when $\pi_1 = 0.10$ and/or $\alpha_T = 0.01$, the proportion retained from the inferior set Π1 dwindles to less than 10% of its original proportion, if $\mu_2$ surpasses $\mu_1$ by about one genetic standard deviation under otherwise identical conditions. Consequently, if materials from introgression programs are evaluated together with elite germplasm and the same threshold is applied to the BLUPs of both groups, hardly any novel germplasm will be selected due to its low performance level. Thus, it would be prudent to apply different thresholds for both groups to have a realistic chance that some of the promising new genotypes are retained for further breeding. Likewise, the ratio $\gamma_1^* : \pi_1$ falls below 0.20, if two sets share equal size and population parameters, yet $\rho_2 \ge 0.68$ while $\rho_1 = 0.50$.
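The numbers quoted for the composition of the selected fraction can be re-computed directly from the formulas for the total selected proportion, the post-selection proportions and the total selection response (Eqs. 17, 18 and 22 of Appendix 1). The sketch below is only an illustration under the normality assumptions of that appendix; the function names (`composition_and_response`, `uniform_threshold`) are hypothetical, and the bisection is our substitute for whatever numerical solver the authors used.

```python
from statistics import NormalDist

ND = NormalDist()  # standard normal N(0, 1)


def tail(x):
    """Selected proportion alpha(x) = 1 - Phi(x)."""
    return 1.0 - ND.cdf(x)


def intensity(x):
    """Selection intensity i_alpha(x) = phi(x) / alpha(x)."""
    return ND.pdf(x) / tail(x)


def composition_and_response(t, pi, mu, sigma, b):
    """Return (alpha_tot, gamma, dG_tot) for per-set thresholds t."""
    x = [(ti - m) / s for ti, m, s in zip(t, mu, sigma)]
    parts = [tail(xi) * p for xi, p in zip(x, pi)]
    alpha_tot = sum(parts)                               # Eq. 17
    gamma = [part / alpha_tot for part in parts]         # Eq. 18
    dg = [bi * s * intensity(xi) for bi, s, xi in zip(b, sigma, x)]
    dg_tot = sum(d * g for d, g in zip(dg, gamma))       # Eq. 22, within-set part
    dg_tot += sum(m * (g - p) for m, g, p in zip(mu, gamma, pi))  # shift in means
    return alpha_tot, gamma, dg_tot


def uniform_threshold(alpha_t, pi, mu, sigma, iters=100):
    """Common threshold t for all sets such that alpha_tot equals alpha_t."""
    lo = min(mu) - 10.0 * max(sigma)
    hi = max(mu) + 10.0 * max(sigma)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        selected = sum(tail((mid - m) / s) * p for m, s, p in zip(mu, sigma, pi))
        if selected > alpha_t:
            lo = mid          # too many candidates selected: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


# BLUE example with negative response: alpha_1 = 0.45 in Pi_1, alpha_2 = 0.0125 in Pi_2
pi, mu, sigma, b = (0.2, 0.8), (0.0, 2.0), (1.0, 2.0), (0.36, 0.81)
t = [m + s * ND.inv_cdf(1.0 - a) for m, s, a in zip(mu, sigma, (0.45, 0.0125))]
alpha_neg, gamma_neg, dg_neg = composition_and_response(t, pi, mu, sigma, b)

# BLUP example of Fig. 4: uniform threshold, accuracies 0.6 and 0.9, alpha_T = 0.10
pi4, mu4, sigma4, b4 = (0.5, 0.5), (0.0, 0.0), (0.6, 0.9), (1.0, 1.0)
t4 = uniform_threshold(0.10, pi4, mu4, sigma4)
alpha4, gamma4, dg4 = composition_and_response((t4, t4), pi4, mu4, sigma4, b4)

print(f"BLUE example: gamma_1 = {gamma_neg[0]:.2f}, dG_Tot = {dg_neg:.2f}")
print(f"BLUP example: t = {t4:.2f}, gamma_1 = {gamma4[0]:.2f}, dG_Tot = {dg4:.2f}")
```

The same two functions can be used to explore how the ratio of post-selection to initial proportion changes with the accuracies or means of the sets, for example by varying `sigma4` (the standard deviations of the BLUPs) while keeping the total selected proportion fixed.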
Differences of this magnitude have been observed in genomic prediction of maize hybrids, in which case the prediction accuracy significantly decreased from H2 hybrids, where both parents are used as parents of a hybrid in the training set, to H1 and H0 hybrids, where only one or none of the parent lines, respectively, contribute to a hybrid in the training set (Seye et al. 2020; Technow et al. 2014; Westhues et al. 2017). While it seems rewarding to have a much larger number of H0 hybrids than H1 and H2 hybrids due to their lower costs (involving only production and genotyping of parent lines), the contribution of H0 hybrids to the overall selection response is generally overrated because their selected proportion is much smaller than for H2 and H1 hybrids owing to the lower prediction accuracy. Consequently, H0 hybrids contribute significantly less to the selection response than expected based on their proportion in the entire set of predictable hybrids. This aspect is crucial when optimizing the distribution of resources allocated to the training and prediction sets (Riedelsheimer and Melchinger 2013).

There are many further examples, where sets differ in their prediction accuracy because they differ in the number of close relatives in the training set. For this reason, genotypes in the training set have generally a significantly higher prediction accuracy than those in the prediction set, leading to a notable underrepresentation of the latter in the selected set. Likewise, in recycling breeding, breeders typically generate more and larger families from crosses of elite parents. If the training set is sampled proportional to the size of these families, it follows that genotypes descending from the top parents have higher prediction accuracy due to more and closer relatives in the training set than genotypes descending from less prominent parents.
Thus, on top of the expected high TGVs of these progenies, the smaller shrinkage of their BLUPs further increases the likelihood that they are selected. However, this carries a high risk of selecting closely related genotypes descending from a small number of top ancestors, thereby diminishing the effective population size and long-term progress in genomic selection, particularly when applying rigorous selection pressure enabled by the low costs for genotyping with modern methods (Rasheed et al. 2017). While our focus has been primarily on diverse prediction accuracies, our conclusions can be extended to scenarios where sets differ in genetic variances.

Optimal selection of parent lines in hybrid breeding

In hybrid breeding, breeders typically work with a comparable number of lines from each parent population and select, based on GCA predicted by BLUPs, a proportional number of candidates from both groups for the final testing phase in product development (Melchinger and Frisch 2023). Figure 3 shows that this approach is optimal when the parent populations exhibit similar variances for the SC, but this is not always the case in practice. In European maize for example, GCA variance for grain yield was approximately twice as large for dent lines compared to flint lines (Schrag et al. 2006). Similarly in hybrid rye, Wilde et al. (2003) found that GCA variance for grain yield among female lines from the Petkus pool was almost four times greater than observed among male lines from the Carsten pool. Additionally, the accuracy of predicted GCA effects can differ between the parent populations due to differences in the size and intensity of phenotyping of the training set and the use of different types of testers.
Furthermore, in species like rye, where the implementation of CMS for testcross seed production differs significantly between the seed and pollen parent pools, the pedigree relationship between candidates in the prediction and training set can diverge (Wilde and Miedaner 2021).

Under these scenarios, a notable enhancement $\Psi_{Hyb}$ in selection response for predicted hybrids, compared to selecting equal proportions in each population, can be achieved by opting for more stringent selection within the parent population exhibiting the larger GCA variance. The magnitude of $\Psi_{Hyb}$ depends strongly on the ratio of GCA variances in the two parent populations but showed similar curves independent of the selected proportions $\alpha_H$ (Fig. 3). Under mild selection ($\alpha_H = 0.25$), the optimal α-values for the two parent populations hardly differ from each other, but for stringent selection ($\alpha_H = 0.0001$), a much more stringent selection must be practiced in the parent population with the larger GCA variance than in the one with the smaller GCA variance, as reflected by the low ratio $\alpha_1^o : \alpha_e$, where $\alpha_e = \sqrt{\alpha_H}$. As an alternative to selecting parent lines based on their predicted GCA for producing a complete factorial of hybrids, one could directly select the most promising hybrids based on the sum of the GCA of their parents. This would result in selecting a partial factorial having the form of a triangle, with the top parents being involved in more crosses than parents with lower rank, and would automatically take care of differences in the GCA variance of BLUPs for each parent population. A comparison of these two selection schemes would be highly interesting for hybrid breeding but is beyond the scope of this study.

Conclusions

When practicing truncation selection with candidates from multiple sets, new aspects must be taken into consideration as compared to selection in a single homogeneous population.
This is because selection progress in the entire breeding program depends not only on the selection response in each set but also on the composition of the selected fraction. A major question is how to choose the thresholds for candidates from the various sets for maximizing the selection response of the entire breeding program. In addition to the numerous advantages of BLUPs compared to BLUEs, they have the highly desirable property that a uniform threshold can be applied to all candidates for maximizing the selection response and no further adjustment for differences in the reliability of the predictors is necessary. This applies even if the sets differ in the population parameters and/or if BLUPs of different candidates are calculated from different types or combinations of "omics" data, and it simplifies selection decisions. However, calculation of BLUPs requires reliable estimates of the genetic variance, which is a challenge with the small sample sizes of families used in plant breeding, but this problem has been mitigated with the use of Bayesian methods (Sorenson and Gianola 2004).

Since variation in the prediction accuracy can have a strong impact on the outcome of the selected fraction and strongly reduces the effective population size under stringent selection, we recommend accompanying genomic selection based on BLUPs with monitoring the genetic diversity of the selected candidates. Ideally, genomic selection could be combined with optimum contribution selection (Daetwyler et al. 2007; Gaynor et al. 2021; Woolliams et al. 2015), where the relationship of candidates is determined from genomic data.

In genomic selection of hybrids based on predicted values of GCA of their parents, we suggest selecting different proportions in the two parent populations, if these differ substantially in their population parameters such as the GCA variances and/or prediction accuracy of GCA effects.
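The recommendation on unequal proportions in the two parent populations can be explored numerically. The sketch below assumes selection on BLUPs (regression coefficients of 1 in both populations) and normally distributed GCA predictions with zero mean; it solves the two optimality conditions of Appendix 3 (the product of the two selected proportions equals the selected proportion of hybrids, and the difference of the thresholds equals the difference of the selection differentials) by bisection. The paper states that such solutions were obtained with Mathematica; the bisection here is our own substitute, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ND = NormalDist()  # standard normal N(0, 1)


def threshold_and_intensity(alpha):
    """Standardized threshold z and selection intensity for selected proportion alpha."""
    z = -ND.inv_cdf(alpha)            # upper-tail area alpha lies above z
    return z, ND.pdf(z) / alpha


def hybrid_gain(a1, a2, s1, s2):
    """Response in the hybrids for BLUPs: sum of the two GCA selection differentials."""
    return s1 * threshold_and_intensity(a1)[1] + s2 * threshold_and_intensity(a2)[1]


def optimal_parent_proportions(alpha_h, s1, s2, iters=80):
    """Optimal selected proportions (a1, a2) with a1 * a2 = alpha_h.

    s1 and s2 are the standard deviations of the BLUPs for GCA in the two
    parent populations. We write a1 = alpha_h**s, a2 = alpha_h**(1 - s) and
    bisect on s, using that the residual below increases monotonically in s.
    """
    def residual(s):
        z1, i1 = threshold_and_intensity(alpha_h ** s)
        z2, i2 = threshold_and_intensity(alpha_h ** (1.0 - s))
        # (t1 - sigma1 * i1) - (t2 - sigma2 * i2) with t = sigma * z
        return s1 * (z1 - i1) - s2 * (z2 - i2)

    lo, hi = 1e-6, 1.0 - 1e-6
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return alpha_h ** s, alpha_h ** (1.0 - s)


# Illustration: stringent selection of hybrids, population 2 has twice the SD
alpha_h, s1, s2 = 0.0001, 1.0, 2.0
a1, a2 = optimal_parent_proportions(alpha_h, s1, s2)
a_e = sqrt(alpha_h)                     # equal proportion in both populations
gain_opt = hybrid_gain(a1, a2, s1, s2)
gain_equal = hybrid_gain(a_e, a_e, s1, s2)
print(f"alpha_1 = {a1:.4f}, alpha_2 = {a2:.5f}, "
      f"gain ratio = {gain_opt / gain_equal:.3f}")
```

With equal standard deviations the routine returns equal proportions in both populations, whereas with unequal standard deviations it selects more stringently in the population with the larger one, which is the qualitative pattern described for Fig. 3.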
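Appendix 1: Response to truncation selection in two sets and composition of the selected set

In our notation, we use $\varphi(x)$ and $\Phi(x)$ to denote the probability density function and cumulative distribution function of the standard normal distribution $N(0, 1)$, respectively; $\alpha(x) = 1 - \Phi(x)$ and $i_\alpha(x)$ denote the selected proportion and selection intensity, respectively, when applying threshold $x$ to a standard normal distribution $N(0, 1)$.

Our assumptions for the mathematical derivations are:

1. $Z$ is a Bernoulli distributed variable that indicates the origin of a candidate $C$ from set $\Pi_1$ or $\Pi_2$, where $Z = 1$ with probability $\pi_1 = \frac{|\Pi_1|}{|\Pi_1 \cup \Pi_2|}$ for $C \in \Pi_1$ and $Z = 2$ with probability $\pi_2 = 1 - \pi_1$ for $C \in \Pi_2$.
2. $X$ is the random variable for the SC with a conditional distribution $X_{|Z=1} \sim N(\mu_1, \sigma_1^2)$ for $C \in \Pi_1$ and $X_{|Z=2} \sim N(\mu_2, \sigma_2^2)$ for candidates $C \in \Pi_2$.
3. Candidates $C$ from set $\Pi_1$ and $\Pi_2$ are selected based on their SC surpassing the respective thresholds $t_1$ and $t_2$, yielding the sets $\Gamma_1$ and $\Gamma_2$ of selected candidates.

Choosing thresholds $t_1$ and $t_2$ is equivalent to selecting proportions $\alpha_1$ and $\alpha_2$ of top candidates from $\Pi_1$ and $\Pi_2$, respectively, where

$$\begin{aligned}
\alpha_1 &= P[X > t_1 \mid C \in \Pi_1] = P[X_{|Z=1} > t_1] = 1 - \Phi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right) = \alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right), \\
\alpha_2 &= P[X > t_2 \mid C \in \Pi_2] = P[X_{|Z=2} > t_2] = 1 - \Phi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right) = \alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right).
\end{aligned} \tag{16}$$

Our subsequent derivations are based on thresholds as dealing with proportions would further complicate the already complex algebra. Applying the theorem of total probability and defining $\theta = (\pi_1, \pi_2, \mu_1, \mu_2, \sigma_1, \sigma_2)$, we obtain for the total proportion of candidates selected from $\Pi_1 \cup \Pi_2$:

$$\begin{aligned}
\alpha_{Tot}(t_1, t_2, \theta) &= P[X > t_1 \mid C \in \Pi_1] \cdot P[C \in \Pi_1] + P[X > t_2 \mid C \in \Pi_2] \cdot P[C \in \Pi_2] \\
&= P[X_{|Z=1} > t_1]\, P[Z = 1] + P[X_{|Z=2} > t_2]\, P[Z = 2] \\
&= \alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\pi_2
\end{aligned} \tag{17}$$

Using Bayes’ formula, the proportion of candidates from $\Pi_1$ in the entire set $\Gamma_1 \cup \Gamma_2$ of selected candidates is

$$\gamma_1(t_1, t_2, \theta) = \frac{P[X_{|Z=1} > t_1]\, P[Z = 1]}{P\big[[X_{|Z=1} > t_1] \vee [X_{|Z=2} > t_2]\big]} = \frac{\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\pi_1}{\alpha_{Tot}(t_1, t_2, \theta)} = \frac{|\Gamma_1|}{|\Gamma_1 \cup \Gamma_2|} \quad\text{and}\quad \gamma_2(t_1, t_2, \theta) = 1 - \gamma_1(t_1, t_2, \theta). \tag{18}$$

The expectation of the SC for the candidates in $\Gamma_1$ and $\Gamma_2$ selected from $\Pi_1$ and $\Pi_2$, respectively, is

$$E[X_{|Z=1} > t_1] = \sigma_1 i_\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right) + \mu_1 \quad\text{and}\quad E[X_{|Z=2} > t_2] = \sigma_2 i_\alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right) + \mu_2, \tag{19}$$

where $\sigma i_\alpha\!\left(\tfrac{t - \mu}{\sigma}\right)$ is the selection differential under truncation selection with threshold $t$ in a normal distribution $N(\mu, \sigma^2)$, and $i_\alpha(x)$ is the selection intensity (Falconer and Mackay 1996, p. 189). Thus, we obtain the expectation of the selected candidates in $\Gamma_1 \cup \Gamma_2$ as

$$\begin{aligned}
&E\big[X \mid [Z = 1 \wedge X_{|Z=1} > t_1] \vee [Z = 2 \wedge X_{|Z=2} > t_2]\big] \\
&\quad= E[X_{|Z=1} > t_1]\, P\big[X_{|Z=1} > t_1 \mid [X_{|Z=1} > t_1] \vee [X_{|Z=2} > t_2]\big] \\
&\qquad+ E[X_{|Z=2} > t_2]\, P\big[X_{|Z=2} > t_2 \mid [X_{|Z=1} > t_1] \vee [X_{|Z=2} > t_2]\big] \\
&\quad= \left[\sigma_1 i_\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right) + \mu_1\right]\gamma_1(t_1, t_2, \theta) + \left[\sigma_2 i_\alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right) + \mu_2\right]\gamma_2(t_1, t_2, \theta)
\end{aligned} \tag{20}$$

Since the mean of unselected candidates in $\Pi_1 \cup \Pi_2$ is obtained as

$$E\big[X \mid [Z = 1] \vee [Z = 2]\big] = E[X \mid Z = 1]\, P[Z = 1] + E[X \mid Z = 2]\, P[Z = 2] = \mu_1\pi_1 + \mu_2\pi_2,$$

we get for the change in the mean of the SC as a result of truncation selection (= selection differential)

$$\begin{aligned}
SD_{Tot}(t_1, t_2, \theta) &= \sigma_1 i_\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\gamma_1(t_1, t_2, \theta) + \sigma_2 i_\alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\gamma_2(t_1, t_2, \theta) \\
&\quad+ \mu_1\big[\gamma_1(t_1, t_2, \theta) - \pi_1\big] + \mu_2\big[\gamma_2(t_1, t_2, \theta) - \pi_2\big]
\end{aligned} \tag{21}$$

Assuming the regression coefficient of the TGV on the SC is $b_1$ in $\Pi_1$ and $b_2$ in $\Pi_2$ and applying the breeders’ equation for each set, we get for the total selection response in $\Pi_1 \cup \Pi_2$ under truncation selection with thresholds $t_1$ and $t_2$:

$$\begin{aligned}
\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) &= \Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big)\gamma_1(t_1, t_2, \theta) + \Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big)\gamma_2(t_1, t_2, \theta) \\
&\quad+ \mu_1\big[\gamma_1(t_1, t_2, \theta) - \pi_1\big] + \mu_2\big[\gamma_2(t_1, t_2, \theta) - \pi_2\big]
\end{aligned} \tag{22}$$

where $\Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big) = b_1\sigma_1 i_\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)$ and $\Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big) = b_2\sigma_2 i_\alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)$ refer to the selection response realized in set $\Pi_1$ and $\Pi_2$, respectively.

Appendix 2: Maximizing selection response by optimal choice of selection thresholds

To determine the maximum of the selection response $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2)$ as a function of the thresholds $t_1$ and $t_2$, we use a Lagrange multiplier approach. We start with some basic properties of the normal distribution $N(0, 1)$ with pdf $\varphi(x)$ and cdf $\Phi(x)$. Let $i_\alpha(x) = \frac{\varphi(x)}{\alpha(x)}$ and $\alpha(x) = \int_x^\infty \varphi(z)\,dz = 1 - \Phi(x)$ be the selection intensity and selected proportion, respectively, then we have $\frac{\partial\varphi(x)}{\partial x} = -x\varphi(x)$ and $\frac{\partial\alpha(x)}{\partial x} = -\varphi(x)$. Hence, setting $x = \frac{t - \mu}{\sigma}$, we obtain

$$\frac{\partial\varphi\!\left(\tfrac{t - \mu}{\sigma}\right)}{\partial t} = -\left(\frac{t - \mu}{\sigma}\right)\varphi\!\left(\frac{t - \mu}{\sigma}\right)\frac{1}{\sigma}. \tag{23}$$

$$\frac{\partial\alpha\!\left(\tfrac{t - \mu}{\sigma}\right)}{\partial t} = -\varphi\!\left(\frac{t - \mu}{\sigma}\right)\frac{1}{\sigma}. \tag{24}$$

Defining for given values of $\theta, b_1, b_2$ and $\alpha_T \in (0, 1)$:

$$\alpha_1(t_1) = \alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\pi_1, \qquad \alpha_2(t_2) = \alpha\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\pi_2, \qquad g(t_1, t_2) = \alpha_1(t_1) + \alpha_2(t_2) - \alpha_T$$

and

$$f(t_1, t_2) = \Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big)\alpha_1(t_1) + \Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big)\alpha_2(t_2) + \mu_1\big(\alpha_1(t_1) - \pi_1\alpha_T\big) + \mu_2\big(\alpha_2(t_2) - \pi_2\alpha_T\big),$$

we get the Lagrangian function

$$L(t_1, t_2, \lambda) = f(t_1, t_2) + \lambda g(t_1, t_2) \tag{25}$$

Thus, for given values of $\theta, b_1, b_2$ and $\alpha_T$, we obtain a necessary condition for the maximum of $\Delta G_{Tot}(t_1, t_2, \theta, b_1, b_2) = \frac{1}{\alpha_T} f(t_1, t_2)$ under the side condition $g(t_1, t_2) = 0$ by analyzing the gradient $\nabla L(t_1, t_2, \lambda)$. We have

$$\Delta G_1\big((t_1 - \mu_1), \sigma_1, b_1\big)\alpha_1(t_1) = b_1\sigma_1\frac{\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)}{\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)}\alpha\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 = b_1\sigma_1\pi_1\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)$$

and

$$\Delta G_2\big((t_2 - \mu_2), \sigma_2, b_2\big)\alpha_2(t_2) = b_2\sigma_2\pi_2\varphi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right).$$

Thus,

$$\frac{\partial\big(\Delta G_1((t_1 - \mu_1), \sigma_1, b_1)\,\alpha_1(t_1)\big)}{\partial t_1} = -b_1\sigma_1\pi_1\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\varphi\!\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1} = -b_1\pi_1(t_1 - \mu_1)\,\varphi\!\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\frac{1}{\sigma_1}$$

Likewise,

$$\begin{aligned}
\frac{\partial\big(\Delta G_2((t_2 - \mu_2), \sigma_2, b_2)\,\alpha_2(t_2)\big)}{\partial t_1} &= 0, &
\frac{\partial\big(\mu_1(\alpha_1(t_1) - \pi_1\alpha_T)\big)}{\partial t_1} &= -\mu_1\pi_1\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\tfrac{1}{\sigma_1}, \\
\frac{\partial\big(\mu_2(\alpha_2(t_2) - \pi_2\alpha_T)\big)}{\partial t_1} &= 0, &
\frac{\partial\,\lambda g(t_1, t_2)}{\partial t_1} &= -\lambda\pi_1\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\tfrac{1}{\sigma_1}, \\
\frac{\partial\big(\Delta G_1((t_1 - \mu_1), \sigma_1, b_1)\,\alpha_1(t_1)\big)}{\partial t_2} &= 0, &
\frac{\partial\big(\Delta G_2((t_2 - \mu_2), \sigma_2, b_2)\,\alpha_2(t_2)\big)}{\partial t_2} &= -b_2\pi_2(t_2 - \mu_2)\,\varphi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\tfrac{1}{\sigma_2}, \\
\frac{\partial\big(\mu_1(\alpha_1(t_1) - \pi_1\alpha_T)\big)}{\partial t_2} &= 0, &
\frac{\partial\big(\mu_2(\alpha_2(t_2) - \pi_2\alpha_T)\big)}{\partial t_2} &= -\mu_2\pi_2\varphi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\tfrac{1}{\sigma_2}, \\
\frac{\partial\,\lambda g(t_1, t_2)}{\partial t_2} &= -\lambda\pi_2\varphi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right)\tfrac{1}{\sigma_2}
\end{aligned}$$

and

$$\begin{aligned}
\frac{\partial\big(\Delta G_1((t_1 - \mu_1), \sigma_1, b_1)\,\alpha_1(t_1)\big)}{\partial\lambda} &= 0, &
\frac{\partial\big(\Delta G_2((t_2 - \mu_2), \sigma_2, b_2)\,\alpha_2(t_2)\big)}{\partial\lambda} &= 0, \\
\frac{\partial\big(\mu_1(\alpha_1(t_1) - \pi_1\alpha_T)\big)}{\partial\lambda} &= 0, &
\frac{\partial\big(\mu_2(\alpha_2(t_2) - \pi_2\alpha_T)\big)}{\partial\lambda} &= 0, \\
\frac{\partial\,\lambda g(t_1, t_2)}{\partial\lambda} &= g(t_1, t_2).
\end{aligned}$$

Thus, we get

$$\begin{aligned}
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1} &= -b_1\pi_1(t_1 - \mu_1)\,\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\tfrac{1}{\sigma_1} - \mu_1\pi_1\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\tfrac{1}{\sigma_1} - \lambda\pi_1\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right)\tfrac{1}{\sigma_1} \\
&= -\frac{\pi_1}{\sigma_1}\varphi\!\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\big[b_1(t_1 - \mu_1) + \mu_1 + \lambda\big]
\end{aligned} \tag{26}$$

$$\frac{\partial L(t_1, t_2, \lambda)}{\partial t_2} = -\frac{\pi_2}{\sigma_2}\varphi\!\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\big[b_2(t_2 - \mu_2) + \mu_2 + \lambda\big] \tag{27}$$

$$\frac{\partial L(t_1, t_2, \lambda)}{\partial\lambda} = g(t_1, t_2) \tag{28}$$

Setting $\nabla L(t_1, t_2, \lambda) = (0, 0, 0)$ and using that $\frac{\pi_1}{\sigma_1}\varphi\!\left(\tfrac{t_1 - \mu_1}{\sigma_1}\right) > 0$ and $\frac{\pi_2}{\sigma_2}\varphi\!\left(\tfrac{t_2 - \mu_2}{\sigma_2}\right) > 0$, we obtain from Eqs. 26 and 27 the necessary conditions: $-\lambda = b_1(t_1 - \mu_1) + \mu_1$ and $-\lambda = b_2(t_2 - \mu_2) + \mu_2$ or equivalently $b_1(t_1 - \mu_1) + \mu_1 = b_2(t_2 - \mu_2) + \mu_2$. Thus, the solutions $t_1^*, t_2^*$ must fulfill the following conditions:

$$t_1 = \frac{b_2(t_2 - \mu_2) + \mu_2 - \mu_1}{b_1} + \mu_1 \quad\text{or equivalently}\quad t_2 = \frac{b_1(t_1 - \mu_1) + \mu_1 - \mu_2}{b_2} + \mu_2 \tag{29}$$

and

$$\Phi\!\left(\frac{t_1 - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\!\left(\frac{t_2 - \mu_2}{\sigma_2}\right)\pi_2 = 1 - \alpha_T \tag{30}$$

For the special case of BLUPs, where $b_1 = b_2 = 1.0$, we have

$$t_1^* = t_2^*. \tag{31}$$

For BLUEs and $\mu_1 = \mu_2$, we get

$$t_1^* - \mu_1 = \frac{b_2(t_2^* - \mu_1)}{b_1}. \tag{32}$$

Defining $t_1^{**} = t_1^* - \mu_1$ and $t_2^{**} = t_2^* - \mu_1$, Eqs. 29 and 30 are equivalent to $t_1^{**} = \frac{b_2 t_2^{**}}{b_1}$ and $\Phi\!\left(\tfrac{t_1^{**}}{\sigma_1}\right)\pi_1 + \Phi\!\left(\tfrac{t_2^{**}}{\sigma_2}\right)\pi_2 = 1 - \alpha_T$, which would be obtained if we assume $\mu_1 = 0$.

The improvement in $\Delta G_{Tot}(t_1^*, t_2^*, \theta, b_1, b_2)$ relative to $\Delta G_{Tot}(t_1^i, t_2^i, \theta, b_1, b_2)$, where $t_1^i = t_2^i$ such that $\alpha_{Tot}(t_1^i, t_2^i, \theta) = \alpha_T$ or equivalently $\Phi\!\left(\tfrac{t_1^i - \mu_1}{\sigma_1}\right)\pi_1 + \Phi\!\left(\tfrac{t_2^i - \mu_2}{\sigma_2}\right)\pi_2 = 1 - \alpha_T$, can be expressed as the ratio $\Psi_{Tot}(\alpha_T, \theta, b_1, b_2)$ defined in Eq. 10.

Appendix 3: Maximizing the total selection response for factorials in hybrid breeding

Since the SC is expected to provide unbiased estimates for the GCA effects in each parent population, we have $\mu_1 = \mu_2 = 0$. Additionally, we assume that $b_1$ and $b_2$ are the regression coefficients of the regression of GCA effects on the SC in $\Pi_1$ and $\Pi_2$, respectively, and define $\theta = (\sigma_1, \sigma_2, b_1, b_2)$. We select proportions $\alpha_1 = \frac{|\Gamma_1|}{|\Pi_1|}$ and $\alpha_2 = \frac{|\Gamma_2|}{|\Pi_2|}$ from $\Pi_1$ and $\Pi_2$, respectively, by applying corresponding thresholds $t_1 = \sigma_1\Phi^{-1}(1 - \alpha_1)$ and $t_2 = \sigma_2\Phi^{-1}(1 - \alpha_2)$. If the selected lines are mated in the form of a complete factorial design to produce the hybrids tested in the next step of the breeding program, the proportion of selected hybrids from the total set of possible hybrids is

$$\alpha_{Hyb}(t_1, t_2, \theta) = \alpha\!\left(\frac{t_1}{\sigma_1}\right) \times \alpha\!\left(\frac{t_2}{\sigma_2}\right). \tag{33}$$

The expected selection response in the hybrids among lines from $\Pi_1$ and $\Pi_2$, which were selected for their GCA in cross-combinations with genotypes from the other population, can be obtained using Eq. 6 as follows

$$\Delta G_{Hyb}(t_1, t_2, \theta) = \Delta G_1(t_1, \sigma_1, b_1) + \Delta G_2(t_2, \sigma_2, b_2), \tag{34}$$

with

$$\Delta G_1(t_1, \sigma_1, b_1) = b_1\sigma_1\frac{\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)}{\alpha\!\left(\tfrac{t_1}{\sigma_1}\right)} = \frac{1}{\alpha_{Hyb}(t_1, t_2, \theta)}\, b_1\sigma_1\varphi\!\left(\frac{t_1}{\sigma_1}\right)\alpha\!\left(\frac{t_2}{\sigma_2}\right) \tag{35}$$

and

$$\Delta G_2(t_2, \sigma_2, b_2) = \frac{1}{\alpha_{Hyb}(t_1, t_2, \theta)}\, b_2\sigma_2\varphi\!\left(\frac{t_2}{\sigma_2}\right)\alpha\!\left(\frac{t_1}{\sigma_1}\right) \tag{36}$$

Defining for given values of $\theta$ and $\alpha_H \in (0, 1)$:

$$\alpha_1(t_1) = \alpha\!\left(\tfrac{t_1}{\sigma_1}\right), \qquad \alpha_2(t_2) = \alpha\!\left(\tfrac{t_2}{\sigma_2}\right), \qquad g(t_1, t_2) = \alpha_1(t_1)\,\alpha_2(t_2) - \alpha_H$$

and

$$f(t_1, t_2) = b_1\sigma_1\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)\alpha\!\left(\tfrac{t_2}{\sigma_2}\right) + b_2\sigma_2\varphi\!\left(\tfrac{t_2}{\sigma_2}\right)\alpha\!\left(\tfrac{t_1}{\sigma_1}\right),$$

we get the Lagrangian function

$$L(t_1, t_2, \lambda) = f(t_1, t_2) + \lambda g(t_1, t_2) \tag{37}$$

Thus, for given values of $\theta$ and $\alpha_H$, we obtain a necessary condition for the maximum of $\Delta G_{Hyb}(t_1, t_2, \theta) = \frac{1}{\alpha_H} f(t_1, t_2)$ under the side condition $g(t_1, t_2) = 0$ by analyzing the gradient $\nabla L(t_1, t_2, \lambda)$. We have

$$\begin{aligned}
\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1} &= -b_1\sigma_1\frac{t_1}{\sigma_1}\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)\tfrac{1}{\sigma_1}\alpha\!\left(\tfrac{t_2}{\sigma_2}\right) + b_2\sigma_2\varphi\!\left(\tfrac{t_2}{\sigma_2}\right) \times \left(-\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)\tfrac{1}{\sigma_1}\right) + \lambda\left(-\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)\tfrac{1}{\sigma_1}\alpha\!\left(\tfrac{t_2}{\sigma_2}\right)\right) \\
&= -\frac{\varphi\!\left(\tfrac{t_1}{\sigma_1}\right)\alpha\!\left(\tfrac{t_2}{\sigma_2}\right)}{\sigma_1}\left[b_1 t_1 + b_2\sigma_2 i_\alpha\!\left(\tfrac{t_2}{\sigma_2}\right) + \lambda\right]
\end{aligned} \tag{38}$$

$$\frac{\partial L(t_1, t_2, \lambda)}{\partial t_2} = -\frac{\varphi\!\left(\tfrac{t_2}{\sigma_2}\right)\alpha\!\left(\tfrac{t_1}{\sigma_1}\right)}{\sigma_2}\left[b_1\sigma_1 i_\alpha\!\left(\tfrac{t_1}{\sigma_1}\right) + b_2 t_2 + \lambda\right] \tag{39}$$

$$\frac{\partial L(t_1, t_2, \lambda)}{\partial\lambda} = \alpha\!\left(\frac{t_1}{\sigma_1}\right)\alpha\!\left(\frac{t_2}{\sigma_2}\right) - \alpha_H \tag{40}$$

Thus, from $\left(\frac{\partial L(t_1, t_2, \lambda)}{\partial t_1}, \frac{\partial L(t_1, t_2, \lambda)}{\partial t_2}, \frac{\partial L(t_1, t_2, \lambda)}{\partial\lambda}\right) = (0, 0, 0)$, we get the necessary conditions $-\lambda = b_1 t_1 + b_2\sigma_2 i_\alpha\!\left(\tfrac{t_2}{\sigma_2}\right)$ and $-\lambda = b_1\sigma_1 i_\alpha\!\left(\tfrac{t_1}{\sigma_1}\right) + b_2 t_2$ or equivalently

$$b_1 t_1 - b_2 t_2 = b_1\sigma_1 i_\alpha\!\left(\frac{t_1}{\sigma_1}\right) - b_2\sigma_2 i_\alpha\!\left(\frac{t_2}{\sigma_2}\right) = \Delta G_1(t_1, \sigma_1, b_1) - \Delta G_2(t_2, \sigma_2, b_2) \tag{41}$$

and

$$\alpha\!\left(\frac{t_1}{\sigma_1}\right)\alpha\!\left(\frac{t_2}{\sigma_2}\right) = \alpha_H \tag{42}$$

Numerical solutions $(t_1^o, t_2^o)$ for Eqs. 41 and 42 can be obtained using mathematical software such as Mathematica, from which we get $\alpha_1^o = \alpha\!\left(\tfrac{t_1^o}{\sigma_1}\right)$ and $\alpha_2^o = \alpha\!\left(\tfrac{t_2^o}{\sigma_2}\right)$ and $\Delta G_{Hyb}(t_1^o, t_2^o, \theta)$. This maximum can be compared with the selection response $\Delta G_{Hyb}(t_1^e, t_2^e, \theta)$ for $t_1^e = \sigma_1\Phi^{-1}\big(1 - \sqrt{\alpha_H}\big)$ and $t_2^e = \sigma_2\Phi^{-1}\big(1 - \sqrt{\alpha_H}\big)$, i.e., when an equal proportion $\alpha_e = \sqrt{\alpha_H}$ of lines is selected for GCA in each parent population. For selection based on BLUPs, where $b_1 = b_2 = 1$, Eq. 41 simplifies to

$$t_1 - t_2 = \sigma_1 i_\alpha\!\left(\frac{t_1}{\sigma_1}\right) - \sigma_2 i_\alpha\!\left(\frac{t_2}{\sigma_2}\right) \tag{43}$$

which corresponds to the difference in the selection differentials in $\Pi_1$ and $\Pi_2$.

Appendix 4: Regression equation of true genetic values (TGVs) on their BLUPs

We consider the ordinary mixed linear model studied by Henderson (1975)

$$y = X\beta + Zu + e,$$

where $y$ is an $n \times 1$ observation vector, $X$ a known $n \times p$ matrix with full column rank $p$, $\beta$ an unknown fixed vector, and $Z$ is a known $n \times q$ matrix. $u$ and $e$ are unobservable random vectors with null means and

$$\operatorname{var}\begin{bmatrix} u \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix},$$

where $G$ and $R$ are both nonsingular and known matrices. Let $\hat{u}$ be the BLUP of $u$ calculated as described by Henderson (1975) and denote $\Sigma_{22} = \operatorname{var}(\hat{u})$ and $\Sigma_{12} = \operatorname{cov}(u, \hat{u})$. Then, according to Henderson (1975), $\Sigma_{22} = \Sigma_{12}$ so that we have $\Sigma_{12}\Sigma_{22}^{-1} = I$, if $\Sigma_{22}$ can be inverted. In the case where $\begin{bmatrix} u \\ e \end{bmatrix}$ follows a multivariate normal distribution so that $\hat{u}$ also follows a normal distribution, the regression function of $u$ on $\hat{u}$ is linear (Anderson 1958, p. 29) and can be expressed as $\Sigma_{12}\Sigma_{22}^{-1}\hat{u} = I\hat{u}$.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00122-024-04592-2.

Acknowledgements The authors are indebted to Prof. Daniel Gianola for critical reading and helpful suggestions on an earlier version of the manuscript. ChatGPT 3.5 by OpenAI has been used to improve the grammar and style of this paper.

Authors contribution statement AEM conceived the study and developed the theory. AEM and AJM developed jointly the Mathematica programs and the figures. AEM wrote the manuscript with support from RF and CCS. All authors discussed and interpreted the results, read and approved the final manuscript.

Funding Open Access funding enabled and organized by Projekt DEAL. This work was funded by intra-mural funds of the Technical University of Munich. Open access was enabled and organized by Projekt Deal.

Declarations

Conflict of interest The authors declare that they have no conflict of interest. AEM is editor-in-chief and CCS is member of the editorial board of Theor. Appl. Genetics.

Ethical standard The authors declare that their work complies with the current laws of Germany.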
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New York
Auinger H-J, Lehermeier C, Gianola D, Mayer M, Melchinger AE, da Silva S, Knaak C, Ouzunova M, Schön C-C (2021) Calibration and validation of predicted genomic breeding values in an advanced cycle maize population. Theor Appl Genet 134:3069–3081
Barbosa PAM, Fritsche-Neto R, Andrade MC, Petroli CD, Burgueño J, Galli G, Willcox MC, Sonder K, Vidal-Martínez VA, Sifuentes-Ibarra E (2021) Introgression of maize diversity for drought tolerance: subtropical maize landraces as source of new positive variants. Front Plant Sci 12:691211
Bernardo R (1994) Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci 34:20–25
Bernardo R (1996) Best linear unbiased prediction of maize single-cross performance. Crop Sci 36:50–56
Bernardo R (2002) Breeding for quantitative traits in plants.
Stemma Press, Woodbury Böhm J, Schipprack W, Utz HF, Melchinger AE (2017) Tapping the genetic diversity of landraces in allogamous crops with doubled haploid lines: a case study from European flint maize. Theor Appl Genet 130:861–873 Bonnett D, Li Y, Crossa J, Dreisigacker S, Basnet B, Pérez-Rodríguez P, Alvarado G, Jannink J-L, Poland J, Sorrells M (2022) Response to early generation genomic selection for yield in wheat. Front Plant Sci 12:718611 Brauner PC, Müller D, Molenaar WS, Melchinger AE (2019) Genomic prediction with multiple biparental families. Theor Appl Genet 133:133–147 Brotherstone S, Hill W (1986) Heterogeneity of variance amongst herds for milk production. Anim Sci 42:297–303 Bulmer MG (1980) The mathematical theory of quantitative genetics. Clarendon Press, New York Chaikam V, Molenaar W, Melchinger AE, Boddupalli PM (2019) Dou- bled haploid technology for line development in maize: technical advances and prospects. Theor Appl Genet 132:3227–3243 Clark SA, Hickey JM, Daetwyler HD, van der Werf JH (2012) The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet Sel Evol 44:1–9 Cochran W (1951) Improvement by means of selection. In: Proceedings of the second Berkeley symposium on mathematical statistics and probability, pp 449–470 Daetwyler HD, Villanueva B, Bijma P, Woolliams JA (2007) Inbreed- ing in genome-wide selection. J Anim Breed Genet 124:369–376 Falconer D, Mackay T (1996) Introduction to quantitative genetics. Longman Group, Essex Fernando R, Gianola D (1986) Optimal properties of the conditional mean as a selection criterion. Theor Appl Genet 72:822–825 Garrick D, Van Vleck LD (1987) Aspects of selection for performance in several environments with heterogeneous variances. J Anim Sci 65:409–421 Gaynor RC, Gorjanc G, Hickey JM (2021) AlphaSimR: an R package for breeding program simulations. 
G3 11:jkaa017 Goffinet B (1983) Selection on selected records. Génét Sélect Évol 15:91–98 Habier D, Fernando RL, Dekkers J (2007) The impact of genetic rela- tionship information on genome-assisted breeding values. Genet- ics 177:2389–2397 Hartl DL, Clark AG, Clark AG (1997) Principles of population genet- ics. Sinauer Associates, Sunderland Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447 Henderson C (1990) Statistical methods in animal improvement: his- torical overview. In: Advances in statistical methods for genetic improvement of livestock. Springer, pp 2–14 Hill W (1984) On selection among groups with heterogeneous vari- ance. Anim Sci 39:473–477 Hölker AC, Mayer M, Presterl T, Bolduan T, Bauer E, Ordas B, Brauner PC, Ouzunova M, Melchinger AE, Schön C-C (2019) European maize landraces made accessible for plant breeding and genome-based studies. Theor Appl Genet 132:3333–3345 Kennedy B, Sorenson D (1988) Properties of mixed model methods for prediction of genetic merit under different genetic models in selected and nonselected populations. In: Second international conference on quantitative genetics, Raleigh. Sinauer Associates, pp 47–56 Lehermeier C, Krämer N, Bauer E, Bauland C, Camisan C, Campo L, Flament P, Melchinger AE, Menz M, Meyer N (2014) Usefulness of multiparental populations of maize (Zea mays L.) for genome- based prediction. Genetics 198:3–16 Lian L, Jacobson A, Zhong S, Bernardo R (2014) Genomewide predic- tion accuracy within 969 maize biparental populations. Crop Sci 54:1514–1522 Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer, Sunderland Mayer M, Unterseer S, Bauer E, de Leon N, Ordas B, Schön CC (2017) Is there an optimum level of diversity in utilization of genetic resources? Theor Appl Genet 130:2283–2295 Melchinger AE, Fernando R, Stricker C, Schön CC, Auinger HJ (2023) Genomic prediction in hybrid breeding: I. 
Optimizing the training set design. Theor Appl Genet 136:176 Melchinger AE, Frisch M (2023) Genomic prediction in hybrid breed- ing: II. Reciprocal recurrent genomic selection with full-sib and half-sib families. Theor Appl Genet 136:203 Melchinger AE, Posselt UK (2013) Biotechnologie und Züchtung. In: Lütke-Entrup NS, Schwarz FJ, Heilmann H (eds) Handbuch Mais. DLG Verlag, Frankfurt, M, pp 53–64 Piepho H, Möhring J, Melchinger A, Büchse A (2008) BLUP for phe- notypic selection in plant breeding and variety testing. Euphytica 161:209–228 Rasheed A, Hao Y, Xia X, Khan A, Xu Y, Varshney RK, He Z (2017) Crop breeding chips and genotyping platforms: progress, chal- lenges, and perspectives. Mol Plant 10:1047–1064 Riedelsheimer C, Melchinger AE (2013) Optimizing the allocation of resources for genomic selection in one breeding cycle. Theor Appl Genet 126:2835–2848 Riedelsheimer C, Endelman JB, Stange M, Sorrells ME, Jannink J-L, Melchinger AE (2013) Genomic predictability of interconnected Bi-parental maize populations. Genetics 194:493–503 Robert P, Auzanneau J, Goudemand E, Oury F-X, Rolland B, Heumez E, Bouchet S, Le Gouis J, Rincent R (2022) Phenomic selection in wheat breeding: identification and optimisation of factors influ- encing prediction accuracy and comparison to genomic selection. Theor Appl Genet 135:895–914 Schnell F (1982) A synoptic study of the methods and categories of plant breeding Schrag T, Melchinger A, Sørensen A, Frisch M (2006) Prediction of single-cross hybrid performance for grain yield and grain dry mat- ter content in maize using AFLP markers associated with QTL. Theor Appl Genet 113:1037–1047 Schrag TA, Westhues M, Schipprack W, Seifert F, Thiemann A, Scholten S, Melchinger AE (2018) Beyond genomic prediction: combining different types of omics data can improve prediction of hybrid performance in maize. 
Genetics 208:1373–1385 Seifert F, Thiemann A, Schrag TA, Rybka D, Melchinger AE, Frisch M, Scholten S (2018) Small RNA-based prediction of hybrid per- formance in maize. BMC Genom 19:1–14 Seye A, Bauland C, Charcosset A, Moreau L (2020) Revisiting hybrid breeding designs using genomic predictions: simulations high- light the superiority of incomplete factorials between segregating families over topcross designs. Theor Appl Genet 133:1995–2010 Sorenson D, Gianola D (2004) Likelihood, bayesian, and MCMC meth- ods in quantitative genetics. Springer, New York Technow F, Schrag TA, Schipprack W, Bauer E, Simianer H, Melch- inger AE (2014) Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197:1343–1355 Theoretical and Applied Genetics (2024) 137:104104 Page 18 of 18 Watson A, Ghosh S, Williams MJ, Cuddy WS, Simmonds J, Rey M-D, Asyraf Md, Hatta M, Hinchliffe A, Steed A, Reynolds D (2018) Speed breeding is a powerful tool to accelerate crop research and breeding. Nat Plants 4:23–29 Weiß TM, Zhu X, Leiser WL, Li D, Liu W, Schipprack W, Melchinger AE, Hahn V, Würschum T (2022) Unraveling the potential of phenomic selection within and among diverse breeding material of maize (Zea mays L.). G3 12:jkab445 Westhues M, Schrag TA, Heuer C, Thaller G, Utz HF, Schipprack W, Thiemann A, Seifert F, Ehret A, Schlereth A (2017) Omics-based hybrid prediction in maize. Theor Appl Genet 130:1927–1939 Westhues M, Heuer C, Thaller G, Fernando R, Melchinger AE (2019) Efficient genetic value prediction using incomplete omics data. Theor Appl Genet 132:1211–1222 Wilde P, Menzel J, Schmiedchen B (2003) Estimation of general and specific combining ability variances and their implications on hybrid rye breeding. Plant Breed Seed Sci 47:89–98 Wilde P, Miedaner T (2021) Hybrid rye breeding. In: The rye genome, pp 13–41 Wolfram S (1999) The MATHEMATICA® book, version 4. 
Cam- bridge University Press, Cambridge Woolliams J, Berg P, Dagnachew B, Meuwissen T (2015) Genetic con- tributions and their optimization. J Anim Breed Genet 132:89–99 Zenke-Philippi C, Frisch M, Thiemann A, Seifert F, Schrag T, Melch- inger AE, Scholten S, Herzog E (2017) Transcriptome-based pre- diction of hybrid performance with unbalanced data from a maize breeding programme. Plant Breed 136:331–337 Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Optimizing selection based on BLUPs or BLUEs in multiple sets of genotypes differing in their population parameters Abstract Key message Abstract Introduction Theory Maximizing the total selection response by optimal choice of thresholds Application to selection based on BLUPs Application to selection based on BLUEs Numerical analyses Software availability statement Results Discussion Examples of sets differing in population parameters Contrasting BLUEs and BLUPs as selection criteria Properties of BLUPs for selection Composition of the selected fraction Optimal selection of parent lines in hybrid breeding Conclusions Appendix 1: Response to truncation selection in two sets and composition of the selected set Appendix 2: Maximizing selection response by optimal choice of selection thresholds Appendix 3: Maximizing the total selection response for factorials in hybrid breeding Appendix 4: Regression equation of true genetic values (TGVs) on their BLUPs Acknowledgements References