Abstract
Genomic selection (GS) is a method for predicting breeding values of plants or animals from a large number of molecular markers, and it is commonly implemented in two stages. In plant breeding, the first stage usually involves computing adjusted means for genotypes, which are then used to predict genomic breeding values in the second stage. We compared three stage-wise approaches with a single-stage analysis for GS using ridge regression best linear unbiased prediction (RR-BLUP): two classical approaches, which either ignore the correlations among the adjusted means or approximate them by a diagonal matrix, and a new method. The new stage-wise method rotates (orthogonalizes) the adjusted means from the first stage before submitting them to the second stage. This makes the errors approximately independently and identically normally distributed, which is a prerequisite for many procedures that are potentially useful for GS, such as machine learning methods (e.g. boosting) and regularized regression methods (e.g. lasso). We illustrate this using componentwise boosting, which minimizes squared error loss by least squares and iteratively and automatically selects the markers that are most predictive of genomic breeding values. Results are compared with those of RR-BLUP using fivefold cross-validation. For two unbalanced datasets, the new stage-wise approach with rotated means was slightly more similar to the single-stage analysis than the classical two-stage approaches based on non-rotated means. This suggests that rotation is a worthwhile pre-processing step for two-stage GS analyses of unbalanced datasets. Moreover, the predictive accuracy of stage-wise RR-BLUP was higher (5.0–6.1 %) than that of componentwise boosting.
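The rotation and the componentwise boosting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the rotation is computed, so the sketch assumes it is a whitening transform based on the Cholesky factor of the variance-covariance matrix of the adjusted means. The function names `rotate_means` and `componentwise_boost`, the step size `nu`, the number of steps, and the simulated data are all hypothetical.

```python
import numpy as np

def rotate_means(means, cov, markers):
    """Rotate adjusted means and marker covariates (assumed whitening step).

    With cov = L @ L.T, premultiplying by L^{-1} gives errors with identity
    covariance. An intercept column, if used, must be rotated in the same way.
    """
    L = np.linalg.cholesky(cov)
    return np.linalg.solve(L, means), np.linalg.solve(L, markers)

def componentwise_boost(X, y, n_steps=200, nu=0.1):
    """Componentwise L2 boosting with simple least squares base learners.

    At each step the current residual is regressed on every column separately;
    only the column that reduces the residual sum of squares most is updated,
    by a fraction nu of its least squares slope.
    """
    coef = np.zeros(X.shape[1])
    resid = np.asarray(y, dtype=float).copy()
    col_ss = (X ** 2).sum(axis=0)          # assumes no all-zero column
    for _ in range(n_steps):
        slopes = X.T @ resid / col_ss      # per-column least squares slopes
        gain = slopes ** 2 * col_ss        # drop in residual SS per column
        j = int(np.argmax(gain))
        coef[j] += nu * slopes[j]
        resid -= nu * slopes[j] * X[:, j]
    return coef

# Simulated example (hypothetical data): 40 genotypes, 8 markers coded -1/1,
# correlated errors among the adjusted means, one marker with a real effect.
rng = np.random.default_rng(0)
n, p = 40, 8
markers = rng.choice([-1.0, 1.0], size=(n, p))
A = rng.normal(size=(n, n))
cov = A @ A.T / n + 0.5 * np.eye(n)        # positive definite by construction
errors = 0.1 * np.linalg.cholesky(cov) @ rng.normal(size=n)
means = 2.0 * markers[:, 3] + errors

y_rot, X_rot = rotate_means(means, 0.01 * cov, markers)
coef = componentwise_boost(X_rot, y_rot)
print(np.round(coef, 2))
```

Because most markers are never selected when the number of steps is small relative to the number of markers, the stopping point acts as the regularization parameter; in practice it would be chosen by cross-validation, as in the fivefold scheme mentioned above.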
Abbreviations
- BLUP: Best linear unbiased prediction
- GEBV: Genomic estimated breeding value
- GS: Genomic selection
- RCBD: Randomized complete block design
- REML: Restricted maximum likelihood
- RR-BLUP: Ridge regression BLUP
- SNP: Single nucleotide polymorphism
Acknowledgments
We thank AgReliant Genetics for providing the datasets. This research was funded by AgReliant Genetics and the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed—Synergistic plant and animal breeding” (Grant ID: 0315526). Three anonymous referees are thanked for very useful and constructive comments.
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Communicated by J. Yu.
Cite this article
Schulz-Streeck, T., Ogutu, J.O. & Piepho, HP. Comparisons of single-stage and two-stage approaches to genomic selection. Theor Appl Genet 126, 69–82 (2013). https://doi.org/10.1007/s00122-012-1960-1
DOI: https://doi.org/10.1007/s00122-012-1960-1