Comparisons of single-stage and two-stage approaches to genomic selection

  • Original Paper
  • Theoretical and Applied Genetics

Abstract

Genomic selection (GS) predicts breeding values of plants or animals from many molecular markers and is commonly implemented in two stages. In plant breeding, the first stage usually computes adjusted means for genotypes, which are then used in the second stage to predict genomic breeding values. Using ridge regression best linear unbiased prediction (RR-BLUP), we compared three stage-wise approaches with a single-stage analysis for GS: two classical approaches, which either ignore correlations among the adjusted means or approximate them by a diagonal matrix, and a new method. The new stage-wise method rotates (orthogonalizes) the adjusted means from the first stage before submitting them to the second stage. Rotation makes the errors approximately independently and identically normally distributed, which is a prerequisite for many procedures that are potentially useful for GS, such as machine learning methods (e.g., boosting) and regularized regression methods (e.g., lasso). We illustrate this using componentwise boosting, which minimizes squared error loss by least squares and iteratively and automatically selects the markers that are most predictive of genomic breeding values. Results are compared with those of RR-BLUP using fivefold cross-validation. For two unbalanced datasets, the new stage-wise approach with rotated means was slightly more similar to the single-stage analysis than the classical two-stage approaches based on non-rotated means. This suggests that rotation is a worthwhile pre-processing step for two-stage GS analyses of unbalanced datasets. Moreover, the predictive accuracy of stage-wise RR-BLUP was higher (5.0–6.1 %) than that of componentwise boosting.
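The rotation and boosting steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the rotation is computed, so the code below makes an assumption: the covariance matrix `V` of the adjusted means is available from stage one, and the means and marker matrix are whitened with the inverse Cholesky factor of `V`, which is one common way to obtain errors with identity covariance. The function names, the toy data, and the handling of the intercept as an ordinary column are all hypothetical and chosen for illustration; this is not the authors' implementation and not the `mboost` API.

```python
import math

def cholesky(V):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L z = b for z by forward substitution."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def rotate(V, y, X):
    """Return L^{-1} y and L^{-1} X; the rotated errors then have
    identity covariance if V is the covariance of the adjusted means."""
    L = cholesky(V)
    y_rot = forward_solve(L, y)
    cols = [forward_solve(L, [row[j] for row in X]) for j in range(len(X[0]))]
    X_rot = [list(r) for r in zip(*cols)]
    return y_rot, X_rot

def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting: each step fits every single column to the
    current residuals by least squares and updates only the best one,
    shrunk by the step length nu."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(steps):
        best = None
        for j in range(p):
            sxx = sum(X[i][j] ** 2 for i in range(n))
            if sxx == 0.0:
                continue
            b = sum(X[i][j] * resid[i] for i in range(n)) / sxx
            gain = b * b * sxx  # reduction in residual sum of squares
            if best is None or gain > best[0]:
                best = (gain, j, b)
        if best is None:
            break
        _, j, b = best
        beta[j] += nu * b
        resid = [resid[i] - nu * b * X[i][j] for i in range(n)]
    return beta

# Toy example (made-up numbers): four adjusted means, correlated in pairs.
V = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
y = [10.2, 9.1, 11.4, 8.7]            # adjusted means from stage one
X = [[1.0, 1.0, 0.0],                 # intercept column plus two marker codes
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 1.0],
     [1.0, 0.0, 0.0]]
y_rot, X_rot = rotate(V, y, X)
beta = componentwise_l2_boost(X_rot, y_rot)
print(beta)
```

In practice the number of boosting steps would be tuned, for example by the fivefold cross-validation mentioned in the abstract, rather than fixed at 200 as in this sketch.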


Abbreviations

BLUP: Best linear unbiased prediction
GEBV: Genomic estimated breeding value
GS: Genomic selection
RCBD: Randomized complete block design
REML: Restricted maximum likelihood
RR-BLUP: Ridge regression BLUP
SNP: Single nucleotide polymorphism

Acknowledgments

We thank AgReliant Genetics for providing the datasets. This research was funded by AgReliant Genetics and the German Federal Ministry of Education and Research (BMBF) within the AgroClustEr “Synbreed—Synergistic plant and animal breeding” (Grant ID: 0315526). Three anonymous referees are thanked for very useful and constructive comments.

Conflict of interest

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Hans-Peter Piepho.

Additional information

Communicated by J. Yu.

About this article

Cite this article

Schulz-Streeck, T., Ogutu, J.O. & Piepho, HP. Comparisons of single-stage and two-stage approaches to genomic selection. Theor Appl Genet 126, 69–82 (2013). https://doi.org/10.1007/s00122-012-1960-1

  • DOI: https://doi.org/10.1007/s00122-012-1960-1
