  • Brief Communication

Predicting the molecular complexity of sequencing libraries

Abstract

Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
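To make the prediction problem concrete, the sketch below extrapolates from a shallow sequencing run using the classical Good-Toulmin power series. This is an illustration of the underlying question only, not the empirical Bayesian method introduced in this paper. The function names and the toy read identifiers are hypothetical, and the series is used only for an extrapolation factor t between 0 and 1, since it is known to become unstable beyond that range.

```python
from collections import Counter


def counts_histogram(read_ids):
    """Map j -> number of distinct molecules observed exactly j times."""
    per_molecule = Counter(read_ids)        # molecule -> number of reads
    return Counter(per_molecule.values())   # j -> n_j


def good_toulmin_new_distinct(histogram, t):
    """Expected number of new distinct molecules if t times as many
    additional reads are sequenced (classical Good-Toulmin series).

    Illustrative only: the alternating series is reliable for 0 <= t <= 1.
    """
    if not 0 <= t <= 1:
        raise ValueError("this sketch only supports 0 <= t <= 1")
    return sum((-1) ** (j + 1) * (t ** j) * n_j
               for j, n_j in histogram.items())


# Hypothetical toy data: each string stands for the molecule a read came from.
reads = ["a", "a", "b", "c", "c", "c", "d"]
hist = counts_histogram(reads)
extra_half = good_toulmin_new_distinct(hist, 0.5)
extra_full = good_toulmin_new_distinct(hist, 1.0)
```

Here `hist` records two molecules seen once, one seen twice and one seen three times. The singletons dominate the series, which matches the intuition that a library with many molecules seen only once still has unseen molecules left to sample.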

Figure 1: Difficulties in predicting library complexity from initial shallow sequencing.
Figure 2: Library complexity can be estimated in terms of distinct molecules sequenced or distinct loci identified.

Acknowledgements

We thank S. Tavaré, M. Waterman, P. Calabrese, G. Hannon, and members of the Hannon lab and the Smith lab for their help, advice and input. This work was supported by US National Institutes of Health National Human Genome Research Institute grants (R01 HG005238 and P50 HG002790).

Author information

Contributions

T.D. and A.D.S. designed the method, implemented the software, performed the analysis and wrote the manuscript.

Corresponding author

Correspondence to Andrew D Smith.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–2, Supplementary Tables 2–3 (PDF 4640 kb)

Supplementary Table 1

Properties of data sets used in evaluating estimates of library complexity. (XLSX 50 kb)

Supplementary Software

Preseq source code and manual. (ZIP 165 kb)

About this article

Cite this article

Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013). https://doi.org/10.1038/nmeth.2375

  • DOI: https://doi.org/10.1038/nmeth.2375
